Re: [htdig] New feature proposal

Torsten Neuer
Fri, 30 Apr 1999 13:33:40 +0200

According to heddy Boubaker:
> Would it be possible to add new keywords taken from the URL path?
> For example:
> http://server/the/path/to/the/document.html
> would add the keywords the, path, to, the and document to the keywords
> extracted from document.html.
> This could be configurable, and the factor of the keywords could increase
> according to the depth of the path:
> url_path_start_factor: n
> then:
> the = n
> path = n+1
> to = n+2
> the = n+3
> document = n+4
> What do you think of this idea ?
> I personally would be interested in such a feature
> - heddy -

Well, as a feature for a general-purpose search tool, I don't think
much of it. It would not conform to any "standard" for search
engines (if there is one) and could therefore mislead webmasters
into thinking that the path also adds to the keywords picked up by
other WWW robots.

However, when used on an intranet it could improve the accuracy
of the search engine installed, provided that the URLs of the
documents in question are constructed in a way that makes sense ,-)

So it might be useful in cases where the URL tree mirrors the
structure of the website (which tends to be true for newly created
and for restructured sites, but may be terribly wrong for those
which have grown over the past few years).

If acceptable at all, I would further suggest the following
configuration directives for this feature:

url_path_as_keywords: [true|false] # self-explanatory
url_path_increment_factor: n # where n is a natural number

The feature should be enabled/disabled by the "url_path_as_keywords"
directive and controlled by the directives "url_path_start_factor"
and "url_path_increment_factor".

The defaults for this feature should be "url_path_as_keywords: false",
"url_path_increment_factor: 1" and "url_path_start_factor: n", where
'n' is a natural number which would not completely mess up the
search results (i.e. is kind of "friendly" to the other factors;
probably a matter of trial and error).
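
To make the intended weighting concrete, here is a rough sketch in
Python. It is not ht://Dig code and none of it exists in ht://Dig today.
The function name url_path_keywords, the lower-casing and the stripping
of a trailing file extension are my own assumptions for illustration.
The three parameters simply mirror the proposed directives.

```python
import re
from urllib.parse import urlsplit


def url_path_keywords(url, enabled=False, start_factor=1, increment_factor=1):
    """Return (word, factor) pairs for the path segments of a URL.

    Hypothetical helper: the parameters mirror the proposed directives
    url_path_as_keywords, url_path_start_factor and
    url_path_increment_factor.
    """
    if not enabled:
        # Feature is off by default, so no extra keywords are produced.
        return []

    pairs = []
    depth = 0
    for segment in urlsplit(url).path.split("/"):
        # Assumption: drop a trailing extension such as ".html".
        word = re.sub(r"\.[A-Za-z0-9]+$", "", segment).lower()
        if not word:
            continue  # skip empty segments (leading or doubled slashes)
        # The factor grows with the depth of the segment in the path.
        pairs.append((word, start_factor + depth * increment_factor))
        depth += 1
    return pairs


if __name__ == "__main__":
    url = "http://server/the/path/to/the/document.html"
    for word, factor in url_path_keywords(url, enabled=True, start_factor=1):
        print(word, factor)
```

With a start factor of n and an increment of 1 this gives exactly the
n, n+1, n+2, ... sequence from the original proposal. Whether deeper
segments really deserve the higher factor is a separate question.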

Should be more or less easy to implement, no? >:-]



InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606
