Re: [htdig3-dev] Re: [htdig] New feature proposal


Geoff Hutchison (ghutchis@wso.williams.edu)
Fri, 30 Apr 1999 09:11:29 -0400


At 7:33 AM -0400 4/30/99, Torsten Neuer wrote:

>it. This feature would not conform to any "standard" for search
>engines (if there is any) and thus could cause trouble to web-

Actually, several major search engines, including AltaVista, seem to do
exactly this already. The feature has been requested a few times, though
never quite as specifically.

>If acceptable at all, I would further suggest the following
>configuration directives for this feature:
>
>url_path_as_keywords: [true|false] # self-explaining
>url_path_increment_factor: n # where n is of N

You don't need url_path_as_keywords since setting the factor to 0 will
effectively disable it.

>Should be more or less easy to implement, no? >:-]

If we're happy to limit it to only indexing "words" based on the slashes in
the path, it's not very hard. The URL class in ht://Dig already allows you
to grab only the path, so then you split it based on '/' and add the words
using the Retriever class.
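
To make the splitting step concrete, here is a rough sketch in plain C++.
It assumes you already have the path as a std::string; path_words() is a
hypothetical helper, not an existing ht://Dig function, and it leaves out
the step of handing the words to the Retriever entirely.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of ht://Dig): split a URL path on '/'
// and return the non-empty components as candidate keywords.
std::vector<std::string> path_words(const std::string &path)
{
    std::vector<std::string> words;
    std::size_t start = 0;

    while (start <= path.size()) {
        // Find the next slash, or treat the end of the string as one.
        std::size_t end = path.find('/', start);
        if (end == std::string::npos)
            end = path.size();

        // Skip empty components (leading slash, trailing slash, "//").
        if (end > start)
            words.push_back(path.substr(start, end - start));

        start = end + 1;
    }
    return words;
}
```

With this, "/foo/bar/" gives the two words "foo" and "bar", and "/cafewso/"
gives the single word "cafewso". Anything finer than that would need more
than a split on slashes.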

I do wonder whether we should worry about URLs where the useful word is
only part of a path component, or spans two of them:

http://wso.williams.edu/cafewso/ -> cafe ?
http://www.foo.com/foo/bar/ -> foobar ?

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



