Budd, S. (s.budd@ic.ac.uk)
Fri, 30 Apr 1999 17:09:43 +0100
This request is getting very close to another one:
to be able to find all pages that reference a particular URL,
i.e. the URL in an href should be an index term of that page,
and the search would be something like URL: http://fred/mary.html,
returning all pages that reference fred/mary.html.
This would be very useful for web administrators.
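The request above amounts to an inverted index keyed by link target rather than by word. Here is a minimal sketch, assuming the pages have already been fetched into a dict of {page URL: HTML text}. The function names `build_link_index` and `pages_referencing` and the `URL:<target>` query syntax are hypothetical illustrations, not part of ht://Dig. Link resolution uses Python's standard `urllib.parse.urljoin` and `urldefrag`.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def build_link_index(pages):
    """Map each absolute link target to the set of pages that reference it.

    `pages` is a (hypothetical) dict of {page_url: html_text}.
    """
    index = defaultdict(set)
    for page_url, html in pages.items():
        collector = HrefCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # Resolve relative links and drop #fragments, so that
            # "/mary.html#top" and "mary.html" count as the same target.
            target, _fragment = urldefrag(urljoin(page_url, href))
            index[target].add(page_url)
    return index


def pages_referencing(index, query):
    """Answer a query of the (hypothetical) form 'URL:<target>'."""
    prefix = "URL:"
    if not query.startswith(prefix):
        raise ValueError("expected a query of the form URL:<target>")
    target = query[len(prefix):].strip()
    return sorted(index.get(target, set()))


pages = {
    "http://fred/index.html": '<a href="mary.html">Mary</a>',
    "http://fred/other.html": '<a href="/mary.html#top">again</a> <a href="joe.html">Joe</a>',
}
idx = build_link_index(pages)
print(pages_referencing(idx, "URL: http://fred/mary.html"))
# ['http://fred/index.html', 'http://fred/other.html']
```

Normalizing each href to an absolute URL before indexing is the design choice that makes relative and absolute links to the same page collapse into one index term.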
> -----Original Message-----
> From: Geoff Hutchison [SMTP:ghutchis@wso.williams.edu]
> Sent: 30 April 1999 14:11
> To: htdig@htdig.org
> Cc: htdig3-dev@htdig.org; htdig@htdig.org
> Subject: Re: [htdig3-dev] Re: [htdig] New feature proposal
>
>
> At 7:33 AM -0400 4/30/99, Torsten Neuer wrote:
>
> >it. This feature would not conform to any "standard" for search
> >engines (if there is any) and thus could cause trouble to web-
>
> Actually several major search engines, including AltaVista, seem to do
> exactly this already. The feature has been requested a few times, though
> never quite as specifically.
>
> >If acceptable at all, I would further suggest the following
> >configuration directives for this feature:
> >
> >url_path_as_keywords: [true|false] # self-explanatory
> >url_path_increment_factor: n # where n is a natural number (n ∈ N)
>
> You don't need url_path_as_keywords since setting the factor to 0 will
> effectively disable it.
>
> >Should be more or less easy to implement, no? >:-]
>
> If we're happy to limit it to only indexing "words" based on the slashes in
> the path, it's not very hard. The URL class in ht://Dig already allows you
> to grab only the path, so then you split it based on '/' and add the words
> using the Retriever class.
>
> I always wonder if we should worry about URLs like:
>
> http://wso.williams.edu/cafewso/ -> cafe ?
> http://www.foo.com/foo/bar/ -> foobar ?
>
>
> -Geoff Hutchison
> Williams Students Online
> http://wso.williams.edu/
>
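The approach Geoff outlines (take only the path of the URL, split it on '/', and add the pieces as words) can be sketched as a standalone Python function. This is not ht://Dig's URL or Retriever API. `url_path_words`, `add_path_keywords`, the word-to-weight dict, and `base_weight` are hypothetical. `url_path_increment_factor` is the directive proposed in the thread, treated here, as an assumption, as a multiplier on each path word's weight. A factor of 0 adds nothing, which is why a separate `url_path_as_keywords` switch is unnecessary.

```python
from urllib.parse import urlsplit


def url_path_words(url):
    """Return the non-empty '/'-separated segments of the URL's path."""
    path = urlsplit(url).path
    return [segment for segment in path.split("/") if segment]


def add_path_keywords(word_weights, url, url_path_increment_factor, base_weight=1.0):
    """Add each path segment to `word_weights` (a dict of word -> weight).

    The factor scales the weight given to path words; a factor of 0
    (or less) leaves `word_weights` unchanged, disabling the feature.
    """
    if url_path_increment_factor <= 0:
        return word_weights
    for word in url_path_words(url):
        increment = base_weight * url_path_increment_factor
        word_weights[word] = word_weights.get(word, 0.0) + increment
    return word_weights


print(url_path_words("http://wso.williams.edu/cafewso/"))      # ['cafewso']
print(url_path_words("http://www.foo.com/foo/bar/"))           # ['foo', 'bar']
print(add_path_keywords({}, "http://www.foo.com/foo/bar/", 0))  # {}
print(add_path_keywords({}, "http://www.foo.com/foo/bar/", 2))  # {'foo': 2.0, 'bar': 2.0}
```

As the output for the two example URLs shows, plain '/' splitting yields `cafewso` rather than `cafe`, and `foo` and `bar` rather than `foobar`. Catching those cases would need processing beyond this sketch.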
This archive was generated by hypermail 2.0b3 on Fri Apr 30 1999 - 09:19:25 PDT