Budd, S. (email@example.com)
Fri, 30 Apr 1999 17:09:43 +0100
This request is very close to another one: being able to find all
pages that reference a particular URL. That is, the URL in an href
should be an index term of the page containing it, and a search such as
URL: http://fred/mary.html
would return all pages that reference fred/mary.html.
This would be very useful for web administrators.
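
To make the idea concrete, here is a minimal sketch of such a reverse link index. It is not ht://Dig code: the `pages` dict, the function names and the sample pages on host `fred` are all hypothetical, and it uses only the Python standard library.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag in one page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def build_backlink_index(pages):
    """Map each link target to the set of page URLs that reference it.

    `pages` is a hypothetical dict of {page_url: html_text}.
    """
    index = defaultdict(set)
    for page_url, html in pages.items():
        collector = HrefCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # Resolve relative links against the page and drop any #fragment.
            target, _fragment = urldefrag(urljoin(page_url, href))
            index[target].add(page_url)
    return index


def pages_referencing(index, url):
    """Answer a 'URL:' style query: which pages link to `url`?"""
    return sorted(index.get(url, set()))


# Hypothetical sample data for illustration only.
pages = {
    "http://fred/index.html": '<a href="mary.html">Mary</a>',
    "http://fred/about.html": '<a href="http://fred/mary.html#top">Mary</a>',
    "http://fred/other.html": '<a href="http://elsewhere/">Away</a>',
}
index = build_backlink_index(pages)
print(pages_referencing(index, "http://fred/mary.html"))
```

Resolving relative links before indexing matters here: without it, "mary.html" and "http://fred/mary.html" would end up as two different index terms.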
> -----Original Message-----
> From: Geoff Hutchison [SMTP:firstname.lastname@example.org]
> Sent: 30 April 1999 14:11
> To: email@example.com
> Cc: firstname.lastname@example.org; email@example.com
> Subject: Re: [htdig3-dev] Re: [htdig] New feature proposal
> At 7:33 AM -0400 4/30/99, Torsten Neuer wrote:
> >it. This feature would not conform to any "standard" for search
> >engines (if there is any) and thus could cause trouble to web-
> Actually several major search engines, including AltaVista, seem to do
> exactly this already. The feature has been requested a few times, though
> never quite as specifically.
> >If acceptable at all, I would further suggest the following
> >configuration directives for this feature:
> >url_path_as_keywords: [true|false] # self-explaining
> >url_path_increment_factor: n # where n is of N
> You don't need url_path_as_keywords since setting the factor to 0 will
> effectively disable it.
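
A minimal sketch of why a factor of 0 acts as an off switch, assuming the factor is simply added to a word's score once per occurrence in the path. How ht://Dig really applies its factors is not shown in this thread, so `add_path_words` and the `weights` dict are hypothetical.

```python
def add_path_words(weights, path_words, url_path_increment_factor):
    """Credit each URL path word with url_path_increment_factor.

    `weights` is a hypothetical {word: score} dict for one document.
    A factor of 0 contributes nothing, so no separate on/off flag is needed.
    """
    if url_path_increment_factor == 0:
        # Skip entirely so no zero-weight entries are created.
        return weights
    for word in path_words:
        weights[word] = weights.get(word, 0) + url_path_increment_factor
    return weights


print(add_path_words({"cafe": 2}, ["cafe", "menu"], 3))
print(add_path_words({"cafe": 2}, ["cafe", "menu"], 0))
```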
> >Should be more or less easy to implement, no? >:-]
> If we're happy to limit it to only indexing "words" based on the slashes
> in the path, it's not very hard. The URL class in ht://Dig already allows you
> to grab only the path, so then you split it based on '/' and add the words
> using the Retriever class.
> I always wonder if we should worry about URLs like:
> http://wso.williams.edu/cafewso/ -> cafe ?
> http://www.foo.com/foo/bar/ -> foobar ?
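
The slash-based split described above can be sketched as follows and applied to the two URLs in question. This is a Python illustration, not the ht://Dig URL or Retriever classes; `path_words` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def path_words(url):
    """Return the non-empty '/'-separated segments of a URL's path."""
    path = urlsplit(url).path
    return [segment for segment in path.split("/") if segment]


print(path_words("http://wso.williams.edu/cafewso/"))
print(path_words("http://www.foo.com/foo/bar/"))
```

With a split on slashes alone, the first URL yields only "cafewso" and the second yields "foo" and "bar" as separate words, so an exact-match search for "cafe" or "foobar" would not find either page.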
> -Geoff Hutchison
> Williams Students Online
> To unsubscribe from the htdig mailing list, send a message to
> firstname.lastname@example.org containing the single word "unsubscribe" in
> the SUBJECT of the message.
This archive was generated by hypermail 2.0b3 on Fri Apr 30 1999 - 09:19:25 PDT