[htdig] Re: htdig URL duplication


Hans-Peter Nilsson (hans-peter.nilsson@axis.com)
Tue, 26 Jan 1999 21:41:58 +0100


> From: D.P.Birchall@sussex.ac.uk (Danny Birchall)
> Date: Tue Jan 26 16:59:43 CET 1999

> Does htdig create a list of URLs before it creates its databases? Could this
> list be doctored to remove duplicates before the database is built?

I may be missing something, but couldn't you just exclude the
unwanted URLs, with e.g.:

 exclude_urls: /cgi-bin/ .cgi http://www.sussex.ac.uk/Units/foo/

(Don't forget to include the defaults, /cgi-bin/ and such, since
setting exclude_urls replaces the default list.)

This assumes, of course, that all the relevant documents are also
indexed under "www.sussex.ac.uk/foo/".

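To make the effect concrete, here is a rough sketch in Python of what
such an exclusion amounts to. It assumes that exclude_urls rejects any
URL containing one of the listed strings, and it is only an
illustration: the function names and the sample URLs are made up and
are not part of htdig.

```python
# Illustrative sketch only; this is NOT htdig's code.
# Assumption: a URL is rejected if it contains any of the listed strings.

# Same values as in the exclude_urls example above.
EXCLUDE_URLS = [
    "/cgi-bin/",
    ".cgi",
    "http://www.sussex.ac.uk/Units/foo/",
]


def is_excluded(url, patterns=EXCLUDE_URLS):
    """Return True if the URL contains any of the exclusion strings."""
    return any(pattern in url for pattern in patterns)


def filter_urls(urls, patterns=EXCLUDE_URLS):
    """Keep only the URLs that match none of the exclusion strings."""
    return [url for url in urls if not is_excluded(url, patterns)]


if __name__ == "__main__":
    # Hypothetical URLs: the same document reachable under two paths.
    candidates = [
        "http://www.sussex.ac.uk/foo/index.html",
        "http://www.sussex.ac.uk/Units/foo/index.html",
        "http://www.sussex.ac.uk/cgi-bin/search",
    ]
    for url in filter_urls(candidates):
        print(url)
```

Only the copy under the path that is not excluded survives, which is
why the documents need to be reachable under the other path as well.
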
I don't think this is really a job for url_part_aliases (a new
feature), although it could be bent to fix this.

brgds, H-P


