htdig: Re: Excluding directories and duplicate URLs patch


Geoff Hutchison (Geoffrey.R.Hutchison@williams.edu)
Sun, 13 Sep 1998 09:02:33 -0400


>have not been applied to ht://Dig 3.1.0b1; I applied it manually and
>recompiled htdig and reran rundig. My databases shrank to their normal
>size; no more duplicates;-) Please include this patch in your next
>release.

The reason I did not apply this patch to the 3.1.0b1 release is that it
only applies to local indexing. I didn't want to announce "elimination
of duplicate files" until I had a patch ready for HTTP access as well,
and I don't have one yet.

I also don't think outright elimination is the correct approach. I'd rather
*detect* duplicates and store multiple URLs for each page. This is how other
search engines handle mirrors, so I think it should be the approach for
ht://Dig too. A rough sketch of what I mean is below.
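
To be clear, this is not ht://Dig code, just a sketch of the idea: key each
document by a digest of its contents and append every URL that yields that
digest, instead of dropping the later ones. The digest function and names
here are made up for the example; a real implementation would want a proper
checksum like MD5 rather than std::hash.

    #include <functional>
    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Toy content digest for illustration only.
    static size_t digest(const std::string &body) {
        return std::hash<std::string>{}(body);
    }

    int main() {
        // Map a content digest to every URL whose page had that content.
        std::unordered_map<size_t, std::vector<std::string>> pages;

        auto add = [&](const std::string &url, const std::string &body) {
            pages[digest(body)].push_back(url);
        };

        add("http://example.com/a.html",        "same page");
        add("http://mirror.example.com/a.html", "same page");      // duplicate content
        add("http://example.com/b.html",        "different page");

        // One index entry per unique document, listing all of its URLs.
        for (const auto &entry : pages) {
            std::cout << entry.second.size() << " URL(s) for one document:\n";
            for (const auto &url : entry.second)
                std::cout << "  " << url << "\n";
        }
        return 0;
    }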

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

----------------------------------------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig-request@sdsu.edu containing the single word "unsubscribe" in
the body of the message.



This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:27:48 PST