Re: [htdig] Duplicates


From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Wed Dec 27 2000 - 19:21:10 PST


On Sat, 23 Dec 2000, Ing. Noel Vargas Baltodano wrote:

> I've successfully run Htdig, and it scanned every file I wanted it to. The
> only thing now is that I get several duplicates.
>
> Is there a way to tell Htdig to display 'unique' URLs only?

It *does* only display unique URLs. If you see two URLs that are exactly
(i.e. character for character) the same in the htsearch results, that's a bug.

On the other hand, it's very easy to have multiple URLs point to the same
document (for example, a directory URL and its index.html page). This is the
most common cause of "duplicates." If you are willing to try a beta, grab the
latest snapshot of the 3.2.0b3 code and look at the RELEASE.html file in it.
There is now code to compute an MD5 checksum of each document's contents, so
the same document reached through different URLs can be recognized and
indexed only once.
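To illustrate the general idea of checksum-based de-duplication, here is a minimal Python sketch. It is not htdig's actual code (htdig is written in C++). The function name `dedupe_by_md5`, the URLs, and the page contents are hypothetical. The sketch hashes each document body with MD5 and keeps only the first URL seen for each digest.

```python
import hashlib


def dedupe_by_md5(pages):
    """Keep only the first URL seen for each distinct document body.

    `pages` is an iterable of (url, content_bytes) pairs (hypothetical input
    format). URLs whose content has the same MD5 digest are treated as the
    same document.
    """
    seen = {}      # MD5 hex digest -> first URL that produced it
    unique = []    # URLs kept, in crawl order
    for url, content in pages:
        digest = hashlib.md5(content).hexdigest()
        if digest in seen:
            continue  # same content already indexed under another URL
        seen[digest] = url
        unique.append(url)
    return unique


if __name__ == "__main__":
    # Hypothetical crawl results: three URLs, two of which serve the same page.
    crawl = [
        ("http://www.example.com/", b"<html>Welcome</html>"),
        ("http://www.example.com/index.html", b"<html>Welcome</html>"),
        ("http://www.example.com/about.html", b"<html>About us</html>"),
    ]
    for url in dedupe_by_md5(crawl):
        print(url)
```

Note that this only catches byte-for-byte identical copies: two pages that differ by so much as a generated timestamp will hash differently and both be kept.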

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig-unsubscribe@htdig.org
You will receive a message to confirm this.
List archives: <http://www.htdig.org/mail/menu.html>
FAQ: <http://www.htdig.org/FAQ.html>



This archive was generated by hypermail 2b28 : Wed Dec 27 2000 - 19:32:43 PST