Re: htdig: Duplicate files with unique URLs

Kirk Petersen
Wed, 10 Dec 1997 12:34:10 -0800 (PST)

> The trouble is that somewhere in all the www pages there is a URL that
> sends htdig back to the same site with a slight variation in the URL that
> makes it unique, so the same site gets indexed twice.

        I have exactly the same problem. My solution was to add "///" to
the exclude_urls field of the conf file. You have to accept some
duplicates (http://foo/bar/ and http://foo/bar// both get indexed), but
at least htdig doesn't run forever.
        I tried to modify the code to remove unnecessary "/"s but it
never worked for me. Does anyone out there know how this would be done?
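
        For whoever wants to pick this up, here is a rough sketch of the
idea in C++. This is NOT htdig's actual URL-parsing code, and
collapse_slashes is just a made-up name; it only shows how duplicate "/"s
after the host could be squeezed out so http://foo/bar// becomes
http://foo/bar/:

    #include <iostream>
    #include <string>

    // Hypothetical helper, not taken from the htdig sources: squeeze
    // runs of '/' into a single '/' everywhere after the "://".
    std::string collapse_slashes(const std::string &url)
    {
        // Skip past "://" so the scheme separator is never touched.
        std::string::size_type start = url.find("://");
        start = (start == std::string::npos) ? 0 : start + 3;

        std::string out = url.substr(0, start);
        bool prev_slash = false;
        for (std::string::size_type i = start; i < url.size(); ++i)
        {
            char c = url[i];
            if (c == '/' && prev_slash)
                continue;               // drop the duplicate slash
            prev_slash = (c == '/');
            out += c;
        }
        return out;
    }

    int main()
    {
        // Prints http://foo/bar/ and http://foo/bar/baz
        std::cout << collapse_slashes("http://foo/bar//") << "\n";
        std::cout << collapse_slashes("http://foo//bar///baz") << "\n";
        return 0;
    }

The catch is finding the right spot in htdig to apply something like this
before a URL is compared against the ones already seen, which is where I
got stuck.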

Kirk Petersen
NOAA Network Operations Center
(206) 526-4511


This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:25:24 PST