Re: htdig: Duplicate files with unique URLs


Kirk Petersen (kirk@wrc.noaa.gov)
Wed, 10 Dec 1997 12:34:10 -0800 (PST)


> The trouble is that somewhere in all the www pages there is a URL that
> sends HTDIG back to the same site with a slight variation in the URL that
> makes it unique. So the same site gets indexed twice.

        I have exactly the same problem. My solution was to add "///" to
the exclude_urls field of the conf file. You have to accept some
duplicates (http://foo/bar/ and http://foo/bar//), but at least htdig
doesn't run forever.
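
        For reference, the relevant conf line ends up looking roughly like
this (the /cgi-bin/ and .cgi patterns are only the sort of thing you might
already have there; the one addition is the "///"):

        exclude_urls:   /cgi-bin/ .cgi ///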
        I tried to modify the code to remove unnecessary "/"s but it
never worked for me. Does anyone out there know how this would be done?
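
        For anyone who wants to take another crack at it, here is a rough
sketch of the kind of slash-collapsing I had in mind. It is not htdig's
actual code; normalizeSlashes is just a made-up helper that collapses runs
of "/" in the path part of a URL while leaving the "//" after the scheme
alone.

#include <iostream>
#include <string>

// Collapse runs of '/' in the path portion of a URL, leaving the
// "//" that follows the scheme (e.g. "http://") untouched.
// NOTE: this is a hypothetical helper, not part of htdig itself.
std::string normalizeSlashes(const std::string &url)
{
    // Find where the path starts: skip past "scheme://host".
    std::string::size_type start = url.find("://");
    if (start == std::string::npos)
        start = 0;                         // no scheme; treat it all as path
    else
    {
        start = url.find('/', start + 3);  // first '/' after the host
        if (start == std::string::npos)
            return url;                    // no path at all, nothing to do
    }

    std::string result = url.substr(0, start);
    char prev = '\0';
    for (std::string::size_type i = start; i < url.size(); ++i)
    {
        char c = url[i];
        if (c == '/' && prev == '/')
            continue;                      // drop the duplicate slash
        result += c;
        prev = c;
    }
    return result;
}

int main()
{
    std::cout << normalizeSlashes("http://foo/bar//baz///") << "\n";
    // prints: http://foo/bar/baz/
}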

Thanks,
Kirk Petersen
kirk@nwn.noaa.gov
NOAA Network Operations Center
(206) 526-4511

----------------------------------------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig-request@sdsu.edu containing the single word "unsubscribe" in
the body of the message.


