htdig: Duplicate files with unique URLs

Ray Krebs
Wed, 10 Dec 1997 13:51:48 -0500 (EST)

Hello Everyone,

I'm running HTDIG and indexing 8 different sites.

The trouble is that somewhere in all the web pages there is a URL that
sends HTDIG back to the same site with a slight variation in the URL that makes it
unique. So the same site gets indexed twice.

Example URLs:

As you can see, the only difference is the extra "/" slash. HTDIG hits this,
sees it as unique, and ends up starting a whole new crawl, doubling the
content of one of the 8 sites. Searches against this index return
duplicates as well.
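
To illustrate the kind of duplication I mean (this is not part of HTDIG, just a
rough Python sketch), the two URL forms could be treated as the same page by
collapsing repeated slashes in the path. The host www.example.com and the
function names collapse_slashes and unique_urls are made up for illustration;
they are not my real sites or anything HTDIG provides.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url):
    """Collapse runs of '/' in the path; the '//' after the scheme is untouched."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def unique_urls(urls):
    """Keep the first URL seen for each slash-collapsed form."""
    seen = set()
    kept = []
    for url in urls:
        key = collapse_slashes(url)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept


# Hypothetical URLs that differ only by an extra slash.
crawled = [
    "http://www.example.com/news/index.html",
    "http://www.example.com//news/index.html",
]
print(unique_urls(crawled))
```

With something like this, both forms map to one key, so only the first would
be kept.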

My config file setup:

start_url: ...

limit_urls_to: ${start_url}

Is there something that I can use to 'filter' out all of these extras?

I have tried using exclude_urls, but this doesn't
seem to work.

Are there other ways to exclude this?

Thanks for any assistance,

V. Ray Krebs III InfiNet Publishing Systems Administrator Phone: 757-624-2295 x3310 Fax: 757-627-2498


This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:25:24 PST