Re: [htdig] Duplicate pages

Subject: Re: [htdig] Duplicate pages
From: Gilles Detillieux
Date: Wed Sep 20 2000 - 08:52:13 PDT

According to
> The site I am indexing is a bit peculiar. The following
> is an example of the setup, where each page is exactly
> the same.
> I assumed that a URL with no index.html in it was just
> loading the index.html. Here's the problem: htdig
> recognizes these as 4 different pages, and indexes all
> of them. I can see why it would think there are 2
> different pages because of the s and S. Is there any
> way to prevent the duplicates?

The remove_default_doc attribute should take care of the superfluous
"index.html" entries, but I'm not so sure about the extra Subdirectory
names. You can't use exclude_urls for this, because it does a
case-insensitive match, so a pattern meant for one spelling would
exclude the other one as well.
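
To illustrate the idea (this is only a rough Python sketch, not htdig's
actual code), here is what stripping a default document name from URLs
does to a set like yours. The example.com URLs and the helper name
strip_default_doc are made up for the example, and "index.html" is
assumed to be the default document name. Note that the two spellings of
the directory still come out as separate entries.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default document name; adjust to whatever the server uses.
DEFAULT_DOCS = ("index.html",)


def strip_default_doc(url, default_docs=DEFAULT_DOCS):
    """Drop a trailing default document name so that 'dir/' and
    'dir/index.html' collapse to the same URL (hypothetical helper)."""
    parts = urlsplit(url)
    head, sep, tail = parts.path.rpartition("/")
    if tail in default_docs:
        parts = parts._replace(path=head + sep)
    return urlunsplit(parts)


# Hypothetical URLs standing in for the four duplicates in the question.
urls = [
    "http://www.example.com/Subdirectory/",
    "http://www.example.com/Subdirectory/index.html",
    "http://www.example.com/subdirectory/",
    "http://www.example.com/subdirectory/index.html",
]

# The index.html variants collapse, but the case variants remain distinct.
unique = sorted({strip_default_doc(u) for u in urls})
for u in unique:
    print(u)
```

So four URLs become two, and the remaining pair differs only in the
case of the directory name.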

On my site, I make use of a few symbolic links for subdirectories, to
give an all-lowercase equivalent to some mixed case names, but I never
use these in URLs on my site, for this very reason. I only use them to
support links from other sites, where other admins may be a tad sloppy
about getting the case right. I realise this isn't a workable alternative
for you if you don't maintain control over the whole site you're indexing.

Gilles R. Detillieux
Spinal Cord Research Centre
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


This archive was generated by hypermail 2b28 : Wed Sep 20 2000 - 08:55:04 PDT