htdig: robots.txt and case-sensitivity

Tobias Brasier
Fri, 06 Feb 1998 09:27:56 -0500

htdig, via HTTP (a case-sensitive protocol, regardless of the OS the web
server is on), follows links to web sites it wants to index. It first reads
the robots.txt file located at the root level of each server's primary
document directory.

For example:
htdig is indexing servers within a domain. It begins at a starting URL on
one server and follows a link to a target page on a second server. In this
example, the latter server is a Windows NT machine (a case-insensitive OS)
running Netscape Enterprise Server 3.x.

The robot exclusion file on that second server contains the following line:
     Disallow: /FRED/

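The target page in question can be accessed through the URL as linked, or
through any other case variation of its path, because the server ignores
case. However, htdig will NOT index the URL whose path matches the Disallow
line exactly (the link it is following), but CAN STILL index the same page
through any other case variation of that URL.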
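To make the mismatch concrete, here is a minimal sketch of how a robot that compares paths case-sensitively would treat the Disallow line above. It is not htdig's actual code; the function name is_disallowed, the ignore_case switch, and the page name page.html are assumptions made up for illustration.

```python
from typing import Iterable


def is_disallowed(path: str, rules: Iterable[str], ignore_case: bool = False) -> bool:
    """Return True if `path` starts with any Disallow prefix in `rules`.

    With ignore_case=False this is a plain case-sensitive prefix match,
    which is the behaviour described in the message above.
    """
    rules = [rule for rule in rules if rule]  # an empty Disallow excludes nothing
    if ignore_case:
        path = path.lower()
        rules = [rule.lower() for rule in rules]
    return any(path.startswith(rule) for rule in rules)


rules = ["/FRED/"]  # from the robots.txt line quoted above

# "page.html" is a hypothetical file name used only for this example.
for candidate in ["/FRED/page.html", "/Fred/page.html", "/fred/page.html"]:
    strict = is_disallowed(candidate, rules)
    relaxed = is_disallowed(candidate, rules, ignore_case=True)
    print(f"{candidate}: case-sensitive={strict}, case-insensitive={relaxed}")
```

Under the case-sensitive comparison only the first path is excluded, even though a case-insensitive server would return the same page for all three.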

My questions:
- Is there a configuration in htdig that I am missing that would handle
this problem?
- Can a web server be configured to present URLs in a certain case? I
haven't found it in the web server software in question.
- How can the webmaster of that server reliably disallow indexing of a
directory? List every case variation of the directory name in robots.txt?
- Must a link using one of those other case variations exist somewhere for
htdig to follow before the target page is indexed under that variation?
- Does any of the preceding make sense? Have I made any incorrect assumptions?
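
For a sense of what listing every case variation would involve, the following sketch generates one Disallow line per variation of a directory name. It only illustrates the brute-force idea raised in the questions above; the helper names case_variations and disallow_lines are hypothetical, and nothing here implies that htdig requires or recommends this approach.

```python
from itertools import product
from typing import List


def case_variations(name: str) -> List[str]:
    """Return every upper/lower-case spelling of `name`."""
    # Each letter contributes two choices; digits and punctuation contribute one.
    choices = [sorted({ch.lower(), ch.upper()}) for ch in name]
    return ["".join(combo) for combo in product(*choices)]


def disallow_lines(directory: str) -> List[str]:
    """Build one robots.txt Disallow line per case variation of `directory`."""
    return [f"Disallow: /{variant}/" for variant in case_variations(directory)]


lines = disallow_lines("FRED")
print(len(lines))  # 16 lines for a four-letter name
print("\n".join(lines))
```

The count doubles with every additional letter, so a name with n letters needs 2**n lines, which becomes impractical quickly for longer directory names.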

Thank you all very much.

  Tobias A. Brasier
  Webmaster - The University of South Carolina
  Internet Solutions Group - Division of Libraries & Information Systems
  1244 Blossom Street, Columbia, South Carolina 29208
  voice: (803) 777-5211 | fax: (803) 777-4149
