Dear ht://Dig Group,

You might want to add this or include something like this in the explanation of robots and indexing in the ht://Dig docs.

Thanks for the greatest software package on Earth.


-Gabriel Fenteany



On Robots and Indexing:

ht://Dig supports the Robots Exclusion Protocol, general robots metatags, and an ht://Dig-specific robots metatag.

1. Robots Exclusion Protocol

The robot looks for a document named robots.txt in your site's entry (top-level) directory, so the file must be reachable at the root URL of the site. If the robot finds this document, it analyses its contents for records like:

User-agent: *
Disallow: /

(This particular record tells all robots to stay away from the entire site.)

For instance, to exclude only the ht://Dig indexer from directories with the names cgi-bin, tmp or private, you'd put the following text in the robots.txt file:


User-agent: htdig
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/

For more details, see the Robots Exclusion Protocol.
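As a fuller sketch (the directory names here are only placeholders), a single robots.txt file can hold several records separated by blank lines. Note that a robot obeys only the most specific record matching its name, so the htdig record must repeat any general rules that should still apply to it. For example, to keep all robots out of /tmp/ while additionally keeping htdig out of /private/:

User-agent: *
Disallow: /tmp/

User-agent: htdig
Disallow: /tmp/
Disallow: /private/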


2. General Robots Metatags

To allow robots such as htdig to index the current page but not follow local links, you can use:

<meta name="robots" content="nofollow">

as in:

<meta name="robots" content="nofollow">
<meta name="description" content="...">

You can also specify that the page not be indexed. With the following code between the <head> and </head> tags, the page will not be indexed, but local links will still be followed:

<meta name="robots" content="noindex"> 

To prevent a page both from being indexed and from local links being followed, you can similarly use:

<meta name="robots" content="noindex,nofollow">
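Putting the above together, here is a minimal sketch of a page head (the title and description text are placeholders) that keeps the page out of the index and also keeps its local links from being followed:

<head>
<title>Internal draft</title>
<meta name="robots" content="noindex,nofollow">
<meta name="description" content="A page robots should skip">
</head>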


3. ht://Dig-Specific Robots Metatag

To prevent a page from being indexed just by htdig, but not by other robots that follow the robots metatag convention, use:

<meta name="htdig-noindex">
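For instance, a page head like the following sketch (the title is a placeholder) hides the page from htdig while leaving it visible to other robots:

<head>
<title>Local-only page</title>
<meta name="htdig-noindex">
</head>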


4. Tags to Prevent Indexing Only Part of a Document

Enclose everything you don't want indexed between the following two comments (these are the defaults for ht://Dig's noindex_start and noindex_end configuration attributes):

<!--htdig_noindex-->
...
<!--/htdig_noindex-->

(where "..." is everything you don't want indexed)
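As a sketch, assuming ht://Dig's default comment markers (which can be changed with the noindex_start and noindex_end configuration attributes), a page whose navigation bar should be skipped while the rest of its text is indexed might look like:

<body>
<!--htdig_noindex-->
<a href="/">Home</a> | <a href="/news/">News</a>
<!--/htdig_noindex-->
<p>This paragraph is indexed normally.</p>
</body>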