Re: [htdig] Following links, not indexing a doc

Subject: Re: [htdig] Following links, not indexing a doc
From: Gilles Detillieux
Date: Tue Nov 07 2000 - 11:23:15 PST

According to Eric Bliss:
> Htdig has been working well for us for some time now, but one glitch has been brought to my attention.
> We have a number of websites which are updated on a regular basis. Because of this, old pages are being unlinked every week from
> the main body of the site. To keep these pages in the search engine database (as opposed to being lost forever), I've created a
> page for each website that just consists of the URLs of each of these pages. At the top of these pages, I place the meta tags to
> tell htdig to follow the links, but not index the page <META NAME="ROBOTS" CONTENT="NOINDEX">. I use these pages as the base
> documents for htdig to crawl from.
> My problem is that although htdig's website says that it follows the robot rules, my index documents still show up when a search is
> done. Is there a different tag I should be using, or do you need to specify a setting in htdig for it to obey robot rules?

There's a subtle bug in 3.1.5 and earlier versions. The content parameter
of the meta robots tag should be case-insensitive, but htdig was expecting
lower-case. You can either change the tag to use lower case
(CONTENT="noindex"), or apply this patch to fix the problem.
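
As a rough illustration of the case-insensitive handling described above,
here is a small Python sketch. It is not htdig's actual code (htdig is
written in C++), and the function names are made up for this example. It
assumes the CONTENT value is a comma-separated list of directives and
lower-cases each one before comparing:

```python
def robots_directives(content):
    """Split a robots CONTENT value into lower-cased directive tokens."""
    # Hypothetical helper: strip whitespace and ignore empty tokens.
    return {tok.strip().lower() for tok in content.split(",") if tok.strip()}


def may_index(content):
    """Return False if the page asks not to be indexed."""
    # "none" is treated as shorthand for "noindex, nofollow".
    return not ({"noindex", "none"} & robots_directives(content))


def may_follow(content):
    """Return False if the page asks that its links not be followed."""
    return not ({"nofollow", "none"} & robots_directives(content))


if __name__ == "__main__":
    # The tag from the question: upper-case value, links still followed.
    print(may_index("NOINDEX"), may_follow("NOINDEX"))
```

With a comparison like this, CONTENT="NOINDEX" and CONTENT="noindex" are
treated the same way: the page is left out of the index but its links are
still followed.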

Gilles R. Detillieux
Spinal Cord Research Centre
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


This archive was generated by hypermail 2b28 : Tue Nov 07 2000 - 11:30:06 PST