Subject: Re: [htdig] Following links, not indexing a doc
From: Gilles Detillieux (firstname.lastname@example.org)
Date: Tue Nov 07 2000 - 11:23:15 PST
According to Eric Bliss:
> Htdig has been acting well for us for some time now, but there is one glitch that has been brought to my attention.
> We have a number of websites which are updated on a regular basis. Because of this, old pages are being unlinked every week from
> the main body of the site. To keep these pages in the search engine database (as opposed to being lost forever), I've created a
> page for each website that just consists of the URLs of each of these pages. At the top of these pages, I place the meta tags to
> tell htdig to follow the links, but not index the page <META NAME="ROBOTS" CONTENT="NOINDEX">. I use these pages as the base
> documents for htdig to crawl from.
> My problem is that although htdig's website says that it follows the robot rules, my index documents still show up when a search is
> done. Is there a different tag I should be using, or do you need to specify a setting in htdig for it to obey robot rules?
There's a subtle bug in 3.1.5 and earlier versions. The CONTENT value
of the robots META tag should be matched case-insensitively, but htdig
only recognizes it in lower case. You can either change the tag to use
lower case (CONTENT="noindex"), or apply this patch to fix the
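For illustration only, here is a minimal Python sketch of robots META parsing that compares the CONTENT value case-insensitively, so "NOINDEX", "NoIndex" and "noindex" all behave the same. It is not htdig's actual C++ code, and the names RobotsMetaParser and robots_flags are hypothetical. It assumes the standard robots META directives (index/noindex, follow/nofollow, and "none" meaning both noindex and nofollow).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags.

    Hypothetical helper: NAME and CONTENT values are compared
    case-insensitively, which is the behaviour the bug report asks for.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute *names* for us,
        # but attribute *values* keep their original case.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() != "robots":
            return
        content = attr_map.get("content") or ""
        # CONTENT may hold several comma-separated directives.
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_flags(html):
    """Return (index, follow) for a page; both default to True."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    d = parser.directives
    index = not ("noindex" in d or "none" in d)
    follow = not ("nofollow" in d or "none" in d)
    return index, follow


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX"></head></html>'
print(robots_flags(page))  # -> (False, True): skip indexing, still follow links
```

With the upper-case tag from the original question, the sketch reports "don't index, but follow", which is what the link-list pages are meant to do.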
--
Gilles R. Detillieux              E-mail: <email@example.com>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930