Subject: Re: [htdig] Ignore robots.txt and META-Tags.
From: Gilles Detillieux (
Date: Tue Jan 25 2000 - 15:59:20 PST

According to Sven Hartge:
> We want to index our local server using htdig. There is only one slight
> problem: We have several META tags in our files to prevent external
> search engines from crawling through the whole server. Now, htdig honors
> these tags too and indexes only some pages. I used an old release and
> manually patched the check for no-index out of the source, but this is
> way too much work to do if I need to upgrade the htdig version (and is
> definitely not the Right Thing [tm]). I've read the documentation which
> comes with htdig and also searched through the website, but ... I am
> _sure_ I am missing something here.

No, the only way to get htdig to ignore META noindex tags is to patch
the source. If you want to allow some search engines but disallow
others, the place to do this is in robots.txt, which allows different
rules for different user agents. META tags are meant to apply to any
web client smart enough to use them.
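As a rough illustration of per-agent rules, here is a sketch that evaluates a hypothetical robots.txt with Python's standard urllib.robotparser module. The robots.txt contents, the agent name "htdig", and the example.com URL are assumptions for illustration only. htdig sends whatever user-agent string its configuration specifies and uses its own robots.txt parser, which may match names differently from Python's. Also note that robots.txt only controls which URLs get crawled; it does not override META noindex tags inside the pages themselves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: let the local "htdig" crawler fetch everything,
# and keep every other robot out of the whole server.
ROBOTS_TXT = """\
User-agent: htdig
Disallow:

User-agent: *
Disallow: /
"""


def allowed(agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse in-memory lines, no network
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("htdig", "SomeOtherBot"):
        print(agent, allowed(agent, "http://www.example.com/docs/index.html"))
```

The empty "Disallow:" line means "allow everything" for the named agent, while the "*" record catches every agent not listed earlier.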

> Oh, and another one: Is it possible to search for words containing umlauts
> if there are _no_ locales installed? I do not have root access, so
> I won't be able to install them in the right places. Will htdig work if
> they are installed somewhere in the user's /home directory?

No, unfortunately. That's one of the big problems with locales:
if they're not set up correctly on your system, you're out of luck,
unless you can coax your sysadmin to install them for you, and provided
the system's C library support for locales isn't broken. I've proposed
a mechanism for overriding bad/broken/missing locales, but no one has
implemented it yet.
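To see which locales the C library on a given machine actually accepts, a quick check like the following may help. It is a sketch using Python's locale module, whose setlocale() wraps the same C library setlocale() function that a C or C++ program relies on. The German locale names tried in the usage loop are assumptions, since locale naming varies between systems. The check only diagnoses the problem; it cannot supply missing locales.

```python
import locale


def locale_available(name: str) -> bool:
    """Return True if the C library accepts `name` for LC_CTYPE."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        # The C library rejected the name: not installed or not supported.
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the original setting


if __name__ == "__main__":
    # Candidate names are guesses; "C" is always available.
    for name in ("C", "de_DE", "de_DE.ISO8859-1", "de_DE.UTF-8"):
        print(name, locale_available(name))
```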

Gilles R. Detillieux              E-mail: <>
Spinal Cord Research Centre       WWW:
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

