Re: [htdig] htdig index too small

Joerg Behrens
Tue, 28 Sep 1999 18:11:52 +0200

Andy Malato wrote:
> Hello,
> I've installed htdig 3.1.3 on my BSDI 3.1 system. I ran htdig and it doesn't
> seem to index my entire site. I am unsure of why this is. I've played
> around with all settings in the config file, especially max_head_length and
> max_head_size; I have these values set to 150000 and 300000 respectively. I
> however cannot get it to index more than three documents of my site.
> Does anyone have any ideas?
> ---Andy

Hi Andy,

1. Htdig 3.1.2 only supports normal links such as
<a href="test.html">link</a> or <a href="test.html"><img
src="test.jpg"></a>. (Version 3.1.3 adds support for the <EMBED>,
<OBJECT>, and <LINK> HTML tags.) So htdig does not follow links that
are created by or embedded in JavaScript, pull-down menus, or anything
similar.

2. Are you sure that all of your pages are linked? Really? Htdig only
finds pages it can reach by following links.

3. Htdig honors the following meta tags:
<meta name="robots" content="noindex"> and <meta name="robots"
content="nofollow">. Make sure your pages do not set these tags.

4. Htdig honors the robots.txt file too. If you have this file in
your document root, take a look at it: it can deny access to
crawlers and diggers like "htdig" for each directory.

5. With a good web server like Apache it is also possible to deny
access to certain clients (particular browsers and crawlers).

6. Are you using .htaccess or .acl files to restrict access to particular
users? If so, by default htdig cannot access those directories or the
pages they contain.

7. In htdig 3.1.2 you can set "max_doc_size: xxxxx", where xxxxx is
a value (in bytes) that must be larger than your biggest page.
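
A rough way to check points 1, 3 and 4 yourself is a small Python script
like the one below. It is only a sketch: the names PageCheck, check_page
and robots_allows are made up for illustration, and it assumes that
"htdig" is the name matched against the User-agent lines in robots.txt.
It uses Python's standard html.parser and urllib.robotparser modules, not
htdig's own parser, so it only approximates what htdig would do.

```python
from html.parser import HTMLParser
from urllib import robotparser


class PageCheck(HTMLParser):
    """Collect plain <a href> links and robots meta directives (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []      # href values of ordinary <a> tags
        self.robots = set()  # e.g. {"noindex", "nofollow"}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.robots.add(part.strip().lower())


def check_page(html):
    """Return (links, robots_directives) found in one HTML document."""
    parser = PageCheck()
    parser.feed(html)
    return parser.links, parser.robots


def robots_allows(robots_txt, url, agent="htdig"):
    """Return True if the given robots.txt text lets `agent` fetch `url`.

    Assumption: the crawler is matched under the name "htdig".
    """
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)
```

Feed check_page the HTML of your start page: if the list of links is
short or the directives include noindex or nofollow, that would explain a
small index. Links that exist only in script code or in form menus do not
show up in the list at all.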

If I am wrong on any of these points, please correct me!

Joerg Behrens

PGP-Get the Key!
Key fingerprint = 92 7D E0 A6 CF AE EC 32 14 28 EF 0D 57 2A 88 5B
Preussag Noell Dienstleistungs GmbH
D-97080 Wuerzburg
Alfred-Nobel-Straße 20 Tel: +49 931 903-2243
Abt: DV-C/tr Fax: +49 931 903-2051


This archive was generated by hypermail 2.0b3 on Tue Sep 28 1999 - 09:15:22 PDT