Re: [htdig] The 4:02 crash

Subject: Re: [htdig] The 4:02 crash
From: Vincent QUERU (
Date: Tue Sep 12 2000 - 09:14:05 PDT

>> By looking through Apache's access log I noticed the following 2 lines :
>> my_server - - [12/Sep/2000:04:02:00 +0200] "GET /robots.txt HTTP/1.0" 404 278
>> my_server - - [12/Sep/2000:04:02:00 +0200] "GET /r2_admin/robot_init_page/?ht_dig_robot=1 HTTP/1.0" 401 471
>> I did not create a "robots.txt" file, as my server is the only one that
>> indexes the site.

>That's fine, but htdig will still fetch it. It's required to do so by 'net
>standards. It does this first off when it finds a server. I assume the
>next line is your start_url?

Okay, that is what I understood from your "A Standard for Robot Exclusion" page,
but I thought that another server was trying to access my site.
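For what it's worth, here is a minimal sketch of the decision a robot makes after that first robots.txt request, using Python's urllib.robotparser as a stand-in (htdig does this in its own code, which is not shown here). The may_fetch helper and the full start URL with host "my_server" are hypothetical, rebuilt from the log lines quoted above.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_status, robots_body, agent, url):
    """Return True if `agent` may fetch `url`, given the HTTP status and body
    returned for /robots.txt (simplified: any status other than 404 is
    treated as a successfully fetched file)."""
    if robots_status == 404:
        # No robots.txt on the server: the exclusion standard imposes no limits.
        return True
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical start URL rebuilt from the access-log lines (host name assumed).
start_url = "http://my_server/r2_admin/robot_init_page/?ht_dig_robot=1"

print(may_fetch(404, "", "htdig", start_url))  # True
print(may_fetch(200, "User-agent: *\nDisallow: /r2_admin/\n", "htdig", start_url))  # False
```

So the 404 on /robots.txt is harmless: with no file, the robot simply goes on to the start URL.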

And yes, the next line is my start_url.
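To spot runs like this one in a longer access log, a small script can pull out the time, path and status of each request. This is a minimal Python sketch, assuming Apache's default "common" log format (which the two quoted lines appear to use); the names LOG_RE, MEANING and parse_line are made up for the example.

```python
import re

# Apache "common" log format: host ident user [time] "request" status bytes
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Status codes that matter for this crawl.
MEANING = {401: "authentication required", 404: "not found"}


def parse_line(line):
    """Split one access-log line into named fields, or return None."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


lines = [
    'my_server - - [12/Sep/2000:04:02:00 +0200] "GET /robots.txt HTTP/1.0" 404 278',
    'my_server - - [12/Sep/2000:04:02:00 +0200] '
    '"GET /r2_admin/robot_init_page/?ht_dig_robot=1 HTTP/1.0" 401 471',
]

for line in lines:
    rec = parse_line(line)
    if rec:
        status = int(rec["status"])
        print(rec["time"], rec["path"], status, MEANING.get(status, ""))
```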

>> It looks as if there is some kind of automatic indexing (of course 4:02 is
>> nowhere to be found in my crontab)

>Well it has to be launched somehow, either from 'cron' or 'at' since htdig
>cannot launch by itself. What time is in your crontab?

Here is an extract from my crontab (for the root user):

35 9-18 * * 1-5 /root/bin/
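That entry runs at minute 35 of every hour from 9 to 18, Monday to Friday, so it should never fire at 4:02. A minimal Python sketch that checks this, assuming only the simple field syntax used in the entry (numbers, ranges, comma lists and "*"; no step values or names). The functions are hypothetical helpers, not part of cron.

```python
def field_matches(field, value):
    """Match one crontab field made of '*', numbers, 'a-b' ranges and commas."""
    for part in field.split(","):
        if part == "*":
            return True
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            if start <= value <= end:
                return True
        elif int(part) == value:
            return True
    return False


def entry_fires(entry, minute, hour, day, month, weekday):
    """True if every time field matches. Real cron ORs day-of-month and
    day-of-week when both are restricted; here day-of-month is '*', so
    requiring all fields to match gives the same answer."""
    fields = entry.split()[:5]
    values = (minute, hour, day, month, weekday)
    return all(field_matches(f, v) for f, v in zip(fields, values))


entry = "35 9-18 * * 1-5 /root/bin/"
# Tuesday 12 Sep 2000 (cron weekdays: 0 = Sunday ... 6 = Saturday)
print(entry_fires(entry, 2, 4, 12, 9, 2))   # False: 04:02
print(entry_fires(entry, 35, 9, 12, 9, 2))  # True: 09:35
```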

>> that after it my db.wordlist file is
>> empty...

>And if you run the script yourself from the command-line it works fine?
>What cron program/version do you use?

The script works fine, even when it is run from cron
(my version is: Cron version -- $Id: crontab.c,v 2.13 1994/01/17 03:20:37 vixie Exp $).

The robot stops on the first page: this must be due to authentication, and since
it does not index any pages, my db.wordlist file is erased (I run htdig with the -i option).

In that case, why doesn't it find the username and password that are in the script?
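The 401 status in the access log means the server asked for credentials and the 4:02 request did not supply acceptable ones. Under HTTP Basic authentication, a client that does hold a username and password sends them in an Authorization header. Below is a minimal Python sketch of how that header value is built. The function name is hypothetical and this is not how htdig itself is configured. A request that arrives without this header, or with wrong credentials, gets the 401 seen at 4:02.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic 'Authorization' header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


# Standard example credentials from the HTTP Basic authentication RFCs.
print(basic_auth_header("Aladdin", "open sesame"))
# Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```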

By the way, I am sorry if I am asking stupid questions; I am not a very
experienced Linux user!


This archive was generated by hypermail 2b28 : Tue Sep 12 2000 - 09:16:02 PDT