Subject: Re: [htdig] The 4:02 crash
From: Vincent QUERU (firstname.lastname@example.org)
Date: Tue Sep 12 2000 - 09:14:05 PDT
>> By looking through Apache's access log I noticed the following 2 lines :
>> my_server - - [12/Sep/2000:04:02:00 +0200] "GET /robots.txt HTTP/1.0" 404 278
>> my_server - - [12/Sep/2000:04:02:00 +0200] "GET
>> /r2_admin/robot_init_page/?ht_dig_robot=1 HTTP/1.0" 401
>> I did not do a "robots.txt" file as my server is the only one to index
>> the site.
>That's fine, but htdig will still fetch it. It's required to do so by 'net
>standards. It does this first off when it finds a server. I assume the
>next line is your start_url?
Okay, that is what I understood from your "A Standard for Robot Exclusion" page,
but I thought that another server was trying to access my site.
And yes, the next line is my start_url.
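
For reference, the two access-log entries quoted above can be picked apart with a short script. This is only a sketch: it assumes the log is in Apache's Common Log Format, and the names LOG_RE and parse_line are made up for the illustration. It shows the robots.txt request returning 404 (no such file, which is harmless) and the start URL returning 401 (authentication required).

```python
import re

# Common Log Format: host ident user [time] "request" status [bytes]
# The byte count is optional here because the second quoted entry has none.
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3})(?: (?P<size>\d+|-))?$'
)

def parse_line(line):
    """Return the fields of one access-log line as a dict, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None

# The two entries from the message, with the wrapped second entry rejoined.
lines = [
    'my_server - - [12/Sep/2000:04:02:00 +0200] "GET /robots.txt HTTP/1.0" 404 278',
    'my_server - - [12/Sep/2000:04:02:00 +0200] '
    '"GET /r2_admin/robot_init_page/?ht_dig_robot=1 HTTP/1.0" 401',
]

entries = [parse_line(l) for l in lines]
for e in entries:
    print(e["time"], e["path"], e["status"])
```

Both requests carry the same timestamp and the same client host, which fits a single htdig run that fetches robots.txt and then the start URL.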
>> It looks as if there is some kind of automatic indexing (of course 4:02 is
>> nowhere to be found in my crontab)
>Well it has to be launched somehow, either from 'cron' or 'at' since htdig
>cannot launch by itself. What time is in your crontab?
Here is an extract from my crontab (for the root-user) :
35 9-18 * * 1-5 /root/bin/rundig.sh
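
As a sanity check, the crontab entry above can be tested against a given time. The sketch below is a simplified matcher, not a reimplementation of cron: it assumes only `*`, single numbers, ranges, and comma lists, and the function names expand_field and cron_matches are made up for the illustration.

```python
from datetime import datetime

def expand_field(field, lo, hi):
    """Expand one crontab field ('*', 'a', 'a-b', comma lists) into a set of ints."""
    values = set()
    for part in field.split(","):
        if part == "*":
            values.update(range(lo, hi + 1))
        elif "-" in part:
            start, end = part.split("-")
            values.update(range(int(start), int(end) + 1))
        else:
            values.add(int(part))
    return values

def cron_matches(entry, when):
    """Return True if the five time fields of a crontab entry match the datetime.

    Simplification: all five fields must match. Real cron treats day-of-month
    and day-of-week specially when both are restricted; that does not matter
    here because the day-of-month field is '*'.
    """
    minute, hour, dom, month, dow = entry.split()[:5]
    # cron counts Sunday as 0; isoweekday() counts Sunday as 7.
    weekday = when.isoweekday() % 7
    return (
        when.minute in expand_field(minute, 0, 59)
        and when.hour in expand_field(hour, 0, 23)
        and when.day in expand_field(dom, 1, 31)
        and when.month in expand_field(month, 1, 12)
        and weekday in expand_field(dow, 0, 6)
    )

ENTRY = "35 9-18 * * 1-5 /root/bin/rundig.sh"
print(cron_matches(ENTRY, datetime(2000, 9, 12, 4, 2)))   # the 4:02 request
print(cron_matches(ENTRY, datetime(2000, 9, 12, 9, 35)))  # a scheduled run
```

Under these assumptions the entry fires at minute 35 of every hour from 9 to 18, Monday to Friday, so it cannot account for a run at 4:02. A per-user crontab is not the only place cron reads jobs from; system-wide files such as /etc/crontab may be worth checking too.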
>> that after it my db.wordlist file is
>And if you run the script yourself from the command-line it works fine?
>What cron program/version do you use?
The script works fine, even when run from cron
(my version is (Cron version -- $Id: crontab.c,v 2.13 1994/01/17 03:20:37 vixie
The robot stops on the first page: this must be due to authentication, and as it
does not index any pages, my db.wordlist file is erased (I run htdig with the -i
In that case, why does it find the username and password that are in the
rundig.sh script?
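
To make the 401 concrete: assuming the protected area uses HTTP Basic authentication (the message does not say which scheme is in use), a request is only accepted if it carries an Authorization header built from the username and password. The sketch below shows how that header is formed; basic_auth_header is a made-up helper and the credentials are placeholders, not the ones in rundig.sh.

```python
import base64

def basic_auth_header(username, password):
    """Build the Authorization header for HTTP Basic authentication."""
    # Basic auth is the base64 encoding of "username:password".
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}

# Placeholder credentials, for illustration only.
print(basic_auth_header("Aladdin", "open sesame"))
```

A run that reaches the server without this header, for example one launched without the credentials that rundig.sh supplies, would get a 401 on the start URL and index nothing.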
By the way, I am sorry if I am asking stupid questions; I am not a very
experienced Linux user!
List archives: <http://www.htdig.org/mail/menu.html>