Re: [htdig] The 4:02 crash


Subject: Re: [htdig] The 4:02 crash
From: Vincent QUERU (vqueru@free.fr)
Date: Tue Sep 12 2000 - 09:14:05 PDT


>> By looking through Apache's access log I noticed the following 2 lines :
>>
>> my_server - - [12/Sep/2000:04:02:00 +0200] "GET /robots.txt HTTP/1.0" 404 278
>> my_server - - [12/Sep/2000:04:02:00 +0200] "GET /r2_admin/robot_init_page/?ht_dig_robot=1 HTTP/1.0" 401 471
>>
>> I did not create a "robots.txt" file, as my server is the only one that
>> indexes the site.

>That's fine, but htdig will still fetch it. It's required to do so by 'net
>standards. It does this first off when it finds a server. I assume the
>next line is your start_url?

Okay, that is what I understood from your "A Standard for Robot Exclusion" page,
but I thought that another server was trying to access my site.

And yes, the next line is my start_url.

>> It looks as if some kind of automatic indexing is running (of course, 4:02 is
>> nowhere to be found in my crontab)

>Well it has to be launched somehow, either from 'cron' or 'at' since htdig
>cannot launch by itself. What time is in your crontab?

Here is an extract from my crontab (for the root user):

35 9-18 * * 1-5 /root/bin/rundig.sh
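That entry can be checked mechanically against the 04:02 request in the log. The five fields are minute, hour, day of month, month, and day of week, so "35 9-18 * * 1-5" fires at minute 35 of every hour from 9 to 18, Monday to Friday. Below is a minimal Python sketch (not part of cron or htdig; the function names are hypothetical) that matches a time against those fields. It assumes the simple field syntax used here (`*`, single numbers, `a-b` ranges, comma lists) and ignores step values such as `*/5` and cron's special rule for when both day-of-month and day-of-week are restricted.

```python
from datetime import datetime, timedelta


def field_matches(field: str, value: int) -> bool:
    """Match one crontab field; only '*', numbers, 'a-b' ranges and comma lists are handled."""
    for part in field.split(","):
        if part == "*":
            return True
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
            if lo <= value <= hi:
                return True
        elif int(part) == value:
            return True
    return False


def cron_fires(spec: str, when: datetime) -> bool:
    """True if the crontab line `spec` would start its command at minute `when`."""
    minute, hour, dom, month, dow = spec.split()[:5]
    # cron numbers weekdays 0-6 with 0 = Sunday; isoweekday() gives Mon=1 .. Sun=7.
    cron_dow = when.isoweekday() % 7
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(dom, when.day)
            and field_matches(month, when.month)
            and field_matches(dow, cron_dow))


def firing_times(spec: str, day: datetime) -> list:
    """Every minute of the given day at which `spec` fires."""
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    minutes = (start + timedelta(minutes=m) for m in range(24 * 60))
    return [t for t in minutes if cron_fires(spec, t)]


SPEC = "35 9-18 * * 1-5 /root/bin/rundig.sh"

if __name__ == "__main__":
    # Tuesday 12 Sep 2000, 04:02: minute 2 is not 35 and hour 4 is outside 9-18.
    print(cron_fires(SPEC, datetime(2000, 9, 12, 4, 2)))
    print([t.strftime("%H:%M") for t in firing_times(SPEC, datetime(2000, 9, 12))])
```

Under these assumptions the 04:02 run cannot come from this line, which fits the observation that it must be launched by some other cron or 'at' entry.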

>> and that after it runs, my db.wordlist file is
>> empty...

>And if you run the script yourself from the command-line it works fine?
>What cron program/version do you use?

The script works fine, even when run from cron
(my version is "Cron version -- $Id: crontab.c,v 2.13 1994/01/17 03:20:37 vixie Exp $").

The robot stops on the first page: this must be due to authentication (the
server answers 401). Since it does not index any pages, my db.wordlist file
ends up empty (I run htdig with the -i option, so the old databases are erased
first).

In that case, why doesn't it find the username and password that are in the
rundig.sh script?
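The 401 in the access log is the server rejecting a request that carried no usable credentials. Assuming /r2_admin/ is protected with HTTP Basic authentication (the message does not say which scheme is used), the client has to send an Authorization header whose value is "Basic " followed by the base64 encoding of "username:password". The minimal Python sketch below only illustrates that header; it is not htdig's code, and the host name and credentials are placeholders.

```python
import base64
import urllib.request
from typing import Optional

# Placeholder URL: the path comes from the access log, the host is hypothetical.
URL = "http://localhost/r2_admin/robot_init_page/?ht_dig_robot=1"


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization value: 'Basic ' + base64('user:password')."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def build_request(url: str, username: Optional[str] = None,
                  password: Optional[str] = None) -> urllib.request.Request:
    """Create (but do not send) a GET request, adding credentials only if given."""
    req = urllib.request.Request(url)
    if username is not None and password is not None:
        req.add_header("Authorization", basic_auth_header(username, password))
    return req


if __name__ == "__main__":
    anonymous = build_request(URL)
    print(anonymous.get_header("Authorization"))  # no header: a protected page answers 401
    authed = build_request(URL, "Aladdin", "open sesame")
    print(authed.get_header("Authorization"))
```

A crawler started without the credentials (for example, by a different script from the one that holds them) would send the first kind of request and get exactly the 401 seen in the log.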

By the way, I am sorry if I am asking stupid questions; I am not a very
experienced Linux user!

This archive was generated by hypermail 2b28 : Tue Sep 12 2000 - 09:16:02 PDT