Re: [htdig] update digging


Geoff Hutchison (ghutchis@wso.williams.edu)
Tue, 2 Mar 1999 11:29:19 -0500 (EST)


On Tue, 2 Mar 1999, Frank Guangxin Liu wrote:

> Is it true that if I run htdig to update my db (without -i option),
> htdig will ignore "limit_urls_to" and try all the urls in the db?

Yes. I thought I had already answered this question for you.
I don't know that I consider this a bug. It "updates" the db by checking
all the URLs in the database. Basically, it generates a list of all the
URLs in the DB and then checks them for changes.

> only several html files and found GET for all files although
> there has been no change between the initial htdig and the update
> htdig for this small www server).

On an update dig, ht://Dig sends an If-Modified-Since header. It does this
as a GET because it wants to make one connection. The HTTP specification
says that the server returns the document if it has been modified and a
304 (Not Modified) status, with no body, if it has not.
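
As a rough illustration of that exchange (a sketch, not ht://Dig's actual
code), the request header and the handling of the reply could look like
the Python below. The function names are made up for this example. It
assumes the "not modified" reply is HTTP status 304, which is what the
HTTP specification defines for a conditional GET.

```python
from email.utils import formatdate


def conditional_get_headers(last_fetch_epoch):
    """Build the extra header for a conditional GET (hypothetical helper).

    last_fetch_epoch is the Unix timestamp of the previous dig.
    """
    # HTTP dates are expressed in GMT, e.g. "Tue, 02 Mar 1999 16:29:19 GMT".
    return {"If-Modified-Since": formatdate(last_fetch_epoch, usegmt=True)}


def needs_reindex(status_code):
    """Interpret the status of a conditional GET (hypothetical helper)."""
    if status_code == 304:
        # Not Modified: the server sends no body, nothing to re-parse.
        return False
    if status_code == 200:
        # Modified, or the server ignored the header: the body follows.
        return True
    raise ValueError("unexpected status for this sketch: %d" % status_code)


if __name__ == "__main__":
    print(conditional_get_headers(0))
    print(needs_reindex(304), needs_reindex(200))
```

Because the condition rides on an ordinary GET, a changed document arrives
in the same response, so no second request is needed.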

I've noticed that servers do not always honor this header. However, in
version 3.1.0 and on, after receiving the data, htdig checks the date in
the header before parsing it. So even if the server incorrectly sends the
data, ht://Dig won't bother continuing.
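
A minimal sketch of that kind of client-side check, in Python rather than
ht://Dig's own source: it assumes the date being compared is the
Last-Modified response header and that the time of the previous dig is
stored as a timezone-aware value. The name should_parse is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def should_parse(last_modified_header, last_indexed):
    """Return True if the document looks newer than the indexed copy.

    last_modified_header: raw Last-Modified value, or None if absent.
    last_indexed: timezone-aware datetime of the previous dig.
    """
    if not last_modified_header:
        # No date to compare against, so assume the document changed.
        return True
    try:
        modified = parsedate_to_datetime(last_modified_header)
    except (TypeError, ValueError):
        # Unparseable date: be conservative and re-parse the document.
        return True
    if modified.tzinfo is None:
        # Treat a date without zone information as GMT.
        modified = modified.replace(tzinfo=timezone.utc)
    return modified > last_indexed


if __name__ == "__main__":
    previous_dig = datetime(1999, 3, 1, tzinfo=timezone.utc)
    print(should_parse("Tue, 02 Mar 1999 16:29:19 GMT", previous_dig))
    print(should_parse("Sun, 28 Feb 1999 10:00:00 GMT", previous_dig))
```

With a check like this, a server that ignores If-Modified-Since still
costs the transfer, but not the parsing.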

> Another strange thing is that although I deleted some html files
> on the server http://www3.mydept.mycompany.com BEFORE the update
> run of htdig, those deleted url still left in the db. A subsequent

You have an old version of ht://Dig. This was a bug fixed in version
3.1.0.

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



