Re: [htdig] modification_time_is_now again


Subject: Re: [htdig] modification_time_is_now again
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Tue Nov 30 1999 - 20:02:02 PST


At 1:26 AM +0100 12/1/99, Giancarlo Pinerolo wrote:
>I don't understand why it says 'it will cut down on reindexing from such
>servers when doing updates'.

This was written by the author of the patch. Personally, I'm thinking
about making modification_time_is_now always true and removing it as an
option. If the server doesn't return a Last-Modified: header, there's
no way of knowing what the "correct date" should be, and the current
time at indexing is a good guess. Besides, this most often happens
with dynamically generated content, for which the current time *is*
the correct date.
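As a rough illustration of the fallback described above, here is a minimal Python sketch. The function name effective_mod_time and its parameters are hypothetical, not htdig's actual code; it only models which date gets recorded when a server does or does not send Last-Modified.

```python
import time
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def effective_mod_time(headers: Mapping[str, str],
                       modification_time_is_now: bool,
                       now: Optional[float] = None) -> float:
    """Hypothetical helper: the modification time (Unix seconds) to record."""
    if now is None:
        now = time.time()
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    last_modified = lowered.get("last-modified")
    if last_modified:
        return parsedate_to_datetime(last_modified).timestamp()
    # No Last-Modified header: the "correct" date is unknowable, so either
    # use the indexing time (option on) or record 0 (option off).
    return now if modification_time_is_now else 0.0


# A dynamically generated page with no Last-Modified header:
print(effective_mod_time({}, modification_time_is_now=True, now=1000.0))   # 1000.0
print(effective_mod_time({}, modification_time_is_now=False, now=1000.0))  # 0.0
```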

>EG
>1) a doc has 'last modified' unknown (which, as I recall from a previous
>post, actually means 0)
>2) on the first run, this gets changed to now (let's say 30/11/1999
>00.00)
>3) on the next run the same doc will return 0 again
>4) then what happens? will it
...
>b) transform 0 to now again (let's say 01/12/1999) and reindex it?

b. The only way I can see it not being reindexed is if the server
honors the date htdig sends (in an If-Modified-Since request header)
and doesn't send the document back. Caveat: this is actually what
happens in one specific case, and it is the reason the option is in
there. If you're indexing through a cache (specifically WWWOFFLE), it
will see that the date you sent matches the date it has in its cache
and will not bother to download the document or pass it on to htdig.
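The difference between a plain origin server and a cache such as WWWOFFLE can be modeled with a small, hypothetical Python function. It assumes the cache answers a conditional GET with 304 Not Modified when its copy is no newer than the date the client sent, while a server that has no date for the page (typical for dynamic content) always returns the full document with 200, which is why such pages get reindexed on every update. The function name and the numeric dates are illustrative only.

```python
from typing import Optional


def conditional_get_status(if_modified_since: Optional[float],
                           resource_mod_time: Optional[float]) -> int:
    """Hypothetical model of a server or cache answering a conditional GET.

    if_modified_since: date the indexer sent (None if it sent none).
    resource_mod_time: date the server/cache holds for its copy (None if unknown).
    """
    if (if_modified_since is not None
            and resource_mod_time is not None
            and resource_mod_time <= if_modified_since):
        return 304  # Not Modified: nothing is sent back, so nothing is reindexed
    return 200      # full document is sent back, so it gets reindexed


# htdig stored "now" (1000.0) for an undated page on the previous run.
print(conditional_get_status(1000.0, None))    # origin server, no date: 200
print(conditional_get_status(1000.0, 1000.0))  # cache holds the same date: 304
```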

>Then I really think I found a bug when running an update with mod_t_is_now
>false over a base db that has been dug with m_t_i_n true:
>
>in this case max_hop_count is completely ignored and a 9999 dig
>starts.
>
>If this bug is real (in which case I bet you'd immediately halt the
>unwanted 9999 dig and restart it with m_t_i_n true), then any doc that
>doesn't return a mod_t will never have a chance to be reindexed again.

You're correct that there would be a bug. The pages would not be
reindexed, since 0 is always going to be smaller than whatever date is
stored in the database. However, I don't see how this affects
max_hop_count, or why these documents would never be reindexed again.
You'd simply set m_t_i_n to true again and they'd be fine.
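The comparison behind this bug can be sketched in a few lines of Python. Both function names are hypothetical, and the update rule (reindex only when the fetched copy is newer than the stored one) is an assumption used to illustrate the reasoning, not htdig's actual source.

```python
from typing import Optional


def recorded_mod_time(last_modified: Optional[float],
                      modification_time_is_now: bool, now: float) -> float:
    """Date stored for a document; last_modified is None when the server sent no Last-Modified."""
    if last_modified is not None:
        return last_modified
    return now if modification_time_is_now else 0.0


def needs_reindex(stored: float, retrieved: float) -> bool:
    """Hypothetical update rule: reindex only if the fetched copy is newer than the stored one."""
    return retrieved > stored


# Initial dig with modification_time_is_now true: the undated page is stored as "now".
stored = recorded_mod_time(None, True, now=1000.0)
# Update dig with the option false: the same undated page comes back as 0.
print(needs_reindex(stored, recorded_mod_time(None, False, now=2000.0)))  # False: skipped
# Update dig with the option true again: "now" is later than the stored date.
print(needs_reindex(stored, recorded_mod_time(None, True, now=2000.0)))   # True: reindexed
```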

Am I missing something?

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
