Subject: Re: [htdig] modification_time_is_now again and the 'unwanted 9999 dig' bug
From: Giancarlo Pinerolo (firstname.lastname@example.org)
Date: Wed Dec 01 1999 - 03:09:10 PST
> I don't understand why it says 'it will cut down on reindexing from such
> servers when doing updates'.
> 1) a doc has 'last modified' unknown (which, as I recall from a previous
> post, means actually 0)
> 2) this, on the first run, gets changed to now (let's say 30/11/1999)
> 3) on the next run the same doc will return 0 again
> 4) then what happens? will it
> a) compare 0 to 30/11 and decide that it has not been changed?
> b) transform 0 to now again (let's say 01/12/1999) and reindex it?
> From that phrase in the doc I guess the first, isn't it?
> b. The only way I can see it not being reindexed is if the server
> accepts the Last-Modified header and doesn't send the document back.
> Caveat: This is actually what happens in a specific case and is the
> reason the option is in there.
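
The case described in the reply above, where the server is sent a date and does not send the document back, can be sketched as a conditional request. This is only a rough model of the HTTP idea (a 304 answer instead of a full 200 answer); the function name and the integer timestamps are made up for illustration and are not taken from htdig or from any server.

```python
# Hypothetical sketch of a conditional request; not htdig or server code.
def respond(server_mod_time, date_sent_by_indexer):
    """Return an HTTP-like status code for a conditional fetch.

    server_mod_time      -- modification time the server (or cache) holds
    date_sent_by_indexer -- date the indexer sends along, or None if it sends none
    """
    if date_sent_by_indexer is not None and server_mod_time <= date_sent_by_indexer:
        return 304  # Not Modified: no body is sent, so nothing gets reindexed
    return 200      # full document is sent and the indexer processes it again


print(respond(100, 100))   # dates match -> 304
print(respond(200, 100))   # document is newer than the date sent -> 200
print(respond(100, None))  # no date sent at all -> 200
```

Under this model a document is skipped only when the date the indexer sends is at least as recent as what the server holds.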
Someone pointed out that pages that do not return a mod_t are mostly
So it seems logical that assigning them a mod_t = now will force a reindex
anyway, but that phrase ('cutting on reindexing') caused some confusion...
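
To make the two readings (a) and (b) from the quoted question concrete, here is a minimal sketch. It assumes that an unknown modification time is stored as 0 (as recalled in the quoted post) and that a document is reindexed when its time is newer than the stored one. All function names are hypothetical; this is not htdig's actual code.

```python
# Hypothetical model of readings (a) and (b); not htdig's actual code.
UNKNOWN = 0  # assumption: a missing Last-Modified value is treated as 0


def effective_mod_time(reported, now, time_is_now):
    """Time the indexer would record for a document."""
    if reported == UNKNOWN and time_is_now:
        return now  # modification_time_is_now: true substitutes the current time
    return reported


def reindex_reading_a(stored, reported):
    """Reading (a): compare the raw value from the server with the stored one."""
    return reported > stored


def reindex_reading_b(stored, reported, now, time_is_now=True):
    """Reading (b): substitute 'now' first, then compare with the stored one."""
    return effective_mod_time(reported, now, time_is_now) > stored


# Dates written as YYYYMMDD integers purely for readability.
first_run = 19991130
second_run = 19991201

stored = effective_mod_time(UNKNOWN, first_run, True)    # recorded on the first run
print(reindex_reading_a(stored, UNKNOWN))                # False: 0 is not newer
print(reindex_reading_b(stored, UNKNOWN, second_run))    # True: 'now' is newer
```

Under reading (b), which is the one confirmed in the reply, a document without a modification time is reindexed on every update run, because the substituted time is always later than the one stored by the previous run.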
> If you're indexing from a cache
> (specifically WWWOFFLE), it will see that the date you sent matches
> the date it has in cache and not bother to d/l or send it on to htdig.
No. It's real world indexing.
All $start_url are singly selected ones (max_hop_count: 0). No digging
at all is wanted. Nevertheless I assure you that a 9999 dig starts.
(anyway wwwoffle seems to preserve the original doc's mod_t)
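
For reference, this is the behaviour expected from a hop limit, sketched as a small breadth-first crawl. It is a generic illustration, not htdig's implementation. The link graph and the example.com URLs are placeholders, and 9999 appears only because that is the depth named in this message.

```python
from collections import deque


def crawl(start_urls, links, max_hops):
    """Visit pages breadth-first, following links at most max_hops away."""
    seen = set(start_urls)
    queue = deque((url, 0) for url in start_urls)
    visited = []
    while queue:
        url, hops = queue.popleft()
        visited.append(url)
        if hops >= max_hops:
            continue  # hop limit reached: do not follow links from this page
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, hops + 1))
    return visited


# Hypothetical link graph; the URLs are placeholders.
links = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/c"],
}

print(crawl(["http://example.com/"], links, 0))     # only the start page
print(crawl(["http://example.com/"], links, 9999))  # everything reachable
```

With a limit of 0 only the start URLs themselves should be fetched, which is what the initial dig in the test below does.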
I think the 'unwanted 9999 dig' bug is a real one, and I just made a test
to prove it; you can try it too:
1) initial dig:
------ commands to execute
/usr/sbin/htdig -v -s -t -i -l -h0 -c htdig.conf>log
/usr/sbin/htmerge -vv -s -c htdig.conf>>log
-This will correctly index only the start page of yahoo
2) update dig
modification_time_is_now: false ### only difference
/usr/sbin/htdig -v -s -t -l -h0 -c htdig-u.conf>>log
-This unleashes the 'unwanted 9999 dig' on the whole yahoo site :-(
Maybe I'm missing something though...