Subject: Re: [htdig] modification_time_is_now again
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Fri Dec 03 1999 - 12:32:28 PST


At 2:15 PM -0600 12/3/99, Gilles Detillieux wrote:
>In the 3.2 development code, Geoff hacked it a bit so the initial hopcount
>field is set to 0, instead of -1, when DocumentRef and URLRef objects are
>first constructed. I don't know if that actually solves this problem or
>not, but in any case it doesn't get to the root of the problem: what is
>happening to those hopcounts in the first place?!

It's not really a hack. First off, a document can only have a
hopcount >= 0, so initializing it to -1 doesn't make a lot of sense
IMHO. Furthermore, the database seemed to ignore the -1 stored for
documents that hadn't been retrieved yet and make up a number
instead. (I kid you not, but I can't remember the exact details. Try
doing a dig with a limited server_max_docs and then do an update...)

But there are simply a *lot* of issues with hopcounts in 3.1. The
biggest problem is that pages are not indexed in hopcount order. On
an update dig, all the pages that were already in the database are
put into the queue in *alphabetical* order, ahead of any new pages.
Since the queue is not ordered by hopcount, it's very difficult to
ensure the hopcounts are accurate.
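
To make the ordering problem concrete, here is a toy Python model. It
is not ht://Dig's actual code: the function name, URLs, and link graph
are invented for illustration. It assumes a page's hopcount is fixed
by whichever referring page happens to be processed first, and that
pending pages are processed in alphabetical order:

```python
import heapq

# Hypothetical link graph: "/target" is 2 hops from "/index" via "/z",
# but 3 hops away via "/a" -> "/m".
LINKS = {
    "/index": ["/a", "/z"],
    "/a": ["/m"],
    "/m": ["/target"],
    "/z": ["/target"],
}

def dig_alphabetical(links, start):
    """Process pending pages in alphabetical order (toy model)."""
    hopcount = {start: 0}
    pending = [start]  # min-heap keyed on the URL string
    while pending:
        page = heapq.heappop(pending)
        for target in links.get(page, []):
            if target not in hopcount:
                # Hopcount is fixed on first discovery, right or wrong.
                hopcount[target] = hopcount[page] + 1
                heapq.heappush(pending, target)
    return hopcount

print(dig_alphabetical(LINKS, "/index")["/target"])  # prints 3, true value is 2
```

Because "/m" sorts ahead of "/z", the longer path reaches "/target"
first and its hopcount is recorded as 3 rather than 2.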

The indexing queue in 3.2 is ordered by hopcount. This guarantees
that the first time the digger reaches a page, it got there by the
shortest path available. Furthermore, on updates, any new pages will
fall into the queue in the proper place.
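
As a rough sketch of why a hopcount-ordered queue helps, consider the
toy Python model below. It is not the 3.2 code: the function name,
URLs, and link graph are invented for illustration. Pending pages are
processed lowest hopcount first, so each page is first reached over a
shortest path:

```python
import heapq

# Hypothetical link graph: "/target" is 2 hops from "/index" via "/z",
# but 3 hops away via "/a" -> "/m".
LINKS = {
    "/index": ["/a", "/z"],
    "/a": ["/m"],
    "/m": ["/target"],
    "/z": ["/target"],
}

def dig_by_hopcount(links, start):
    """Process pending pages in order of increasing hopcount (toy model)."""
    hopcount = {start: 0}
    pending = [(0, start)]  # min-heap keyed on (hopcount, URL)
    visit_order = []
    while pending:
        hops, page = heapq.heappop(pending)
        visit_order.append(page)
        for target in links.get(page, []):
            if target not in hopcount:
                # Referrers are popped in nondecreasing hopcount order, so
                # the first one to reach a page gives the minimum hopcount.
                hopcount[target] = hops + 1
                heapq.heappush(pending, (hops + 1, target))
    return hopcount, visit_order

hopcount, visit_order = dig_by_hopcount(LINKS, "/index")
print(hopcount["/target"])  # prints 2
```

A page discovered later is simply pushed with its own hopcount, so it
lands among the pending pages of the same depth instead of at the end
of the queue.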

I don't know whether this has any influence on the particular bug
mentioned, but suffice it to say that fixing all the problems with
hopcount in 3.1 is not going to happen: it would require backporting
too much code. I'll stick by the documentation: using -h or
max_hop_count is *only* reliable when you're doing an initial dig.
On an update dig, results may vary.

-Geoff
