Re: [htdig] modification_time_is_now again


Subject: Re: [htdig] modification_time_is_now again
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Fri Dec 03 1999 - 13:14:35 PST


According to Geoff Hutchison:
> At 2:15 PM -0600 12/3/99, Gilles Detillieux wrote:
> >In the 3.2 development code, Geoff hacked it a bit so the initial hopcount
> >field is set to 0, instead of -1, when DocumentRef and URLRef objects are
> >first constructed. I don't know if that actually solves this problem or
> >not, but in any case it doesn't get to the root of the problem: what is
> >happening to those hopcounts in the first place?!
>
> It's not really a hack. First off, a document will only have a
> hopcount >= 0, so making it -1 doesn't make a lot of sense IMHO.
> Furthermore, the database seemed to ignore the -1 listed for
> documents that hadn't been retrieved yet and make up a number. (I kid
> you not, but I can't remember the exact details. Try doing a dig with
> a limited server_max_docs and then do an update...)

I know! That's what I was seeing myself. It was often -1, but sometimes
it was 255. This makes me wonder if it's not some odd bug in Serialize
or Deserialize. The weird thing is that when I look at the code, it always
seems to set the hopcount explicitly after creating a new DocumentRef
object, so I can't see why it would ever fall back to the constructor's
default value. The fact that it does makes me suspect that something
is going terribly wrong, and I think it's in the database. I called
changing the constructor's default a hack because it just conceals the
-1 that was staring at us before, which I see as an error indication.
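
One way a -1 could come back as 255 is a single-byte round trip that loses
the sign. The sketch below only illustrates that possibility; the function
names are hypothetical stand-ins and are not taken from the ht://Dig
Serialize/Deserialize code, which may well store hopcounts differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only; not the ht://Dig sources.

// Writes the hopcount as a single byte, discarding the upper bits.
// A hopcount of -1 is stored as the byte 0xFF.
std::vector<std::uint8_t> serializeHopcount(int hopcount) {
    return { static_cast<std::uint8_t>(hopcount) };
}

// Lossy read: treats the stored byte as unsigned, so -1 comes back as 255.
int deserializeUnsigned(const std::vector<std::uint8_t>& buf) {
    assert(buf.size() == 1);
    return buf[0];
}

// Sign-preserving read: maps bytes 128..255 back to -128..-1.
int deserializeSigned(const std::vector<std::uint8_t>& buf) {
    assert(buf.size() == 1);
    int value = buf[0];
    return value >= 128 ? value - 256 : value;
}
```

If the real code had one path that reads the field as signed and another
that reads it as unsigned, that alone would explain seeing -1 in some
records and 255 in others, without the value ever having been set.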

> But there are simply a *lot* of issues with hopcounts in 3.1. The
> biggest problem is that pages are not indexed by hopcount. On an
> update dig, all the pages that were in the database already are put
> into the queue in *alphabetical* order, ahead of any new pages. Since
> the queue is not ordered by hopcount, it's very difficult to ensure
> the hopcounts are accurate.
>
> The indexing queue in 3.2 is based on hopcount--this guarantees that
> the first time it comes to a page, that was the fastest way it could
> get there. Furthermore, on updates, any new pages will fall into the
> queue in the proper place.
>
> I don't know whether this has any influence on the particular bug
> mentioned, but suffice to say that fixing all the problems with
> hopcount in 3.1 is not going to happen--it would require backporting
> too much code. I'll stick by the documentation: using -h or
> max_hop_count is *only* reliable when you're doing an initial dig.
> Other results may vary.
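
The point about a hopcount-ordered queue can be sketched as follows. This
is a minimal illustration, not the 3.2 implementation: `LinkGraph` and
`digByHopcount` are hypothetical names, and the link graph is an in-memory
map rather than anything retrieved from a server. Because the queue always
yields the entry with the lowest hopcount, the first visit to a page
records the shortest path to it.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory link graph: URL -> URLs it links to.
using LinkGraph = std::map<std::string, std::vector<std::string>>;

// Visits pages in order of increasing hopcount and returns URL -> hopcount.
std::map<std::string, int> digByHopcount(const LinkGraph& links,
                                         const std::string& start) {
    using Entry = std::pair<int, std::string>;  // (hopcount, URL)
    // Min-heap: the entry with the smallest hopcount is popped first.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> queue;
    std::map<std::string, int> hopcounts;

    queue.push({0, start});
    while (!queue.empty()) {
        Entry entry = queue.top();
        queue.pop();
        const int hops = entry.first;
        const std::string url = entry.second;
        assert(hops >= 0);

        // Already visited, necessarily with a hopcount no larger than this.
        if (hopcounts.count(url)) continue;
        hopcounts[url] = hops;

        auto it = links.find(url);
        if (it == links.end()) continue;  // page has no outgoing links
        for (const std::string& next : it->second) {
            if (!hopcounts.count(next)) queue.push({hops + 1, next});
        }
    }
    return hopcounts;
}
```

With an alphabetically ordered queue, by contrast, a page can be visited
first through a longer path, and its recorded hopcount then depends on the
order of the URLs rather than on the link structure.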

I realize that there are many other hopcount related changes in 3.2,
and no, I don't intend to backport them all, but the reason I'm caught
up on this particular problem is that it seems to me to be a symptom of
a deeper underlying problem. If I can rule that out, I'm OK with
leaving this as-is, but if I uncover something nastier, I'd see that as
reason enough to delay 3.1.4 for a day or two - if a solution is in sight.

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



