Re: [htdig3-dev] Document count in database (previously reported bug)


Subject: Re: [htdig3-dev] Document count in database (previously reported bug)
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Mon Mar 20 2000 - 21:32:25 PST


At 11:26 AM +0200 3/6/00, Valdas Andrulis wrote:
>logs are attached.

Thanks very much. I didn't have much time to look at your logs until
now. Ooo, that's a nasty bug! So far you're finding the best ones. :-)

Here's the problem. When you reindex, it deletes the old document. So
far, so good. But the entry in db.docs.index is keyed by URL alone, so
when it goes to delete the old document's entry there, it actually
deletes the new document's entry (see below). Whoops! Now on the next
reindexing run, it can't find an entry for that URL in db.docs.index
and grabs the page again, completely ignoring the record that's
already in the actual database. Voila! Duplicate records.

It deletes the new entry in db.docs.index because that file is a
DB_HASH, which doesn't support duplicate keys. By the time the delete
runs, adding the entry for the new document has already overwritten
the old document's entry, so the only entry left under that URL is
the new one.
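
To make the sequence concrete, here is a toy model of the failure. It
is only a sketch: plain Python dicts stand in for the document database
and for db.docs.index, and every name in it (docs, url_index,
add_document, delete_document_buggy, the example URL) is made up for
illustration and is not taken from the ht://Dig source.

```python
# Toy model of the bug. Dicts stand in for the real databases;
# all names here are hypothetical, not ht://Dig identifiers.
docs = {}        # stands in for the document database: doc ID -> URL
url_index = {}   # stands in for db.docs.index: URL -> doc ID


def add_document(doc_id, url):
    docs[doc_id] = url
    # One value per key, like a hash without duplicates:
    # an older ID stored under the same URL is overwritten.
    url_index[url] = doc_id


def delete_document_buggy(doc_id):
    url = docs.pop(doc_id)
    # Deletes by URL alone, whichever document ID the entry points at.
    url_index.pop(url, None)


url = "http://www.example.com/"
add_document(1, url)        # first indexing run
add_document(2, url)        # reindex: index entry now points at ID 2
delete_document_buggy(1)    # removes ID 1, but also wipes the entry for ID 2

# Next run: the URL looks unknown, so the page is fetched and stored again.
if url not in url_index:
    add_document(3, url)

duplicates = sorted(i for i, u in docs.items() if u == url)
print(duplicates)           # [2, 3]
```

After the second run the model holds two records for the same URL,
which matches the inflated document count in the report.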

In short, when we do a delete, we need to check to make sure the
URL->ID pair is the same as the one we're removing!
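
A minimal sketch of that check, again using plain Python dicts in place
of the real databases. The names (docs, url_index, add_document,
delete_document) are hypothetical, and this is not the actual patch,
just the idea of comparing the stored ID before removing the index
entry.

```python
# Sketch of the guarded delete. Dicts stand in for the real databases;
# all names here are hypothetical, not ht://Dig identifiers.
docs = {}        # stands in for the document database: doc ID -> URL
url_index = {}   # stands in for db.docs.index: URL -> doc ID


def add_document(doc_id, url):
    docs[doc_id] = url
    url_index[url] = doc_id   # overwrites any older ID for this URL


def delete_document(doc_id):
    url = docs.pop(doc_id)
    # Only drop the index entry if it still points at the document
    # being removed; otherwise it belongs to a newer copy.
    if url_index.get(url) == doc_id:
        del url_index[url]


url = "http://www.example.com/"
add_document(1, url)     # first indexing run
add_document(2, url)     # reindex: index entry now points at ID 2
delete_document(1)       # old record goes away, index entry survives

print(url_index[url])    # 2
print(sorted(docs))      # [2]
```

With the guard in place the next run still finds the URL in the index,
so the page is not picked up as a new document.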

Thanks very much Valdas!
-Geoff





