[htdig3-dev] Re: [htdig3-dev] Duplicate entries in docs.index


Geoff Hutchison (ghutchis@wso.williams.edu)
Sat, 23 Jan 1999 12:56:22 -0400


* List: htdig3-dev@sob.htdig.org

At 7:07 AM -0400 1/22/99, Alexander Bergolth wrote:
>After the second run, I found one document from the first run in the
>docs.index file that wasn't removed correctly.
>
>The .docs file that htdig produces is OK, so htmerge must be the problem.

>P.S.: I did a third run without removing the databases using the first
>server again (having a smaller URL count than the second) and 340 of 411
>URLs remained from the previous run!

Thanks Leo, I think I have it nailed now. This is similar to the bug with
the db.words.db (the word version of docs.index) that we nailed for 3.1.0b3.

The fix is easy and it explains why I'm not seeing it. I do all of my digs
with -a and I never keep the db.docs.index.work file. So I essentially do
what you did during testing--I remove the file before doing a dig.

So that's the fix! We unlink the db.docs.index file before htmerge does
anything. This way we generate a clean version, free of duplicates. I'll
put it in the tree tonight. I bet it will be 1-2 lines to fix :-(.
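
To make the idea concrete, here is a rough sketch in Python rather than the
actual htmerge code. The function name merge_into_index, the JSON file
format and the example URLs are all assumptions made up for illustration;
the only part taken from the fix itself is the unlink of the old index
before the merge step writes anything.

```python
import json
import os
import tempfile


def merge_into_index(index_path, urls, clean_first):
    """Hypothetical stand-in for the merge step that builds the docs index."""
    if clean_first and os.path.exists(index_path):
        os.unlink(index_path)  # the fix: start from an empty index

    # Load whatever survives from an earlier run (nothing after an unlink).
    index = {}
    if os.path.exists(index_path):
        with open(index_path) as f:
            index = json.load(f)

    # Give each URL that is not yet indexed the next free document ID.
    for url in urls:
        if url not in index:
            index[url] = len(index)

    with open(index_path, "w") as f:
        json.dump(index, f)
    return index


workdir = tempfile.mkdtemp()
first_run = ["http://a.example/1", "http://a.example/2", "http://a.example/3"]
second_run = ["http://b.example/1"]

# Without the unlink, entries from the first run leak into the second.
stale_path = os.path.join(workdir, "stale.index")
merge_into_index(stale_path, first_run, clean_first=False)
stale = merge_into_index(stale_path, second_run, clean_first=False)

# With the unlink, the second run only contains its own URLs.
clean_path = os.path.join(workdir, "clean.index")
merge_into_index(clean_path, first_run, clean_first=True)
clean = merge_into_index(clean_path, second_run, clean_first=True)

print(len(stale), len(clean))  # prints "4 1"
```

The second dig against the smaller server ends up with four entries when
the old file is kept and one entry when it is removed first, which is the
same shape as the leftover URLs reported above.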

-Geoff

------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
htdig3-dev@htdig.org containing the single word "unsubscribe" in
the SUBJECT of the message.


