[htdig3-dev] Re: StringMatch and duplicate documents


Geoff Hutchison (ghutchis@wso.williams.edu)
Wed, 20 Jan 1999 18:59:06 -0400


* List: htdig3-dev@sob.htdig.org

At 6:11 PM -0400 1/20/99, Gilles Detillieux wrote:

>A few trace prints in htmerge/docs.cc revealed the source of the 9 extra
>documents. These were 9 documents that were disallowed by robots.txt,
>which were deleted from the DB, because they had no DocHead, but because
>of a missing "else", they were still indexed and counted. Here's the fix:
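
[Editorial illustration, not part of the original message. The patch Gilles refers to is not quoted here, so the sketch below is not the actual htmerge/docs.cc code. It is a minimal, hypothetical C++ model of the kind of bug described: a document with no DocHead is deleted, but without an "else" it still falls through to the indexing and counting step. The names Doc, MergeStats, mergeWithoutElse and mergeWithElse are all invented for this example.]

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a document record; not the real ht://Dig class.
struct Doc {
    std::string url;
    std::string head;  // empty models "no DocHead" (e.g. disallowed by robots.txt)
};

// Simple counters so the two variants can be compared.
struct MergeStats {
    std::size_t deleted = 0;
    std::size_t indexed = 0;
};

// Buggy shape: there is no "else", so a document that was just deleted
// still reaches the indexing/counting step below.
MergeStats mergeWithoutElse(const std::vector<Doc>& docs) {
    MergeStats stats;
    for (const Doc& doc : docs) {
        if (doc.head.empty()) {
            ++stats.deleted;   // models removing the document from the DB
        }
        ++stats.indexed;       // runs for every document, deleted or not
    }
    return stats;
}

// Fixed shape: the "else" makes deletion and indexing mutually exclusive.
MergeStats mergeWithElse(const std::vector<Doc>& docs) {
    MergeStats stats;
    for (const Doc& doc : docs) {
        if (doc.head.empty()) {
            ++stats.deleted;   // models removing the document from the DB
        } else {
            ++stats.indexed;   // only documents that were kept are counted
        }
    }
    return stats;
}
```

[With three documents, one of which has an empty head, mergeWithoutElse reports 3 indexed and mergeWithElse reports 2. That mirrors, on a small scale, how 9 deleted documents could still show up in the total.]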

I don't know if I believe it. That seemed to do it... After patching,
recompiling and re-running htmerge, I get:

htmerge: Total documents: 58193
htmerge: Total doc db size (in K): 330586

No complaints here. Leo, are you still seeing duplicate URLs?

-Geoff


