Geoff Hutchison (ghutchis@wso.williams.edu)
Wed, 20 Jan 1999 18:59:06 -0400
* List: htdig3-dev@sob.htdig.org
At 6:11 PM -0400 1/20/99, Gilles Detillieux wrote:
>A few trace prints in htmerge/docs.cc revealed the source of the 9 extra
>documents. These were 9 documents that were disallowed by robots.txt,
>which were deleted from the DB, because they had no DocHead, but because
>of a missing "else", they were still indexed and counted. Here's the fix:
I can hardly believe it, but that seemed to do it. After patching,
recompiling, and re-running htmerge, I get:
htmerge: Total documents: 58193
htmerge: Total doc db size (in K): 330586
No complaints here. Leo, are you still seeing duplicate URLs?
-Geoff
This archive was generated by hypermail 2.0b3 on Thu Feb 04 1999 - 22:24:20 PST