[htdig3-dev] Re: StringMatch and duplicate documents


Geoff Hutchison (ghutchis@wso.williams.edu)
Wed, 20 Jan 1999 11:13:53 -0500 (EST)


* List: htdig3-dev@sob.htdig.org

On Wed, 20 Jan 1999, Gilles Detillieux wrote:

> > While I doubt there are any duplicate documents in the dbs after htmerge,
> > there seem to be *missing* documents. Is anyone else concerned about the
> > huge difference between htdig and htmerge?
>
> Houston, we have a problem... :) Did you try the StringMatch patches in
> isolation? I'm wondering if the first or second patch is the problem, or
> both.

Alas, I tried them at the same time--I'm running the current CVS tree.
I'm going to start debugging by running just htdig, which returned a
number of documents in the right ballpark (I know I have around 50,000
webpages based on link checking.)

Then I'm going to take a look at the db and put some debugging code into
htmerge.

Has anyone else noticed missing pages?

-Geoff


