Re: htdig: Problems with using htdig -a


Joe R. Jah (jjah@cloud.ccsf.cc.ca.us)
Mon, 21 Sep 1998 22:50:59 -0700 (PDT)


On Mon, 21 Sep 1998, Geoff Hutchison wrote:

> Date: Mon, 21 Sep 1998 23:13:12 -0400
> From: Geoff Hutchison <Geoffrey.R.Hutchison@williams.edu>
> To: "Joe R. Jah" <jjah@cloud.ccsf.cc.ca.us>
> Cc: htdig@sdsu.edu
> Subject: Re: htdig: Problems with using htdig -a
>
> At 1:23 AM -0400 9/18/98, Joe R. Jah wrote:
> >I assume this increase in size of db files and the increase in the reported
> >number of documents will be cumulative over time if one uses this
> >workaround; it will probably increase the actual search time as well;(
>
> I'm not sure what's going on here. Perhaps you could export the ASCII
> database for the db with and without this behavior. I'd be interested to
> see if documents are being duplicated. Do you use "remove_bad_urls"?

Yes, documents are being duplicated, triplicated, and ... That's why I use
the old "Excluding directories and duplicate URLs patch."

Yes, I have the line

        remove_bad_urls: true

in my htdig.conf file.
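
For anyone wanting to confirm the duplication from an exported ASCII
database, here is a rough sketch of the counting step only. It assumes the
URLs have already been pulled out of the dump into a plain list, one per
entry; the dump's field layout is not shown in this thread, so that
extraction is left out. The function name and the sample URLs are made up
for illustration.

```python
from collections import Counter


def find_duplicate_urls(urls):
    """Return {url: count} for every URL that occurs more than once.

    `urls` is any iterable of strings. Surrounding whitespace is ignored
    and blank entries are skipped. (Hypothetical helper, not part of htdig.)
    """
    cleaned = (u.strip() for u in urls)
    counts = Counter(u for u in cleaned if u)
    return {url: n for url, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Made-up sample standing in for URLs extracted from a dump.
    sample = [
        "http://www.example.com/",
        "http://www.example.com/docs/",
        "http://www.example.com/",
        "http://www.example.com/docs/",
        "http://www.example.com/",
    ]
    for url, n in sorted(find_duplicate_urls(sample).items()):
        print(f"{n}x {url}")
```

Running the same count on a dump made with the workaround and one made
without it should show whether the extra documents are repeats of the same
URLs.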

Joe

     _/ _/_/_/ _/ ____________ __o
     _/ _/ _/ _/ ______________ _-\<,_
 _/ _/ _/_/_/ _/ _/ ......(_)/ (_)
  _/_/ oe _/ _/. _/_/ ah jjah@cloud.ccsf.cc.ca.us



