Re: [htdig] htdig update is checking ALL pages already in a DB


Geoff Hutchison (ghutchis@wso.williams.edu)
Thu, 4 Feb 1999 18:51:00 -0400


At 4:39 PM -0400 2/4/99, denis filipetti wrote:
>limit_urls_to. Is that correct ? We will need to update any given page at

Yes this is correct. It will read in all the URLs in the old database and
use those as pages to check.

>certain times for our users, in a DB that would be time consuming and
>unnecessary to totally reindex. Is there any way that I can do that ?

I'm not so sure it's "time consuming," but I guess it depends on how
frequently you plan to update and how many URLs you have. For me, update
digs on 75,000 URLs take about 35 min in total (including htmerge).

You can always try digging just those pages with a separate config file
and using the new merge feature (in the snapshots or the
hopefully-soon-to-be-released 3.1.0) to merge that database into the main
one. I can't promise anything as far as speed, because it still has to
check the databases...
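
As a rough sketch only: the file names, paths and URLs below are
placeholders I made up for illustration, and the command sequence in the
comments assumes the -m merge option of htmerge as found in the
snapshots. Check the documentation that ships with your version before
relying on it.

```
# update.conf: hypothetical second config covering only the pages that
# change often. Every path and URL here is a placeholder.
database_dir:   /var/htdig/update
start_url:      http://www.example.com/news/index.html
limit_urls_to:  http://www.example.com/news/

# Assumed sequence (main.conf stands for your existing config file):
#   htdig   -i -c update.conf            # fresh dig of just these pages
#   htmerge -c update.conf               # build the small database
#   htmerge -c main.conf -m update.conf  # merge it into the main one
```

Keeping the small database in its own database_dir means the frequent
digs never touch the main database until the final merge step.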

>"not in the limits" but at other times "GET"ing that same URL (in the same
>run) ! I suspect this dove-tails nicely with the previous question !

There are some bugs in the string matching code in 3.1.0b4 and earlier
versions. As far as we know, all of them have been fixed in the current
development source.

Cheers,

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



