Re: [htdig] htdig update is checking ALL pages already in a DB

Geoff Hutchison
Thu, 4 Feb 1999 18:51:00 -0400

At 4:39 PM -0400 2/4/99, denis filipetti wrote:
>limit_urls_to. Is that correct ? We will need to update any given page at

Yes this is correct. It will read in all the URLs in the old database and
use those as pages to check.
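
In other words, an "update dig" is just htdig run without the -i flag.
Something like this (the paths are only examples; adjust them for your
own setup):

    # initial dig: -i wipes the old databases and starts fresh from start_url
    htdig -i -c /etc/htdig/htdig.conf
    htmerge -c /etc/htdig/htdig.conf

    # update dig: no -i, so htdig reads every URL already in the database
    # (plus anything new it finds) and re-checks each one
    htdig -c /etc/htdig/htdig.conf
    htmerge -c /etc/htdig/htdig.conf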

>certain times for our users, in a DB that would be time consuming and
>unnecessary to totally reindex. Is there any way that I can do that ?

I'm not so sure it's "time consuming," but I guess it depends on how
frequently you're talking about updating and how many URLs you have. Update
digs for me, on 75,000 URLs, take a total of about 35 min (including

You can always try digging those particular pages with a separate config
file and using the new merge feature (in the snapshots or the
hopefully-soon-to-be-released 3.1.0) to merge that database into the main
one. I can't promise anything as far as speed, because it still has to check
the databases...
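
Roughly something like this (an untested sketch: the file names and URLs
are made up, and you should check the htmerge options against the snapshot
docs, but if memory serves it takes a -m pointing at the second config
file):

    # updates.conf: same as the main config except it only covers the
    # frequently-changing pages and writes into its own database directory
    database_dir:   /var/htdig/db.updates
    start_url:      http://www.example.com/news/
    limit_urls_to:  http://www.example.com/news/

    # dig just those pages, then merge them into the main databases
    htdig -i -c updates.conf
    htmerge -m updates.conf -c /etc/htdig/htdig.conf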

>"not in the limits" but at other times "GET"ing that same URL (in the same
>run) ! I suspect this dove-tails nicely with the previous question !

There are some bugs in the string matching code in 3.1.0b4 and previous
versions. As far as we know, all of them have been fixed in the current
development source.


-Geoff Hutchison
Williams Students Online


This archive was generated by hypermail 2.0b3 on Wed Feb 10 1999 - 17:09:05 PST