Re: [htdig] update digging


Patrick Dugal (patrick.dugal@nrc.ca)
Wed, 03 Mar 1999 11:09:07 -0500


Hi Geoff,

What's the best way to configure an auto-digging process when the ht://Dig
databases total nearly 2GB, with over 120,000 documents on more than 75
servers? Where's the best place to read more about auto-digging? Any ideas?

Pat :)

Geoff Hutchison wrote:

> On Tue, 2 Mar 1999, Frank Guangxin Liu wrote:
>
> > Will htdig follow those new URLs though they are not in the original
> > db file?
>
> Yes! It uses the old database to speed up reindexing. It checks the dates
> in the database so that it can skip as much work as possible. :-) As I
> said earlier, it tries not to download documents already in the database.
> And if the server sends it and it hasn't changed, it won't bother parsing
> it.
>
> But if it HAS changed, it goes about its normal business. It will re-parse
> the document and add its URLs to the list to be checked. So new URLs will be
> added to the database.
>
> > Yes, it can find new URLs, but will it follow those URLs and add
> > the new stuff in the db?
>
> Yup. The point of "update" digs isn't only to ensure the docs in the db
> are up to date. The point is to speed up the indexing. If you already have
> the information, why bother to collect it again? :-)
>
> -Geoff Hutchison
> Williams Students Online
> http://wso.williams.edu/
>
> ------------------------------------
> To unsubscribe from the htdig mailing list, send a message to
> htdig@htdig.org containing the single word "unsubscribe" in
> the SUBJECT of the message.
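
For anyone following along, the update-dig behaviour Geoff describes above
can be sketched roughly as below. This is only an illustration in Python,
not ht://Dig's actual code: update_dig, old_db and fetch are hypothetical
names, the database is a plain dict, and fetch always returns the
modification time and links rather than doing a conditional request. It also
assumes that links stored for unchanged pages are still followed.

```python
from collections import deque


def update_dig(start_urls, old_db, fetch):
    """Re-index reachable pages, reusing old_db entries for unchanged ones.

    old_db maps url -> {"modified": int, "links": [url, ...]}.
    fetch(url) returns (modified, links), or None if the page is gone.
    Returns (new_db, parsed); parsed lists the URLs that were re-parsed.
    """
    new_db = {}
    parsed = []
    queue = deque(start_urls)
    while queue:
        url = queue.popleft()
        if url in new_db:
            continue  # already visited during this run
        page = fetch(url)
        if page is None:
            continue  # page vanished; leave it out of the new database
        modified, links = page
        old = old_db.get(url)
        if old is not None and old["modified"] >= modified:
            # Unchanged since the last dig: reuse the stored entry, no parsing.
            entry = old
        else:
            # New or changed: parse again and record the fresh link list.
            entry = {"modified": modified, "links": list(links)}
            parsed.append(url)
        new_db[url] = entry
        # Queue the links either way, so URLs that were never in the old
        # database still get visited and added.
        queue.extend(entry["links"])
    return new_db, parsed
```

The point of the sketch is the branch in the middle: the old database only
decides whether a page gets parsed again, not whether its links get checked.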




This archive was generated by hypermail 2.0b3 on Thu Mar 04 1999 - 09:09:19 PST