Re: [htdig] Performance of site being scanned


Geoff Hutchison (ghutchis@wso.williams.edu)
Mon, 09 Aug 1999 12:46:23 -0400


Charlie Ruble wrote:
> it is digging a site? Does it open tons of db connections?

ht://Dig uses the Berkeley database as its backend. It doesn't make "db
connections" because the db lives on the same machine as the searching
and indexing. For large databases, I/O can be a problem, but this isn't
specific to ht://Dig.

> Does it scan each link more than once?

It shouldn't. It keeps a list of the URLs it has visited. Of course, if a
document has multiple (duplicate) URLs, then it will currently visit that
document once for each of the URLs.
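
To illustrate the idea (this is only a rough sketch in Python, not
ht://Dig's actual code), a crawler that keeps a set of visited URLs
fetches each distinct URL string once, but still fetches a document twice
if it is reachable under two different URLs. The names crawl, get_links
and the toy site below are made up for the example.

```python
from collections import deque

def crawl(start_url, get_links):
    """Breadth-first walk that fetches each distinct URL string once.

    get_links is a hypothetical callable: URL -> list of URLs found there.
    """
    visited = set()          # every URL already fetched
    queue = deque([start_url])
    order = []               # fetch order, kept for inspection
    while queue:
        url = queue.popleft()
        if url in visited:
            continue         # already seen, so never fetched again
        visited.add(url)
        order.append(url)
        for link in get_links(url):
            if link not in visited:
                queue.append(link)
    return order

# Toy site: "/" and "/index.html" are the same document under two URLs.
site = {
    "/": ["/a", "/index.html"],
    "/index.html": ["/a"],
    "/a": ["/"],
}
fetched = crawl("/", lambda u: site.get(u, []))
```

Here "/a" is fetched only once even though two pages link to it, while
"/" and "/index.html" are both fetched because the visited list compares
URLs, not document contents.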

> How often does it update itself to account for changes to sites?

That depends on how often you run it. I think most people have a script
like rundig that they use through cron. Personally, I update once per
day, though it would be easy to change that to update every hour or
whatever. Clearly you should make sure that it's finished with one
update before it starts another.
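
One way to guard against overlapping runs, sketched here in Python under
the assumption of a Unix system, is to have the cron job take a
non-blocking lock on a file and skip the run if a previous update still
holds it. The function name run_exclusive and the lock file are
hypothetical; this is not something shipped with ht://Dig.

```python
import fcntl
import subprocess

def run_exclusive(command, lock_path):
    """Run command unless another holder of lock_path is still running.

    Returns the command's exit status, or None if the lock was busy.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails at once if already held.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None      # previous update has not finished yet
        # The lock stays held until the with-block closes the file.
        return subprocess.call(command)
```

The cron job would then call something like
run_exclusive(["/path/to/rundig"], "/tmp/rundig.lock"), where both paths
are placeholders to adjust for your installation.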

-- 
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



