Re: [htdig] Performance of site being scanned

Geoff Hutchison
Mon, 09 Aug 1999 12:46:23 -0400

Charlie Ruble wrote:
> it is digging a site? Does it open tons of db connections?

ht://Dig uses Berkeley DB as its backend. It doesn't make "db
connections" because the database files live on the same machine as the
searching and indexing programs, which open them directly. For large
databases, disk I/O can be a problem, but this isn't specific to ht://Dig.
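A rough illustration of what "no db connections" means for an embedded database. This uses Python's standard `dbm` module as a stand-in (it is not necessarily Berkeley DB, and this is not ht://Dig's code). The index is simply a file that the process opens itself, so there is no server or connection pool involved. The function name `store_and_fetch` and the file name `example_index` are hypothetical.

```python
import dbm
import os
import tempfile


def store_and_fetch(db_path: str, key: str, value: str) -> bytes:
    # Open (or create) the database file directly in this process.
    # There is no server, socket, or connection to set up.
    with dbm.open(db_path, "c") as db:
        db[key] = value  # written straight to local files
    # Reopen read-only, as a separate search program might.
    with dbm.open(db_path, "r") as db:
        return db[key]  # dbm hands stored values back as bytes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example_index")
        print(store_and_fetch(path, "http://www.example.com/", "doc-1"))
```

The cost that remains is plain file I/O, which is why very large databases can become I/O-bound.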

> Does it scan each link more than once?

It shouldn't. It keeps a list of the URLs it has visited. However, if a
document is reachable through several different (duplicate) URLs, it will
currently fetch that document once for each of those URLs.
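A minimal sketch of a crawl that remembers visited URLs, assuming the visited list is keyed on the exact URL string. This is a generic breadth-first illustration, not ht://Dig's actual implementation; the `links` mapping stands in for fetching pages and extracting their links. It shows why two different URLs for the same page are both fetched.

```python
from collections import deque


def crawl(start: str, links: dict) -> list:
    """Return URLs in the order they would be fetched.

    `links` is a hypothetical stand-in for the web: it maps each URL to
    the list of URLs found on that page.
    """
    visited = set()
    order = []
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in visited:
            continue  # this exact URL string was already fetched
        visited.add(url)
        order.append(url)  # "fetch" the page
        queue.extend(links.get(url, []))
    return order


if __name__ == "__main__":
    # "/" and "/index.html" are assumed to be the same document,
    # but nothing tells the crawler that, so both get fetched.
    site = {
        "http://example.com/": ["http://example.com/index.html",
                                "http://example.com/a.html"],
        "http://example.com/index.html": ["http://example.com/a.html"],
        "http://example.com/a.html": ["http://example.com/"],
    }
    print(crawl("http://example.com/", site))
```

Each URL is fetched at most once, but deduplicating by document would need something beyond the URL, such as comparing page contents.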

> How often does it update itself to account for changes to sites?

That depends on how often you run it. I think most people run a script
like rundig from cron. Personally, I update once per day, though it would
be easy to change that to hourly or any other interval. Just make sure
one update has finished before the next one starts.
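One way to keep cron from starting an update while the previous one is still running is a lock-file wrapper. This is a minimal sketch, assuming a Unix system (it uses `fcntl.flock`). The script name `run_exclusive.py`, the lock path, and the rundig location in the crontab comment are all hypothetical.

```python
import fcntl
import subprocess
import sys

# Hypothetical crontab entry (daily at 03:00):
#   0 3 * * * python3 /usr/local/bin/run_exclusive.py /tmp/rundig.lock /usr/local/bin/rundig


def run_exclusive(cmd: list, lock_path: str) -> bool:
    """Run cmd only if no other run holds lock_path. Return True if it ran."""
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails at once if another run holds it.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # previous update still running; skip this one
        subprocess.run(cmd, check=True)
        return True
    # The lock is released automatically when lock_file is closed.


if __name__ == "__main__":
    if len(sys.argv) < 3:
        sys.exit("usage: run_exclusive.py LOCKFILE COMMAND [ARGS...]")
    if not run_exclusive(sys.argv[2:], sys.argv[1]):
        print("previous update still running; skipping", file=sys.stderr)
```

Skipping the overlapping run, rather than waiting for the lock, keeps slow updates from piling up behind each other.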

-Geoff Hutchison
Williams Students Online

