RE: [htdig] indexing

Subject: RE: [htdig] indexing
From: Geoff Hutchison
Date: Fri Jan 07 2000 - 11:12:45 PST

At 11:01 AM -0800 1/7/00, David Schwartz wrote:
> The 'htdig' process consumes more and more memory as it runs. This might be
> due to memory leaks, or it might legitimately be due to it keeping track of
> all the URLs it has to process. I tried htdigging 250,000 documents and hit
> about 180 MB.

At this point (3.1.4), there do not seem to be any memory leaks
left. Obviously, if someone finds any with Purify, we'll fix them.

However, as you point out, it has to keep track of all of the URLs,
especially its "todo list", which can get quite large. This is why
you can quickly run out of memory if you have a large number of
initial URLs rather than interconnected documents.
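
As a rough illustration of the difference, here is a toy breadth-first
crawler in Python. It is only a sketch and not htdig's actual code: the
function name `crawl`, the in-memory `links` graph, and the `docN` names
are all made up for the example. It reports how large the todo list gets
at its peak for the same number of documents.

```python
from collections import deque


def crawl(start_urls, links):
    """Breadth-first crawl over a hypothetical in-memory link graph.

    Returns (documents visited, peak length of the todo list).
    """
    todo = deque(start_urls)   # URLs still waiting to be fetched
    seen = set(start_urls)     # every URL ever queued, kept for the whole run
    peak_todo = len(todo)
    visited = 0
    while todo:
        url = todo.popleft()
        visited += 1
        # Queue each outgoing link that has not been seen before.
        for target in links.get(url, ()):
            if target not in seen:
                seen.add(target)
                todo.append(target)
        peak_todo = max(peak_todo, len(todo))
    return visited, peak_todo


n = 1000

# Case 1: a single start URL, with the documents linked in a chain.
chain = {f"doc{i}": [f"doc{i + 1}"] for i in range(n - 1)}
chain_result = crawl(["doc0"], chain)

# Case 2: every document given as a start URL, with no links between them.
flat_result = crawl([f"doc{i}" for i in range(n)], {})

print(chain_result)  # (1000, 1)
print(flat_result)   # (1000, 1000)
```

In both cases the `seen` set ends up holding all 1000 URLs, so some
growth is unavoidable. The difference is the todo list: with one start
URL it never holds more than one entry in this toy graph, while with
1000 start URLs it holds all of them before the first document is
fetched.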

-Geoff Hutchison
Williams Students Online

