Subject: RE: [htdig] indexing
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Tue Jan 11 2000 - 21:12:36 PST


At 11:43 AM -0800 1/7/00, David Schwartz wrote:
> If it really is the URLs eating memory, perhaps we need a patch to
> allow the URLs to be swept to be stored in a different way (perhaps
> each depth should write the URLs for the next greater 'depth' into a
> file?). It'd be very convenient for me to be able to dig 400,000 URLs
> in a pass.

Yes, but I'm pretty confident you'd be upset with the performance.
Remember that the digger can't simply decide a URL is relatively
unimportant and drop it: it needs to know which URLs have already been
visited as well as which are already in the queue. So if it writes part
of the URL list out to disk, it will have to check that disk file for
every new link it comes across.
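
To make the cost concrete, here is a rough C++ sketch (not the actual
htdig code; the function names and the flat one-URL-per-line spill file
are just illustrative) of what the duplicate check looks like once part
of the URL list has been swept out to disk:

// Rough sketch, not htdig's code: why spilling seen URLs to disk hurts.
// With M URLs already swept to the spill file, every new link costs an
// O(M) scan of that file unless some smarter on-disk index is used.
#include <fstream>
#include <set>
#include <string>

// Hypothetical helper: linear scan of a one-URL-per-line spill file.
bool OnDisk(const std::string &url, const std::string &spill_file)
{
    std::ifstream in(spill_file.c_str());
    std::string line;
    while (std::getline(in, line))
        if (line == url)
            return true;
    return false;
}

// Hypothetical duplicate check: a link counts as "seen" if it is in the
// in-memory visited/queue set or in the part swept out to disk.
bool AlreadySeen(const std::string &url,
                 const std::set<std::string> &in_memory,
                 const std::string &spill_file)
{
    if (in_memory.count(url))
        return true;                  // cheap in-RAM lookup
    return OnDisk(url, spill_file);   // re-reads the spill file per link
}

Every link on every page pays that scan of the spill file, which is
exactly the slowdown I'd expect.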

If someone has a great idea for getting around this, I'm all ears.

> If it's not the URLs, what is it?

Hey, you're taking my question! ;-)

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



