Re: [htdig] another newbie :-) about files to search


Geoff Hutchison (ghutchis@wso.williams.edu)
Thu, 25 Feb 1999 15:38:31 -0500 (EST)


On Thu, 25 Feb 1999, Markus Eisele wrote:

> I am trying to translate htdig into German and while doing this I came
> across some other problems. I have got a server with about 15.000 .htm and
> .html files which are not actually linked.

Aha. You didn't mention this when you asked me. What's happening is that
you're passing ht://Dig this list of 15,000 URLs. But when it gets that
list, it has to store the whole list in memory!

When htdig is running, it keeps a list of the URLs it still needs to visit.
Usually it starts with a few starting URLs and picks up more along the way.
But it also works through the list as it goes, so the pending list is never
as large as the total number of URLs. After all, once it's visited a URL, it
doesn't need to do so again! :-)

In your case, however, you've told it to visit all these files at the
beginning. So it has to put the whole list together, which happens in
memory.

In short, if you link the files, you can cut down on the memory use...
This could be as simple as spitting out the list into a few HTML files:

list1.html -> first 1000 URLs, plus a link to list2.html
list2.html -> next 1000 URLs, plus a link to list3.html
(This way it visits the first 1000 URLs before it realizes it has another
1000, and so on.)
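
For what it's worth, here is a rough Python sketch of how those chained list
files could be generated. It is only an illustration of the idea above, not
anything that ships with ht://Dig: the function name write_link_pages, the
output directory, and the "next list" link text are all made up for the
example, and it assumes you already have the URLs as a Python list.

```python
import html
from pathlib import Path


def write_link_pages(urls, out_dir, chunk_size=1000):
    """Write list1.html, list2.html, ... and return the paths written.

    Each page links to up to chunk_size URLs, and every page except the
    last also links to the next list page. (Hypothetical helper, not part
    of ht://Dig.)
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Split the full URL list into consecutive chunks.
    chunks = [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]

    paths = []
    for n, chunk in enumerate(chunks, start=1):
        lines = ["<html><body>"]
        for url in chunk:
            safe = html.escape(url, quote=True)
            lines.append(f'<a href="{safe}">{safe}</a><br>')
        # Chain to the next list page, if there is one.
        if n < len(chunks):
            lines.append(f'<a href="list{n + 1}.html">next list</a>')
        lines.append("</body></html>")

        path = out_dir / f"list{n}.html"
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

You would then put the generated files somewhere the web server can serve
them and give htdig only list1.html as its starting URL, instead of the
full list of 15,000.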

Cheers,
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



