Re: [htdig] Going for the big dig


From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Tue Dec 19 2000 - 10:02:19 PST


On Tue, 19 Dec 2000, David Gewirtz wrote:

> on something. I attempted to index a remote site, in this case Lotus.com.
> Now, I have no idea how many pages that is. But I let the index process run

If you have no idea how many pages are on a server, I'd start by setting a
max_hop_count or server_max_docs limit and go from there. These attributes
are meant to keep the dig from spiralling out of control (or, in this case,
beyond the limits of your server).

<http://www.htdig.org/attrs.html#max_hop_count>
<http://www.htdig.org/attrs.html#server_max_docs>
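Both attributes go in your htdig configuration file as plain `name: value` lines. The numbers below are arbitrary starting points for a first test dig, not recommendations; tune them for your server:

```
# htdig.conf excerpt -- example values only
max_hop_count:    4
server_max_docs:  1000
```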

> handle it. Right now, I'm thinking the process is too big. Can htdig and/or
> htmerge running on a 258MB or 384MB machine handle indexing/merging sites

This question is a bit hard to answer. From what you said, the answer is
"no," but I can't give a better answer unless there's at least an estimate
of the number of URLs, as I mentioned earlier.

There are also simple "link checker" scripts which can give you a count of
the URLs on a site before you commit to a full dig.
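A minimal sketch of such a link counter, in Python (an illustration of the idea, not one of the existing scripts; the function names and the max_docs cap are my own choices):

```python
# Count URLs on one host by following <a href> links breadth-first.
# Sketch only: no robots.txt handling, no redirects, no rate limiting.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collect the href targets of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html, base_url):
    """Return absolute URLs for every <a href> found in html."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urljoin(base_url, link) for link in parser.links]

def count_site_urls(start_url, max_docs=100):
    """Breadth-first crawl within one host; stop after max_docs pages."""
    host = urlparse(start_url).netloc
    seen, queue = {start_url}, [start_url]
    while queue and len(seen) < max_docs:
        url = queue.pop(0)
        try:
            html = urlopen(url).read().decode("utf-8", "replace")
        except OSError:
            continue  # skip unreachable pages
        for link in extract_links(html, url):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return len(seen)
```

The max_docs cap plays the same role as server_max_docs: it keeps the count itself from running away on a large site.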

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

------------------------------------
To unsubscribe from the htdig mailing list, send a message to htdig-unsubscribe@htdig.org. You will receive a message to confirm this.
List archives: <http://www.htdig.org/mail/menu.html>
FAQ: <http://www.htdig.org/FAQ.html>



This archive was generated by hypermail 2b28 : Tue Dec 19 2000 - 10:13:05 PST