Subject: Re: [htdig] Going for the big dig
From: David Gewirtz (david@ZATZ.com)
Date: Tue Dec 19 2000 - 09:47:45 PST
Thanks for the answers to my question below. But I'm still not clear
on something. I attempted to index a remote site, in this case Lotus.com.
Now, I have no idea how many pages that is. But I let the index process run
for three days and by the end of three days, Linux was page-swapping like a
banshee and was becoming substantially unresponsive. Given that that was
only one site, and I'm thinking about indexing a lot more, I've been trying
to figure out what I need to do to make the hardware/software able to
handle it. Right now, I'm thinking the process is too big. Can htdig and/or
htmerge running on a 256MB or 384MB machine handle indexing/merging sites
like lotus.com or other large sites, or is this beyond the scope of this
tool? And, if we don't know the size of external sites, how can I go about
thinking through this issue?
>So, I'm finishing up pre-deployment testing and I seem to have run into
>limits of the system. I'm running htdig on a 256MB PIII, Mandrake 7.2
>system. When I just index our own sites, digging is fast and the system
>seems quite responsive. But, ideally, I'd like to dig 40-60 sites per topic
>(say, Lotus Domino sites) and then maybe 3 or more topics. But it seems
>that although this box has a large amount of RAM (it maxes at 384M) and a
>40GB disk, the digging process is just too memory intensive and everything
>slows down to a crawl.
>So, here's the question: can I index large sites (like, say, lotus.com)? Or are
>we just going to run into machine limits and I'm best off using htdig for
>my own sites and leave the dream of indexing outside sites to a later
>If I'm missing something, or there's an ideal configuration for attempting
>this approach, please enlighten me.