Re: [htdig] Going for the big dig


Subject: Re: [htdig] Going for the big dig
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Mon Dec 18 2000 - 15:28:11 PST


On Mon, 18 Dec 2000, David Gewirtz wrote:

> 40GB disk, the digging process is just too memory intensive and everything
> slows down to a crawl.
>
> So, here's the question: can I index large sites (like, say, lotus.com)? Or are
> we just going to run into machine limits and I'm best off using htdig for
> my own sites and leave the dream of indexing outside sites to a later project?

I see these as two separate issues/questions:

1) Is ht://Dig able to efficiently index external sites?
  That depends mostly on how responsive the site is and on the speed of your
network connection! It's also usually a very good idea to be a good neighbor:
ask the sysadmin of the box about typical low-usage periods, set
server_wait_time high, etc.

2) Is ht://Dig able to handle large databases?

Sure, if you have the hardware. I'm not quite sure why you say digging is
"memory intensive"--I usually found the worst part was in the merging
phase. And it depends on your definition of "large" too. Some people on
this list ask if 100,000 URLs is large, but I'd consider that to be fairly
medium-size from some of the reports I see. But I wouldn't go try to index
a million+ URLs across multiple hosts without some careful checks.
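
To make the trade-off between a high server_wait_time and a large URL count concrete, here is a rough back-of-the-envelope sketch. It assumes a single host fetched serially, with one pause of server_wait_time seconds between consecutive requests. The function name, the fetch_seconds parameter, and the sample numbers are illustrative assumptions, not anything taken from ht://Dig itself, and the result is only a lower bound on dig time, not a prediction.

```python
def min_dig_seconds(num_urls: int, server_wait_time: float,
                    fetch_seconds: float = 0.0) -> float:
    """Rough lower bound on dig time for one host fetched serially.

    Assumes one pause of `server_wait_time` seconds between consecutive
    requests, plus an optional average `fetch_seconds` per document.
    Both the model and the parameter names are illustrative assumptions.
    """
    if num_urls <= 0:
        return 0.0
    waits = (num_urls - 1) * server_wait_time   # pauses between requests
    return waits + num_urls * fetch_seconds     # plus time spent fetching


if __name__ == "__main__":
    # Hypothetical sizes and wait times, just to show the scale involved.
    for n in (100_000, 1_000_000):
        for wait in (1, 10):
            hours = min_dig_seconds(n, wait) / 3600
            print(f"{n:>9} URLs, wait {wait:>2}s: at least {hours:,.1f} hours")
```

Even before any merging work, the pauses alone scale linearly with the number of URLs, which is one reason a million+ URL dig deserves some careful checks up front.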

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/




This archive was generated by hypermail 2b28 : Mon Dec 18 2000 - 15:39:00 PST