Re: [htdig] Introductory questions

From: Bill Carlson
Date: Fri Mar 17 2000 - 07:16:11 PST

On Thu, 16 Mar 2000, Gary Day wrote:

> I almost never join a list and immediately send a question without first
> listening in a bit but I need some information.
> I've installed htdig on a RedHat 6.0 Linux server. Everything runs just
> dandy (one of the simplest compiles and installs I've ever done).
> I've read the docs.
> 1. In your experience, how scalable is htdig? I'm just using it to
> prototype a "community" search engine now so it will be fine for now, but
> will it scale to 5000 sites if I have the disk space? So far, it looks
> like it should but the digging time may be a while.
> 2. It looks like it is clearly possible to just reindex one site without
> all the rest. Is that correct? Currently when I do it, no matter what is
> in my config file, it at least confirms all the existing sites/urls as
> well as the one in the config file.

Hi Gary,

I'll bite on this one. How well it scales depends more on the number of
documents than on the number of sites. For example, my site has something
like 25,000 pages and htdig does a great job. I know others are indexing
much more than that, though I don't know what kind of hardware they are
using.

When indexing, it is possible to merge separate digs into one large
database. It's all a matter of planning and reading the fine print in the
documentation (which is excellent).
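
As a rough sketch of the "separate digs, then merge" idea, the snippet
below only builds the command lines and does not run anything. It assumes
the -c option of htdig and the -c and -m options of htmerge as described in
the 3.1.x documentation, so check them against your installed version. The
config file names are hypothetical.

```python
# Sketch only: builds (does not run) the commands for digging several
# configs separately and merging the results into one database.
# Assumptions: "htdig -c <conf>" digs with a given config file, and
# "htmerge -c <main> -m <other>" merges <other>'s database into <main>'s.
from typing import List


def build_merge_plan(main_conf: str, extra_confs: List[str]) -> List[List[str]]:
    """Return command lines to dig each config, then merge into main_conf."""
    plan: List[List[str]] = []
    # Dig and build each database on its own.
    for conf in [main_conf] + extra_confs:
        plan.append(["htdig", "-c", conf])
        plan.append(["htmerge", "-c", conf])
    # Fold each extra database into the main one.
    for conf in extra_confs:
        plan.append(["htmerge", "-c", main_conf, "-m", conf])
    return plan


if __name__ == "__main__":
    # Hypothetical config names, one per separately dug group of sites.
    for cmd in build_merge_plan("main.conf", ["site_a.conf", "site_b.conf"]):
        print(" ".join(cmd))
```

Each config file would point at its own database directory, so one site
can be redug on its own schedule and merged back in afterwards.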

Building a solution that scales is always tricky, but I think with htdig
you've got a great start.

Bill Carlson
Systems Programmer | Opinions are mine,
Virtual Hospital | not my employer's.
University of Iowa Hospitals and Clinics |

