Subject: Re: [htdig] Introductory questions
Date: Fri Mar 17 2000 - 07:46:29 PST
Agree with Carlson's reply, but would add one comment. To re-index a single
site, you'll probably be happier using a set of files containing only that
site's content. Then, schedule merging of the single-site files with your
You also need to realize that any merge involving your "master" set will be
time-consuming, due to all the sorting and rebuilding of word lists and
related indexes. If you're dealing with thousands of sites, you may want to
set up some form of hierarchy, to keep re-building of the master to a
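
For what it's worth, here is a rough sketch of how such a hierarchy might be planned. It only builds the command lines and does not run anything. The config file layout, the group names and the helper names (plan_reindex, plan_master_rebuild) are my own inventions for illustration. The -i and -c options of htdig and the -c and -m options of htmerge are as I understand the 3.1.x command line, so check them, and how a merge treats documents that are already in the target database, against the documentation for your version.

```python
# Hypothetical layout: one config file per site, one per group of sites,
# and one for the master. None of these names come from htdig itself.

def plan_reindex(site, group_of, conf_dir="conf"):
    """Commands to re-dig one site and fold it into its group's databases."""
    site_conf = f"{conf_dir}/{site}.conf"
    group_conf = f"{conf_dir}/{group_of[site]}.conf"
    return [
        # Dig only this site, starting from scratch (-i).
        ["htdig", "-i", "-c", site_conf],
        # Build the word index for the single-site databases.
        ["htmerge", "-c", site_conf],
        # Merge the single-site databases into the group's databases.
        ["htmerge", "-m", site_conf, "-c", group_conf],
    ]


def plan_master_rebuild(groups, conf_dir="conf", master="master"):
    """Commands to merge each group into the master; schedule these rarely."""
    master_conf = f"{conf_dir}/{master}.conf"
    return [
        ["htmerge", "-m", f"{conf_dir}/{group}.conf", "-c", master_conf]
        for group in groups
    ]


if __name__ == "__main__":
    group_of = {"www.example.org": "group-a"}  # hypothetical site and group
    for cmd in plan_reindex("www.example.org", group_of):
        print(" ".join(cmd))
    for cmd in plan_master_rebuild(["group-a", "group-b"]):
        print(" ".join(cmd))
```

The point of the intermediate groups is that a single-site update only touches a small group database, and the expensive merges into the master can be batched and run on a slower schedule.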
In a message dated 3/17/00 8:18:30 AM US Mountain Standard Time,
<< On Thu, 16 Mar 2000, Gary Day wrote:
> I almost never join a list and immediately send a question without first
> listening in a bit but I need some information.
> I've installed htdig on a RedHat 6.0 Linux server. Everything runs just
> dandy (one of the simplest compiles and installs I've ever done).
> I've read the docs.
> 1. In your experience, how scalable is htdig? I'm just using it to
> prototype a "community" search engine now so it will be fine for now, but
> will it scale to 5000 sites if I have the disk space? So far, it looks
> like it should but the digging time may be a while.
> 2. It looks like it is clearly possible to just reindex one site without
> all the rest. Is that correct? Currently when I do it, no matter what is
> in my config file, it at least confirms all the existing sites/urls as
> well as the one in the config file.
I'll bite on this one. How well it scales depends more on the number of
documents than on the number of sites. For example, my site has something like
25,000 pages and htdig does a great job. I know others are indexing much
more than that, though I don't know what kind of hardware they are using.
When indexing, it is possible to merge separate digs into one large
database. It's all a matter of planning and reading the fine print in the
documentation (which is excellent).
Building a scaling solution is always very iffy and challenging, but I
think with htdig you've got a great start.
Bill Carlson >>
Steven P Haver