Subject: Re: [htdig3-dev] Creating a SQL backend...
From: Geoff Hutchison (firstname.lastname@example.org)
Date: Wed Jun 28 2000 - 08:31:42 PDT
On Wed, 28 Jun 2000, Bill Carlson wrote:
> What is really needed (and has been discussed before) are tools to "dump
> and reload" the ht://Dig databases, so that the indexes can be easily
Well, this is implemented as of 3.2.0b2. It's more reliable as of the
current snapshots since a few bugs were fixed.
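For what it's worth, the idea behind a dump-and-reload pair is just to serialize the databases to a portable, backend-neutral form and rebuild from it. Here's a minimal sketch of that concept in Python; the function names and the tab-separated format are invented for illustration, not ht://Dig's actual tools or on-disk format:

```python
# Sketch only: ht://Dig's real dump/load tools operate on its Berkeley DB
# word and document databases; the names and format below are made up.

def dump_index(index: dict[str, list[str]]) -> str:
    """Serialize a word -> document-list index to a portable text form."""
    lines = []
    for word in sorted(index):
        lines.append(word + "\t" + "\t".join(index[word]))
    return "\n".join(lines)

def load_index(text: str) -> dict[str, list[str]]:
    """Rebuild the index from the dumped text, whatever the new backend is."""
    index: dict[str, list[str]] = {}
    for line in text.splitlines():
        word, *docs = line.split("\t")
        index[word] = docs
    return index

original = {"htdig": ["doc1", "doc3"], "search": ["doc2"]}
assert load_index(dump_index(original)) == original
```

Because the intermediate form is plain text, the loader is free to write into a completely different backend (SQL, whatever) than the dumper read from.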
In an ideal situation, this also allows you to change database formats (if
you wanted to use SQL or some fancy new research database code or
something), or to cope if ht://Dig 3.3 introduces a slightly different
database format.
> Granted, if one has Oracle laying around and enough hardware to support
> it, that might seem like a better solution, but I'd still bet on the
> current setup.
This is why I'd ultimately rather give choice. Some people *do* have
Oracle and big iron. I also know a few people who legitimately want to
index millions of URLs and have the hardware to back it up. Others of us
have more modest goals and budgets. :-) Keeping the minimum disk-space
requirement down has always been a big concern.
> documents would have anywhere from 80GB to 110GB databases, throw them on
> a couple of x86s with 1GB of RAM and RAID0 over 4 40GB IDE drives (see
> www.3ware.com for cards that will do that!) you could serve several
RAID over IDE? I'd personally go with SCSI drives, but I digress.
> > The problem here that comes is: htdig create different databases that
> > htmerge merges, eliminating the identical documents... this is a good
> > process, but as said make the parallelizing unrealizeble...
> bandwidth than horsepower on the machine. In the search farm setup, one
> machine would be doing the index/merge and when complete push the updated
> databases out to the rest of the farm.
I'll agree that digging is limited significantly by bandwidth. It's
an easy proof--just turn on local_urls if you can and watch the digging
speed up.
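(For anyone who hasn't tried it: local_urls maps a URL prefix to a local
filesystem path, so htdig reads the documents straight off disk instead of
over HTTP. Something along these lines in your config file -- the hostname
and path here are made up:

```
local_urls:  http://www.example.com/=/var/www/htdocs/
```

Any URL that doesn't match a listed prefix is still fetched over the
network as usual.)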
However, I'd love to see the "flow" be much simpler. Ideally, the digging
and searching could be performed at the same time. Then you'd only need
one copy of your databases and you could keep the digging going much more
frequently if you desired. You could still do a dig-and-update cycle if
you wanted, or you could dig into the live databases directly. There are
benefits and drawbacks to both approaches. (For example, if you only have
one copy of your databases and something gets corrupted, you're dead for a
good long time.)
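The dig-and-update cycle avoids that corruption risk by building the fresh databases off to the side and swapping them in atomically, so searches never see a half-written index. A sketch of the swap in Python -- this treats the database as a single file for simplicity, whereas ht://Dig actually keeps several Berkeley DB files:

```python
import os
import tempfile

def publish_database(build_fn, live_path):
    """Build a fresh database beside the live one, then atomically swap it in.

    Searches keep reading the old copy until the rename happens, so they
    never see a partially written index. (Sketch only: ht://Dig's databases
    are several Berkeley DB files, not one file as assumed here.)
    """
    dir_name = os.path.dirname(os.path.abspath(live_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "w") as f:
            build_fn(f)                   # write the new index contents
        os.replace(tmp_path, live_path)   # atomic rename on POSIX filesystems
    except Exception:
        os.remove(tmp_path)               # clean up the partial build
        raise

publish_database(lambda f: f.write("fresh index\n"), "search.db")
assert open("search.db").read() == "fresh index\n"
```

The key property is that os.replace() is atomic on the same filesystem, which is why the temporary file is created in the same directory as the live database rather than in /tmp.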
This archive was generated by hypermail 2b28 : Wed Jun 28 2000 - 05:46:55 PDT