Jeff Breidenbach
Thu, 5 Nov 1998 16:39:35 -0500

Hi all,

I have a 25,000-page site indexed with htdig. I add one page to the
site. I can either rebuild the index from scratch, or merge the new
data into the database, depending on the command line options I feed
to htdig/htmerge.

Which choice is the winner in terms of time, peak memory usage, and
peak temporary disk space usage? htdig can have a voracious appetite,
so I want to be very careful, especially with disk and memory.


PS: If anyone is interested, I did see a significant performance
increase when I upgraded from 3.0.8b2 to 3.1.0b2. I'm really, really
happy. I cut total indexing time from thirteen hours to ten
hours, noticeably improved search time, stopped clogging my httpd logs,
and got better search results due to paying attention to meta tags for
robots. This was on a Pentium 90 running Linux, indexing 100,000 pages
from scratch into 125 different htdig archives.
