Re: [htdig] compression, multiple database performance


Geoff Hutchison (ghutchis@wso.williams.edu)
Sun, 14 Feb 1999 16:30:59 -0400


> get a feel for when, where and how much (as I decide whether to
> deploy compression on some large htdig databases)

The compression_level attribute uses the zlib compression library. One
caveat first: if that library isn't available on your system,
compression_level won't work.

Think of it like running gzip on your databases. You have a range of 1-9,
where 1 is faster but larger and 9 is slower but smaller, usually. The
tradeoff is hard to determine in advance, though, because the results
depend on your data and aren't very predictable.
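
To get a feel for the speed/size tradeoff on your own data, a rough sketch
like the following may help. It calls zlib directly from Python rather than
going through htdig, so it only approximates what compression_level would do.
The function name compare_levels and the sample text are made up for
illustration.

```python
import time
import zlib


def compare_levels(data, levels=range(1, 10)):
    """Compress data at each zlib level; return (level, size, seconds) tuples."""
    results = []
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        results.append((level, len(packed), elapsed))
    return results


# Hypothetical sample; substitute text that resembles your own documents.
sample = b"The quick brown fox jumps over the lazy dog. " * 2000

if __name__ == "__main__":
    print("original size:", len(sample))
    for level, size, seconds in compare_levels(sample):
        print("level", level, "size", size, "seconds", round(seconds, 5))
```

Timings on a small sample are noisy, so treat the numbers as a hint and
repeat the run on text that looks like your real excerpts.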

However, you should think about what's in those databases. Compression affects only
the db.docdb file, and only the excerpts. So if you have a lot of small
excerpts, it's not going to help much and may even hurt! But if you've been
indexing large text files, say an electronic library, your excerpts will
probably get much smaller with compression. (I think I put this in the
documentation. I guess I'll check and put it in if I forgot.)
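
The small-versus-large excerpt point can be illustrated with zlib on its own.
This is only a sketch, not htdig's actual code path: the helper name
bytes_saved and both sample excerpts are invented, and the level of 6 is an
arbitrary choice.

```python
import zlib


def bytes_saved(text, level=6):
    """Bytes saved by compressing text; a negative value means it grew."""
    return len(text) - len(zlib.compress(text, level))


# Hypothetical excerpts: one tiny, one long and repetitive.
short_excerpt = b"Welcome to our home page."
long_excerpt = b"It was the best of times, it was the worst of times. " * 200

if __name__ == "__main__":
    print("short excerpt saved:", bytes_saved(short_excerpt))
    print("long excerpt saved:", bytes_saved(long_excerpt))
```

The short excerpt comes out larger than it went in, because zlib's fixed
header and checksum outweigh anything it can squeeze out of a couple dozen
bytes, while the long one shrinks a great deal.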

> that I can htmerge together. Is there any reason to choose one over
> the other?

Well, there's the overhead of merging the databases together. That takes
some time, so merging a lot of smaller databases together might not be
worth it. However, you might be able to run several instances of htdig at
the same time if you use multiple databases. My guess is that the
single-server case is probably best suited to a single db. Someone who's
indexing a few hundred servers would probably find splitting the dig up and
merging later to be much faster.
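
If you want to try the split-dig approach, a small driver along these lines
could start several digs at once. It is a sketch under assumptions: it
supposes you have one config file per database (the file names below are
made up), that htdig is on your PATH, and that each config points at its own
database directory. The merge step with htmerge is left out, since how you
invoke it depends on your version.

```python
import subprocess


def dig_commands(config_paths):
    """Build one 'htdig -c <config>' command line per config file."""
    return [["htdig", "-c", path] for path in config_paths]


def run_concurrently(commands):
    """Start every command at once, then wait for all; return exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Hypothetical config files, one per group of servers.
    configs = ["servers-a.conf", "servers-b.conf", "servers-c.conf"]
    codes = run_concurrently(dig_commands(configs))
    print("exit codes:", codes)
```

Whether this beats a single dig depends on how much time the later merge
eats up, so it is worth timing both ways.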

I guess it all goes to say experiment, experiment, experiment. ;-)

Cheers,

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
