Re: [htdig] htdig 3.2b2 performance

Subject: Re: [htdig] htdig 3.2b2 performance
From: Geoff Hutchison
Date: Mon Jun 12 2000 - 08:04:00 PDT

At 10:38 AM -0700 6/11/00, Ravindra Wankar wrote:
>Phrase match seems very very slow (as compared to "all words" and "any

Strange. I notice a small slowdown, but not much.

>Also, when running htdig, initially htdig takes up 97-98% of CPU time.
>Memory usage is high but I don't see swapping. After a while the cpu
>usage drops to around 40%. Mem is still fine.

Yes, the word database code still needs some optimization. Profiling
the code has shown that this is the major bottleneck. If you fiddle
with the cache size, performance improves, but it's silly to cache
the whole database. ;-)

>Similarly when htsearch is run I see almost 90-95% CPU usage. What
>happens if there are 10 simultaneous searches?

Right, but you see high CPU usage when you run htsearch in previous
versions too. Basically all of the programs are designed to run with
as much CPU as you give them... When I actually finish the htsearch
backend rewrite, it will be possible to cache search results
and intermediate results (i.e. part of a query). You *could* do it
now, but the code would be a total mess.
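
As a rough illustration of the idea of caching both full results and
intermediate results, here is a minimal Python sketch. It is not htsearch
code and says nothing about how the rewritten backend will work: the
WORD_INDEX dictionary, lookup_word() and search_all() are all hypothetical
names invented for this example, and the in-memory dictionary merely stands
in for the on-disk word database.

```python
from functools import lru_cache

# Hypothetical toy index: word -> ids of documents containing it.
# A real word database lives on disk; this dict only stands in for it.
WORD_INDEX = {
    "htdig": {1, 2, 3},
    "search": {2, 3, 4},
    "performance": {3, 5},
}


@lru_cache(maxsize=1024)
def lookup_word(word):
    """Intermediate result: the set of documents containing one word."""
    # Each cache miss here represents one trip to the word database.
    return frozenset(WORD_INDEX.get(word, ()))


@lru_cache(maxsize=256)
def _search_cached(words):
    """Full result for an "all words" query over a normalized word tuple."""
    result = None
    for word in words:
        docs = lookup_word(word)
        result = docs if result is None else result & docs
    return result if result is not None else frozenset()


def search_all(query):
    """Normalize the query so equivalent queries share one cache entry."""
    words = tuple(sorted(set(query.lower().split())))
    return _search_cached(words)
```

In this sketch, repeating a query (in any word order) is answered from the
full-result cache, while a new query that shares a word with an earlier one
reuses that word's cached document set. lookup_word.cache_info() reports
how many lookups were hits and how many had to go to the index.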

>Would moving to MYSQL DB help? I don't see a patch for 3.2 versions.

Not really. A SQL database might help speed up the document indexes
slightly, but the word database in SQL would be massive. So you may
or may not have a performance increase for the word database, but I'm
very confident you'd have a much bigger database.

>Does anyone know what is/are the bottlenecks? Disk/Mem/CPU? e.g. given
>the above configuration, what can be changed to speed things up?

You will get better disk performance if you use a SCSI disk. Disk
access is a significant bottleneck however you cut it and will probably
remain one. The fewer times you need to hit the disk, the better.

-Geoff Hutchison
Williams Students Online

