[htdig] Tuning search performance.


Barry Zubel (barry@citymutual.com)
Fri, 19 Mar 1999 09:35:24 -0000


I have an (almost) standard installation of htdig. I have changed two
variables: max_head_length and max_doc_size are both set to 150000. This is
because some of the index files that point to the data are around 150k (the
data files themselves tend to sit at around 8k max). If there is any way of
reducing these values while still letting htdig find all of the linked
files, that would probably be a start.

I have indexed around 97k files with a total size of 787MB. Here are the
contents of the db dir:

drwxrwxr-x 2 root root 1024 Mar 18 17:14 .
drwxr-xr-x 18 root root 1024 Mar 18 14:02 ..
-rw-rw-r-- 1 root root 348661760 Mar 18 17:31 db.docdb
-rw-rw-r-- 1 root root 11640832 Mar 18 17:31 db.docs.index
-rw-rw-r-- 1 root root 4623360 Mar 18 17:47 db.metaphone.db
-rw-rw-r-- 1 root root 3689472 Mar 18 17:47 db.soundex.db
-rw-rw-r-- 1 root root 376777998 Mar 18 17:14 db.wordlist
-rw-rw-r-- 1 root root 291355648 Mar 18 17:14 db.words.db

As you can see, this is quite large :)
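
To put a number on it, here is a rough sketch that totals the file sizes
from the listing above. The byte counts are copied from that listing; the
Python names (DB_FILES, total_bytes) are just illustrative and are not part
of htdig:

```python
# Sizes in bytes, copied from the directory listing of the db dir.
DB_FILES = {
    "db.docdb": 348661760,
    "db.docs.index": 11640832,
    "db.metaphone.db": 4623360,
    "db.soundex.db": 3689472,
    "db.wordlist": 376777998,
    "db.words.db": 291355648,
}


def total_bytes(files):
    """Return the combined size of all listed files, in bytes."""
    return sum(files.values())


def largest_first(files):
    """Return (name, size) pairs sorted from largest to smallest."""
    return sorted(files.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    total = total_bytes(DB_FILES)
    print(f"total: {total} bytes ({total / 2**20:.1f} MiB)")
    for name, size in largest_first(DB_FILES):
        # Show each file's share of the total to see what dominates.
        print(f"{name:16s} {size:>12d}  {100 * size / total:5.1f}%")
```

That comes to roughly 1GB of database files, which is more than the 787MB
of indexed data, and three files (db.wordlist, db.docdb and db.words.db)
account for nearly all of it.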

Currently, searches take 25-30 seconds to perform, and I was wondering
whether there is any way to tune htdig to improve on this. I'm willing to
trade off some functionality, but I simply don't know where to start!

Answers on a postcard (well, by email would be nice too :))

Barry Zubel
Technical Manager
City Mutual Ltd
www.citymutual.com



