Re: [htdig] Tuning search performance.


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Fri, 19 Mar 1999 15:44:00 -0600 (CST)


According to Barry Zubel:
> I have an (almost) standard installation of htdig - I have changed two
> variables: max_head_length which is 150000 and max_doc_size which is also
> 150000 - this is because some of the index files that point to the data are
> around 150k (the data files themselves tend to sit at around 8k max) so if
> there is any way of reducing this, but still let htdig find all of the
> linked files, then this would probably be a start.
..
> Currently, searches take up to 25-30 seconds to perform, and I was wondering
> whether there is any way to tune this to improve performance somewhat. I'm
> willing to trade off some functionality, but I simply don't know where to
> start!
>
> Answers on a postcard, (well by email would be nice too :))

I'll repeat what I told Michael yesterday, as your problem seems quite
similar.

If you're running it with no customization on a large database, my first
guess would be that htsearch is getting a lot of hits and taking a long
time to process them.

Some users with large databases have reported much higher performance, for
searches that yield lots of hits, by setting the backlink_factor attribute
in htdig.conf to 0, and sorting by score. The scores calculated this
way aren't quite as good, but htsearch can process hits much faster when
it doesn't need to look up the docdb record for each hit, just to get the
backlink count, date or title, either for scoring or for sorting.
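
A minimal sketch of that change in htdig.conf, assuming the stock
"attribute: value" syntax. The backlink_factor line is the setting
described above. The sort line is an assumption: depending on your
version, sorting by score may already be the default, or it may have to
be passed to htsearch as a form input rather than set in the config file.

```
# htdig.conf fragment (a sketch, not a tested configuration)

# Ignore backlink counts when scoring, so htsearch need not read
# the docdb record of every hit just to compute its score.
backlink_factor: 0

# Assumption: sort results by score rather than by date or title.
sort: score
```

After changing it, rerun one of the slow searches and compare timings
before trading away anything else.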

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


