Re: [htdig] searches timing out


Subject: Re: [htdig] searches timing out
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Tue Aug 29 2000 - 11:12:38 PDT


On Tue, 29 Aug 2000, Clint Gilders wrote:

> I know I can limit the number of pages htsearch returns, but is there a
> way to limit the actual number of matches so htsearch doesn't spend all
> day searching?

You *can* limit the number of pages htsearch returns, but that's not
really your question.

Think about it this way: to limit the number of matches and still return
the N highest-scoring ones, you'd still need to score them all*. That's
not to say we aren't working on ways of speeding up htsearch, just that
this particular request is essentially impossible.
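
To make that concrete, here is a rough Python sketch (not htsearch code;
the names top_n, score, and hits and the sample data are all made up for
illustration). Even when only the best 2 results are wanted, the scoring
function still has to run on every match:

```python
import heapq

def top_n(matches, score, n):
    """Return the n highest-scoring matches, best first."""
    # Each match is scored once; a document cannot be ruled out of the
    # top n without computing its score.
    return heapq.nlargest(n, matches, key=score)

# Hypothetical data: document name -> number of query-word hits.
hits = {"a.html": 3, "b.html": 9, "c.html": 1, "d.html": 7}
calls = []

def score(doc):
    calls.append(doc)  # record that this document was scored
    return hits[doc]

best = top_n(hits, score, 2)
print(best)        # the two best documents
print(len(calls))  # how many documents had to be scored
```

The heap keeps the sorting cheap, but the scoring pass over all matches is
the part a match limit can't remove.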

Suffice it to say, if you're looking to improve search performance (in
3.1.x):
1) Beef up your hardware.
2) Put as many common words as possible into the bad_words file
   (these will be ignored in searches rather than returning a huge number
   of hits).
3) Limit searching to sort=score, and set backlink_factor and date_factor
   to 0.
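
For items 2 and 3, the config file entries might look roughly like the
fragment below. Treat it as a sketch: I'm assuming the attribute names
bad_word_list, backlink_factor, and date_factor as given in the ht://Dig
attribute documentation, and the path is only a placeholder, so check both
against your own installation.

```
# Assumed attribute names; verify against the docs for your version.
bad_word_list:   ${common_dir}/bad_words
backlink_factor: 0
date_factor:     0
```

The sort=score part is an htsearch input parameter, so it belongs in the
search form or query string, not in this file. After changing the
bad_words file you will probably need to re-index for it to take full
effect.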

*Well, not quite. There's all sorts of research on "probabilistic" methods
of estimating scores. But it's not clear that these will be faster than
various other optimizations (e.g. caching results, smarter sorting, etc.).

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



