Re: [htdig] Duration of Htsearch Processing (3.1.5)


Subject: Re: [htdig] Duration of Htsearch Processing (3.1.5)
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Mon Mar 20 2000 - 07:12:48 PST


At 10:33 AM +0100 3/20/00, Mentos Hoffmann wrote:
>I am not quite sure how this would help for multiword searches.
>Any thoughts about this?

Well, the crux of the problem is this:

Not only do you have to do scoring, but you also have to perform some
filtering on the results (e.g. restrict, exclude, and occasionally
sorting by title or date), and you don't know how many results these
will remove from the batch.
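
To make the filtering problem concrete, here is a minimal sketch in
Python (not htsearch code). The function name first_page and the
prefix-based exclude check are assumptions made up for illustration;
the real restrict/exclude matching works differently. It only shows
that the number of ranked results you must examine to fill one page
depends on how many the filter throws away, which you can't know in
advance.

```python
def first_page(ranked_urls, excluded_prefixes, page_size):
    """Return (page, examined).

    page     -- the first page_size URLs that survive the exclude filter
    examined -- how many ranked results had to be looked at to get them
    """
    page = []
    examined = 0
    for url in ranked_urls:
        examined += 1
        # Hypothetical exclude rule: drop URLs starting with a listed prefix.
        if any(url.startswith(prefix) for prefix in excluded_prefixes):
            continue
        page.append(url)
        if len(page) == page_size:
            break
    return page, examined


ranked = ["http://a/x", "http://b/1", "http://a/y", "http://b/2", "http://b/3"]
page, examined = first_page(ranked, ["http://a/"], page_size=2)
print(page, examined)
```

With this input, filling a two-result page means examining four
ranked results rather than two, because two of them get excluded along
the way.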

Multiword searches are also very complicated because the scoring may
not be the same on all words. And to top it off, in version 3.2, the
words aren't scored until search-time.

However, in 3.2, you know how many word matches there are per document
before scoring. So you can estimate the score by the number of matches
and go from there, scoring after doing a multiword comparison.
(Actually, multiword searches could be faster too, but that's a bit
off-topic.)
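
A rough sketch of that idea in Python (not htsearch code, and not the
3.2 data structures). It assumes a hypothetical in-memory index,
postings, mapping each word to {doc_id: match_count}, and a
caller-supplied full_score function standing in for the expensive
scoring. The match count is only a heuristic estimate, so the
shortlist may miss a document that full scoring would have ranked
higher.

```python
import heapq


def candidates_by_match_count(postings, words, k):
    """Shortlist up to k documents that contain every word,
    ranked by total match count (the cheap score estimate)."""
    if not words or any(word not in postings for word in words):
        return []
    # Multiword (AND) comparison: keep documents matching all words.
    common = set(postings[words[0]])
    for word in words[1:]:
        common &= set(postings[word])
    estimate = {doc: sum(postings[word][doc] for word in words)
                for doc in common}
    # Sort the ids first so ties are broken in a predictable order.
    return heapq.nlargest(k, sorted(common), key=lambda doc: estimate[doc])


def search(postings, words, k, full_score):
    """Run the expensive full_score only on the shortlist."""
    shortlist = candidates_by_match_count(postings, words, k)
    scored = [(full_score(doc, words), doc) for doc in shortlist]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


postings = {
    "fast":   {1: 3, 2: 1, 3: 5},
    "search": {1: 2, 3: 1, 4: 7},
}
print(candidates_by_match_count(postings, ["fast", "search"], 2))
```

Only documents 1 and 3 contain both words, so only those two would be
passed on to full scoring.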

In short, there are a variety of problems, but if anyone wants to
help me solve them, htsearch performance can definitely be improved.

-Geoff



