Re: [htdig] Duration of Htsearch Processing (3.1.5)


Subject: Re: [htdig] Duration of Htsearch Processing (3.1.5)
From: Mentos Hoffmann (htdig@web.de)
Date: Mon Mar 20 2000 - 01:33:11 PST


Hi,

> >Looking at documentation, it does not appear that there is any option in
> >either the conf file or the parameters passed to htsearch, to limit the
> >number of matches which are located and sorted. If "several thousand"
> >documents match the specified words, all of these have to participate in
> >sorting; there's no way to limit the number which participate.
>
> This has been requested in the past. The biggest problem is that it's
> a bit of a chicken-and-egg problem. You want to cut out the documents
> before scoring and sorting (preferably before even looking them up in
> the document DB). But before you have a ranking, you don't know which
> ones you want to cut exactly. After all, you don't want to cut out
> the best-ranked documents!
But for single-word searches one could sort the documents by score at
dig time. A B+Tree lookup on the words database would then return the
best results very quickly. Since Berkeley DB lets you define your own
sort order (just a comparison function), this should be fairly easy to
implement. (One needs to set the DB_DUP and DB_DUPSORT flags.)
I am not quite sure how this would help for multi-word searches.
Any thoughts about this?
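
To make the single-word idea concrete, here is a minimal sketch in
Python. It does not use Berkeley DB or any htdig code: the class
ScoreSortedIndex and its methods add() and top() are hypothetical names,
and a sorted in-memory list stands in for a B+Tree with sorted
duplicates. It only shows that if the postings for a word are kept in
score order when they are written, the best N matches can be read off
the front at search time without sorting anything.

```python
import bisect
from collections import defaultdict


class ScoreSortedIndex:
    """Toy stand-in for a word database with sorted duplicates.

    Assumption: each word maps to a list of postings kept in
    descending score order, as a custom duplicate sort might do.
    """

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, word, doc_id, score):
        # Store (-score, doc_id) so ascending tuple order puts the
        # highest score first; the ordering cost is paid at dig time.
        bisect.insort(self._postings[word], (-score, doc_id))

    def top(self, word, limit):
        # Read only the first `limit` postings; no sort at search time.
        entries = self._postings.get(word, [])[:limit]
        return [(doc_id, -neg_score) for neg_score, doc_id in entries]


index = ScoreSortedIndex()
index.add("search", 10, 3.0)
index.add("search", 11, 9.5)
index.add("search", 12, 1.0)
index.add("search", 13, 7.25)

best = index.top("search", 2)
print(best)  # [(11, 9.5), (13, 7.25)]
```

The sketch covers one word only. With several words the combined score
of a document depends on all of its postings, so the front of any single
list is not guaranteed to hold the overall best matches.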

Yours, mentos

--
Mentos Hoffmann, Roonstr.17, D-76137 Karlsruhe, Germany
email: htdig@web.de



