Re: [htdig3-dev] Re: Multiple database (patch)


Subject: Re: [htdig3-dev] Re: Multiple database (patch)
From: loic@ceic.com
Date: Thu Feb 10 2000 - 03:26:28 PST


Geoff Hutchison writes:
> At 3:02 PM -0500 2/9/00, Rajendra Inamdar wrote:
> >Incidentally, when I moved to 3.2.x, my searches using "substring" algorithm
> >seem to be running slower than with 3.1.3. Has anybody had similar
> >experience?
>
> I'm not surprised. The new format of the word database (i.e. *every*
> word in every document is stored) means the substring algorithm is
> going to generate a very large number of possible matches. I have
> some suggestions on how to improve the speed of this algorithm using
> trigrams, but I don't think I'll have time to work on it for a while.
>

 There might be a better solution. The indexer can store word
frequencies. I did not activate this by default since the rest of the
code does not use it. When it is activated, a list of unique words is
maintained in the index. I use this a lot in a context other than htdig,
so I'm quite sure it works well. It takes a bit more space, of course.
 The 'substring' search could browse this list instead of the complete
index, which would produce a list of candidates much more quickly.
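
 As a rough illustration of the idea (not htdig code), here is a small
Python sketch. Everything in it is hypothetical: the toy OCCURRENCES data
and the function names are invented for the example and say nothing about
the real word database format. It only shows that scanning a list of
unique words yields the same substring candidates as scanning every
stored occurrence, while looking at far fewer entries.

```python
from collections import defaultdict

# Hypothetical toy data: one (word, document id) pair per occurrence,
# standing in for an index where every word of every document is stored.
OCCURRENCES = [
    ("search", 1), ("searching", 1), ("search", 2),
    ("research", 2), ("index", 2), ("search", 3), ("index", 3),
]


def build_unique_words(occurrences):
    """Map each distinct word to the number of times it occurs."""
    counts = defaultdict(int)
    for word, _doc in occurrences:
        counts[word] += 1
    return dict(counts)


def substring_candidates(unique_words, pattern):
    """Scan only the unique-word list for words containing pattern."""
    return sorted(word for word in unique_words if pattern in word)


def scan_full_index(occurrences, pattern):
    """Baseline: scan every occurrence, then drop duplicate words."""
    return sorted({word for word, _doc in occurrences if pattern in word})


UNIQUE_WORDS = build_unique_words(OCCURRENCES)
```

 With this toy data the full scan inspects 7 entries and the unique-word
scan inspects 4. On a real index the gap should be much larger, since
common words repeat across many documents.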

 To activate the unique word frequency storage, just set:

    wordlist_extended: true

    Cheers,

-- 
		Loic Dachary

24 av Secretan 75019 Paris Tel: 33 1 42 45 09 16 e-mail: loic@dachary.org URL: http://www.senga.org/



