Subject: Re: [htdig3-dev] Re: Multiple database (patch)
Date: Thu Feb 10 2000 - 03:26:28 PST
Geoff Hutchison writes:
> At 3:02 PM -0500 2/9/00, Rajendra Inamdar wrote:
> >Incidentally, when I moved to 3.2.x, my searches using the "substring" algorithm
> >seem to be running slower than with 3.1.3. Has anybody had similar
> I'm not surprised. The new format of the word database (i.e. *every*
> word in every document is stored) means the substring algorithm is
> going to generate a very large number of possible matches. I have
> some suggestions on how to improve the speed of this algorithm using
> trigrams, but I don't think I'll have time to work on it for a while.
There might be a better solution. The indexer can store word
frequencies. I did not activate this by default since nothing in the
code uses it yet. When it is activated, a list of unique words is maintained
in the index. I use this a lot in a context other than htdig, so I'm
confident it works well. It takes a bit more space, of course.
The 'substring' search could browse this list instead of the complete index,
which would produce a list of candidates much more quickly.
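As a rough illustration of the idea (a Python sketch, not htdig code: the
function names, the (word, document) posting layout and the sample data are
all assumptions made up for this example), the substring match runs once per
distinct word rather than once per word occurrence:

```python
from collections import Counter


def build_unique_words(postings):
    """Collapse (word, doc_id) occurrences into a word -> frequency table.

    Hypothetical stand-in for the unique word list kept by the indexer.
    """
    return Counter(word for word, _doc_id in postings)


def substring_candidates(unique_words, pattern):
    """Return every distinct word that contains `pattern`, sorted."""
    pattern = pattern.lower()
    return sorted(word for word in unique_words if pattern in word)


# Toy "complete index": one entry per word occurrence in each document.
postings = [
    ("search", 1), ("searching", 1), ("research", 2),
    ("search", 2), ("index", 2), ("search", 3),
]

unique_words = build_unique_words(postings)

# The scan touches len(unique_words) entries instead of len(postings).
candidates = substring_candidates(unique_words, "search")
print(candidates)
```

The candidates found this way would still have to be looked up in the
complete index to get the matching documents, but those are exact-word
lookups rather than a scan over every stored occurrence.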
To activate the unique word frequency storage just set
-- Loic Dachary
24 av Secretan 75019 Paris Tel: 33 1 42 45 09 16 e-mail: email@example.com URL: http://www.senga.org/