Re: [htdig] problems with search_algorithm (was Re: problems with the "accent" patch)


Subject: Re: [htdig] problems with search_algorithm (was Re: problems with the "accent" patch)
From: Robert Marchand (robert.marchand@UMontreal.CA)
Date: Mon Mar 06 2000 - 13:53:31 PST


Hi,

Does it also fix the problem of the second word not being highlighted
(as in "père mère"), or was that a separate bug?

Someone here reported this problem to me and after applying your patch,
all seems well!

Thanks a lot.

At 13:41 00-03-06 -0600, Gilles Detillieux wrote:
>According to Eric van der Vlist:
>> I have applied this patch as well and noticed that it's working for most
>> of the words, but not for others...
>
>I've tracked down another bug which was preventing search words found
>by fuzzy matching from being highlighted. It was a rather obscure bug,
>related to locale handling.
>
>When you change your locale to one that uses a different format for
>floating point numbers (e.g. 0,5 instead of 0.5), you must change
>any floating point attribute definitions in your config file to use that
>format. This can affect any of the *_factor attributes, as
>well as the search_algorithm attribute, on any system in which the atof()
>function is locale-aware, as is the case on Linux systems where atof()
>simply calls strtod(). Without this change, parsing stops at the period,
>so only the integer part is read and 0.5 will be treated as 0. If htsearch
>thinks the weight is 0 for any fuzzy match algorithm, it won't highlight
>the search words in the excerpt, even though, oddly enough, it did seem
>to find those words. (I guess it would affect the ranking, though.)
>
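
To illustrate the parsing behaviour described above, here is a minimal
sketch. It does not call atof() or depend on an installed system locale.
Instead it imitates a comma-decimal locale with a custom facet on a C++
stream. The names CommaDecimal and parse_weight are hypothetical and do
not come from the ht://Dig sources.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet (not part of ht://Dig): imitates a locale whose
// decimal point is ',' instead of '.'.
struct CommaDecimal : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Parse the leading number in `text` under the comma-decimal locale.
// Parsing stops at the first character that cannot belong to a number
// in that locale, much as a locale-aware strtod() would.
double parse_weight(const std::string &text) {
    std::istringstream in(text);
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(std::locale::classic(), new CommaDecimal));
    double value = 0.0;
    in >> value;
    return value;
}
```

Under these assumptions, parse_weight("0.5") yields 0 because the period
ends the number, while parse_weight("0,5") yields 0.5.
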
>The trouble is, when htsearch parses search_algorithm, it allows a comma
>as a string list separator. This is undocumented, and it is unlike the
>handling of other string list attributes. It also makes it impossible
>to specify non-integer weights for fuzzy algorithms in locales that use a
>comma as a decimal point. I hope no one was counting on this undocumented
>feature in search_algorithm, because it's a bug that should be fixed in
>future versions.
>
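
The separator problem described above can be sketched as follows.
split_on is a hypothetical stand-in for the StringList class, not its
actual implementation, and it assumes that empty tokens are skipped.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for ht://Dig's StringList: split `text` on any
// character found in `separators`, skipping empty tokens (an assumption).
std::vector<std::string> split_on(const std::string &text,
                                  const std::string &separators) {
    std::vector<std::string> tokens;
    std::string::size_type start = text.find_first_not_of(separators);
    while (start != std::string::npos) {
        // End of the current token: next separator, or end of string.
        std::string::size_type end = text.find_first_of(separators, start);
        tokens.push_back(text.substr(start, end - start));
        // Skip the run of separators before the next token.
        start = text.find_first_not_of(separators, end);
    }
    return tokens;
}
```

With " \t," as the separator set, "exact:1 endings:0,5" splits into
"exact:1", "endings:0" and "5", so the weight 0,5 is broken apart.
With " \t" it splits into "exact:1" and "endings:0,5", which leaves the
weight intact for a locale-aware parser.
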
>So, to get fuzzy highlighting working in locales that use comma instead of
>period as decimal point, you must change your search_algorithm attribute
>definition to use the comma in numbers, and you must apply this patch:
>
>*** ../htdig-3.1.5/htsearch/htsearch.cc Thu Feb 24 20:29:11 2000
>--- ../htdig-3.1.5.accents/htsearch/htsearch.cc Mon Mar 6 13:13:00 2000
>*************** setupWords(char *allWords, List &searchW
>*** 475,481 ****
> // configuration attribute.
> // For algorithms other than exact, we need to also do word lookups.
> //
>! StringList algs(config["search_algorithm"], " \t,");
> List algorithms;
> String name, weight;
> double fweight;
>--- 475,481 ----
> // configuration attribute.
> // For algorithms other than exact, we need to also do word lookups.
> //
>! StringList algs(config["search_algorithm"], " \t");
> List algorithms;
> String name, weight;
> double fweight;
>
>--
>Gilles R. Detillieux E-mail: <grdetil@scrc.umanitoba.ca>
>Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
>Dept. Physiology, U. of Manitoba Phone: (204)789-3766
>Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
>
-------
Robert Marchand tél: 343-6111 poste 5210
DiTER-SDI e-mail: marchanr@diter.umontreal.ca
Université de Montréal Montréal, Canada

This archive was generated by hypermail 2b28 : Mon Mar 06 2000 - 13:58:16 PST