[htdig] problems with search_algorithm (was Re: problems with the "accent" patch)


Subject: [htdig] problems with search_algorithm (was Re: problems with the "accent" patch)
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Mon Mar 06 2000 - 11:41:17 PST


According to Eric van der Vlist:
> I have applied this patch as well and noticed that it's working for most
> of the words, but not for others...

I've tracked down another bug which was preventing search words found
by fuzzy matching from being highlighted. It was a rather obscure bug,
related to locale handling.

When you change your locale to one that uses a different format for
floating point numbers (e.g. 0,5 instead of 0.5), then you must change
any floating point attribute definitions in your config file to use this
floating point format. This can affect any of the *_factor attributes, as
well as the search_algorithm attribute, on any system in which the atof()
function is locale-aware, as is the case on Linux systems where atof()
simply calls strtod(). Without this change, parsing stops at the period,
so only the integer part is read and 0.5 will be treated as 0. If htsearch
thinks the weight is 0 for any fuzzy match algorithm, it won't highlight
the search words in the excerpt, even though, oddly enough, it did seem
to find those words. (I guess it would affect the ranking, though.)
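
To illustrate the truncation, here is a minimal sketch. It does not call
atof() under a real locale, since that depends on which locales are
installed. Instead it imbues a C++ stream with a hypothetical facet
(CommaDecimal, my own name, not part of htdig) whose decimal point is a
comma, which should behave the same way for this purpose:

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet, for illustration only: stands in for a locale
// that writes 0,5 instead of 0.5.
struct CommaDecimal : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Build a locale that is the classic "C" locale except for the decimal point.
std::locale comma_decimal_locale() {
    return std::locale(std::locale::classic(), new CommaDecimal);
}

// Parse the leading number in `text` using the numeric rules of `loc`.
// Parsing stops at the first character that cannot be part of a number.
double parse_weight(const std::string &text, const std::locale &loc) {
    std::istringstream in(text);
    in.imbue(loc);
    double value = 0.0;
    in >> value;
    return value;
}
```

With the comma-decimal locale, parse_weight("0.5", ...) stops at the period
and yields 0, while parse_weight("0,5", ...) yields 0.5. In the classic
locale, "0.5" parses as 0.5 as usual.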

The trouble is, when htsearch parses search_algorithm, it allows a comma
as a string list separator. This is undocumented, and it is unlike the
handling of other string list attributes. It also makes it impossible
to specify non-integer weights for fuzzy algorithms in locales that use a
comma as a decimal point. I hope no one was counting on this undocumented
feature in search_algorithm, because it's a bug that should be fixed in
future versions.
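
To make the clash concrete, here is a rough sketch of the splitting step.
The split_tokens() helper is hypothetical and only approximates what
StringList does with a separator set; the attribute value is just an
example:

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not htdig's StringList: split `text` on any
// character found in `separators`, dropping empty tokens.
std::vector<std::string> split_tokens(const std::string &text,
                                      const std::string &separators) {
    std::vector<std::string> tokens;
    std::string::size_type start = text.find_first_not_of(separators);
    while (start != std::string::npos) {
        // `end` is npos for the last token, so substr() runs to the end.
        std::string::size_type end = text.find_first_of(separators, start);
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(separators, end);
    }
    return tokens;
}
```

Splitting "exact:1 synonyms:0,5 endings:0,1" on " \t," gives five tokens
(exact:1, synonyms:0, 5, endings:0, 1), so the fractional parts are cut
off from their weights. Splitting on " \t" alone gives the intended three
tokens, with synonyms:0,5 and endings:0,1 left intact.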

So, to get fuzzy highlighting working in locales that use comma instead of
period as decimal point, you must change your search_algorithm attribute
definition to use the comma in numbers, and you must apply this patch:

*** ../htdig-3.1.5/htsearch/htsearch.cc Thu Feb 24 20:29:11 2000
--- ../htdig-3.1.5.accents/htsearch/htsearch.cc Mon Mar 6 13:13:00 2000
*************** setupWords(char *allWords, List &searchW
*** 475,481 ****
      // configuration attribute.
      // For algorithms other than exact, we need to also do word lookups.
      //
!     StringList algs(config["search_algorithm"], " \t,");
      List algorithms;
      String name, weight;
      double fweight;
--- 475,481 ----
      // configuration attribute.
      // For algorithms other than exact, we need to also do word lookups.
      //
!     StringList algs(config["search_algorithm"], " \t");
      List algorithms;
      String name, weight;
      double fweight;

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930




This archive was generated by hypermail 2b28 : Mon Mar 06 2000 - 11:46:17 PST