Re: [htdig] One possible solution for french accents support

Gilles Detillieux
Fri, 29 Oct 1999 14:56:06 -0500 (CDT)

According to Gasmi Salim:
> I have written a quick patch for htdig 3.1.3 to add French accent support.
> You can see the result at :
> (link 'Search in my Site') or
> You can try French words with or without accents; it works.
> I know a lot of people were searching for a possible solution.
> If you are interested in this patch, a howto is available at

Yikes! I have a hard time believing that your patch_accents program would
not start clobbering all sorts of data in db.docdb that it shouldn't.
I'm assuming the whole point of this is to strip out the accents from
the document excerpts, so that excerpt highlighting works for unaccented
search words.

If so, why not just strip out the accents on the fly in
htsearch/, before doing any searches on the excerpt, or
better yet, just poke in some entries in the translate table, set in
StringMatch::IgnoreCase() (in htlib/, to map accented
letters to equivalent lower-case unaccented letters? The letter mapping
in could also be done much more efficiently with a mapping
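
A minimal sketch of the mapping-table idea, assuming the text is ISO-8859-1
(one byte per character). The names build_accent_table and strip_accents are
hypothetical and are not part of ht://Dig; the same table contents could in
principle be poked into an existing translate table instead.

```cpp
#include <string>

// Hypothetical helper (not ht://Dig code): fill a 256-entry table that maps
// each ISO-8859-1 byte to a lower-case, unaccented equivalent.
void build_accent_table(unsigned char table[256])
{
    // Start from the identity mapping, folding ASCII upper case to lower case.
    for (int i = 0; i < 256; i++)
        table[i] = (i >= 'A' && i <= 'Z') ? (unsigned char)(i + 32)
                                          : (unsigned char)i;

    // Replacements for bytes 0xC0..0xFF; '.' means "leave this byte alone"
    // (ligatures, eth, thorn, sharp s and the two math signs are not mapped).
    static const char high[] =
        "aaaaaa.ceeeeiiii"   // 0xC0-0xCF
        ".nooooo..uuuuy.."   // 0xD0-0xDF
        "aaaaaa.ceeeeiiii"   // 0xE0-0xEF
        ".nooooo..uuuuy.y";  // 0xF0-0xFF
    for (int i = 0; i < 64; i++)
        if (high[i] != '.')
            table[0xC0 + i] = (unsigned char)high[i];
}

// Return a copy of the text with accents stripped: one table lookup per byte.
std::string strip_accents(const std::string &text,
                          const unsigned char table[256])
{
    std::string out(text);
    for (std::string::size_type i = 0; i < out.size(); i++)
        out[i] = (char)table[(unsigned char)out[i]];
    return out;
}
```

The table is built once, so each character costs a single array lookup rather
than a chain of comparisons.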

The best approach, though, would be to define a new "accent" fuzzy match
algorithm, which, when given a word, would search the word database
for all accented and unaccented equivalents. The main engine of this
would be very much like the current htfuzzy/ algorithm.
It would be more work, but you'd have something that would be selectable
by the search_algorithm config attribute, and would fit in well with
the existing code.
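
To illustrate, here is a rough, self-contained sketch of how such an "accent"
lookup might work. It assumes ISO-8859-1 words and keeps the index in memory;
a real fuzzy algorithm would store its keys in a database file and plug into
the existing fuzzy framework, which this sketch does not attempt. AccentIndex
and fold are hypothetical names, not ht://Dig classes.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical fold: lower-case ASCII letters and strip ISO-8859-1 accents.
static std::string fold(const std::string &word)
{
    // Replacements for bytes 0xC0..0xFF; '.' means "leave unchanged".
    static const char high[] =
        "aaaaaa.ceeeeiiii" ".nooooo..uuuuy.."
        "aaaaaa.ceeeeiiii" ".nooooo..uuuuy.y";
    std::string key(word);
    for (std::string::size_type i = 0; i < key.size(); i++) {
        unsigned char c = (unsigned char)key[i];
        if (c >= 'A' && c <= 'Z')
            key[i] = (char)(c + 32);
        else if (c >= 0xC0 && high[c - 0xC0] != '.')
            key[i] = high[c - 0xC0];
    }
    return key;
}

// Hypothetical index: unaccented key -> every indexed word sharing that key.
class AccentIndex {
public:
    // Called once per word in the word database when the index is built.
    void add(const std::string &word)
    {
        std::vector<std::string> &v = variants[fold(word)];
        for (std::vector<std::string>::size_type i = 0; i < v.size(); i++)
            if (v[i] == word)
                return;              // already recorded
        v.push_back(word);
    }

    // Called at search time: returns all accented and unaccented equivalents.
    std::vector<std::string> lookup(const std::string &word) const
    {
        std::map<std::string, std::vector<std::string> >::const_iterator it =
            variants.find(fold(word));
        if (it == variants.end())
            return std::vector<std::string>();
        return it->second;
    }

private:
    std::map<std::string, std::vector<std::string> > variants;
};
```

With this layout, a search for either the accented or the unaccented spelling
folds to the same key, so both spellings come back as candidate search words.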

Gilles R. Detillieux
Spinal Cord Research Centre
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


This archive was generated by hypermail 2.0b3 on Fri Oct 29 1999 - 13:05:44 PDT