[htdig] Problem with accents in French words

André LAGADEC (andre.lagadec@proto.education.gouv.fr)
Thu, 11 Mar 1999 21:18:11 +0100


I downloaded and installed Htdig on a Web server holding French documents.
It runs on a Compaq Proliant 200 computer with Linux Red Hat 5.0, kernel
2.0.33 and Apache

It works, but I have a problem with accents. I can find a word like
"académie" in some HTML pages, but not in all the pages that contain the
word "académie". And if I search for "acad", I get the pages that contain
the word "académies", because in the db.wordlist file this word is

I suppose that when Htdig sees "académie", it detects two words, "acad" and
"mie", because of the character 'é' or &eacute;, but it also detects the
single word "académie" in other pages!?

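To illustrate what I suspect is happening, here is a small Python sketch.
It is not Htdig's actual parser, only my assumption about its behaviour;
the function names split_words_ascii and split_words_latin1 are made up
for the example.

```python
import html
import re

# Hypothetical tokenizer that only accepts ASCII letters and digits.
# An accented letter such as 'é' then acts as a word separator.
def split_words_ascii(text):
    return [w.lower() for w in re.findall(r"[A-Za-z0-9]+", text)]

# Hypothetical tokenizer that also accepts the ISO 8859-1 accented
# letters (U+00C0 to U+00FF, without the signs U+00D7 and U+00F7).
def split_words_latin1(text):
    pattern = r"[A-Za-z0-9\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+"
    return [w.lower() for w in re.findall(pattern, text)]

print(split_words_ascii("académie"))   # ['acad', 'mie']
print(split_words_latin1("académie"))  # ['académie']

# A page that writes the accent as an HTML entity gives the same word
# once the entity is decoded before splitting.
print(split_words_latin1(html.unescape("acad&eacute;mie")))  # ['académie']
```

If the indexer behaved like the first function on some pages and like the
second on others, that would explain why "acad" matches "académies" while
"académie" is only found in part of the pages.
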
I saw on the mailing list that other people have the same problem. I
changed my Htdig configuration file (see below) and added some directives
recommended by different people, such as
locale fr, or locale fr_FR.ISO_8859-1, valid_punctuation

But it still doesn't work correctly.

bad_word_list: ${common_dir}/mots_exclus
locale: fr_FR.ISO_8859-1
iso_8601: true
valid_punctuation: "()!?,
search_algorithm: exact:1 synonyms:0.5 endings:0.1
# Affix rules file
endings_affix_file ${common_dir}/francais.aff
# Dictionary file
endings_dictionary ${common_dir}/francais.0

Any ideas?


