[htdig] Accent problem.

Subject: [htdig] Accent problem.
From: NEPOTE Charles (Neuilly Gestion) (charles.nepote@cetelem.fr)
Date: Mon May 15 2000 - 03:51:53 PDT


I am trying to solve some problems with ht://Dig 3.1.5.

I have tested and reproduced the following:

If:
 -- more than one HTML file contains both of the words "tué" and "tue", or
 -- an HTML file contains the word "tue" and the HTML file referring
to it contains the word "tué" (or vice versa)
    [example: d0.htm contains "<a href="d1.htm">UN HOMME TUE</a>" and
d1.htm contains "tué"],

then a search for "tué" or for "tue" finds only the last indexed file
that contains both "tué" and "tue".
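To reproduce the second case, the two pages from the example can be generated with a short script. This is a minimal sketch: the helper name `make_test_pages` is hypothetical, and it assumes the pages are ISO-8859-1 (Latin-1) encoded, which is an assumption about a typical French site of that era, not something stated above.

```python
from pathlib import Path


def make_test_pages(directory: Path) -> list[Path]:
    """Hypothetical helper: write d0.htm and d1.htm from the example.

    Assumption: pages are stored as ISO-8859-1 (Latin-1), so "é" is the
    single byte 0xE9.
    """
    pages = {
        # d0.htm links to d1.htm with unaccented, all-capitals anchor text
        "d0.htm": '<html><body><a href="d1.htm">UN HOMME TUE</a></body></html>\n',
        # d1.htm contains the accented word "tué"
        "d1.htm": "<html><body>Un homme tué.</body></html>\n",
    }
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for name, html in pages.items():
        path = directory / name
        path.write_bytes(html.encode("latin-1"))
        written.append(path)
    return written
```

The directory can then be served by any local web server and indexed by pointing `start_url` in htdig.conf at d0.htm, after which searching for "tue" and "tué" should show whether both pages are found.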

In the file db.wordlist we can see, for example:
tue i:0 [...]
tue i:1 [...]
tué i:1 [...]
tue i:2 [...]
tué i:2 [...]

(Only the file corresponding to "i:2" is found.)
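Read literally, the excerpt already records every document that contains each word. The following minimal sketch groups the excerpt's lines by word; it assumes that the second field `i:<n>` is the document id (as the note above suggests) and ignores the remaining fields shown as `[...]`. Under that reading, a search for "tue" would be expected to return documents 0, 1 and 2, and a search for "tué" documents 1 and 2, rather than only document 2.

```python
from collections import defaultdict

# Lines copied from the db.wordlist excerpt above; "[...]" stands for the
# remaining fields, which this sketch ignores.
WORDLIST = """\
tue i:0 [...]
tue i:1 [...]
tué i:1 [...]
tue i:2 [...]
tué i:2 [...]
"""


def documents_by_word(text: str) -> dict[str, set[int]]:
    """Map each word to the set of document ids recorded for it.

    Assumption: each line starts with the word, followed by a
    whitespace-separated field of the form "i:<document id>".
    """
    index: dict[str, set[int]] = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[1].startswith("i:"):
            continue  # skip blank or unexpected lines
        word, doc_id = fields[0], int(fields[1][2:])
        index[word].add(doc_id)
    return dict(index)


index = documents_by_word(WORDLIST)
print(sorted(index["tue"]))  # [0, 1, 2]
print(sorted(index["tué"]))  # [1, 2]
```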

Can this be solved?
Note: I have the following in htdig.conf:
locale: fr_FR

<cultural parenthesis>
In the early days of typewriters (the first half of the century), there
were no accented capitals such as É or È (the machines were of
English-speaking design), and so accented capitals disappeared from common
usage: nowadays, many teachers in France teach that "capitals never take
accents". (In fact, accented capitals appear in all newspapers and in books
printed by professionals, who know the rule that capitals must carry
accents; accented capitals have been used in France since the beginning of
printing.)
This is a problem because accents carry meaning:
"un homme tué" means "a man killed"
"un homme tue" means "a man kills".
How is one to read "UN HOMME TUE" if capitals carry no accents?
</cultural parenthesis>.
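Because unaccented capitals such as "UN HOMME TUE" end up as "tue" while the accented text yields "tué", one common workaround in search engines is accent folding: lowercase each word and strip its diacritics, so that both forms map to the same index key. The sketch below uses Python's `unicodedata` module; it illustrates that generic technique under stated assumptions and is not claimed to be what ht://Dig 3.1.5 does.

```python
import unicodedata


def fold_accents(word: str) -> str:
    """Lowercase a word and strip combining diacritics (é -> e, É -> e).

    Generic accent-folding technique, shown as an assumption; it is not
    claimed to be ht://Dig's behaviour.
    """
    # NFD splits "é" into "e" followed by a combining acute accent
    decomposed = unicodedata.normalize("NFD", word.lower())
    # Drop every combining mark, keeping only the base letters
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for w in ["TUE", "tue", "tué", "TUÉ"]:
    print(w, "->", fold_accents(w))  # each line ends with "-> tue"
```

The trade-off is exactly the one described above: after folding, a search can no longer distinguish "a man killed" from "a man kills", so pages containing either form match.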

Charles Népote
Paris, France
[and please forgive my English mistakes...]

This archive was generated by hypermail 2b28 : Mon May 15 2000 - 01:41:13 PDT