Subject: [htdig] Accent problem.
From: NEPOTE Charles (Neuilly Gestion) (firstname.lastname@example.org)
Date: Mon May 15 2000 - 03:51:53 PDT
I am trying to solve some problems in ht://Dig 3.1.5.
I tested and reproduced the following:
-- several HTML files each contain both words "tué" and "tue";
-- or an HTML file contains the word "tue" and the HTML file referring
to it contains the word "tué" (or the reverse case)
[example: d0.htm containing "<a href="d1.htm">UN HOMME TUE</a>" and
d1.htm containing "tué"]
Then a search for either "tué" or "tue" finds only the last indexed file
that contains both "tué" and "tue".
In the file db.wordlist we can see, for example:
tue i:0 [...]
tue i:1 [...]
tué i:1 [...]
tue i:2 [...]
tué i:2 [...]
(Only the file corresponding to "i:2" will be found.)
Can this be solved?
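For comparison, here is a minimal Python sketch of the lookup one would expect from those entries. It is an illustration only, not ht://Dig code, and it assumes each db.wordlist line begins with the word followed by an `i:<document id>` field; the remaining fields (elided above) are ignored. Every document listed under a word should be returned, not only the last one indexed.

```python
from collections import defaultdict

# Entries as shown above from db.wordlist (trailing fields omitted).
WORDLIST = """\
tue i:0
tue i:1
tué i:1
tue i:2
tué i:2
"""

def build_index(text):
    """Map each word to the set of document ids it occurs in."""
    index = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        # Skip lines that do not match the assumed "word i:<id>" layout.
        if len(fields) < 2 or not fields[1].startswith("i:"):
            continue
        word, doc_id = fields[0], int(fields[1][2:])
        index[word].add(doc_id)
    return index

index = build_index(WORDLIST)
print(sorted(index["tue"]))  # [0, 1, 2]
print(sorted(index["tué"]))  # [1, 2]
```

With these entries, a search for "tue" should return documents 0, 1 and 2, and a search for "tué" documents 1 and 2, whereas only document 2 is actually found.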
(Note: I have in htdig.conf:
When typewriters first appeared (in the first half of the century),
they had no accented capitals such as É or È (the machines were of
English-speaking design), and so accented capitals disappeared from common
usage: nowadays, many teachers in France teach that "a capital letter never
takes an accent". (In fact, accented capitals appear in all newspapers and
in books printed by professionals, who know the rule that capitals must keep
their accents; accented capitals have been used in France since the
beginning of printing.)
This is a problem because accents carry meaning:
"un homme tué" : means "a man killed"
"un homme tue" : means "a man kills".
How is one to understand "UN HOMME TUE" if capitals never carry accents?
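The ambiguity can be illustrated with a short Python sketch (an illustration only, not part of ht://Dig). Upper-casing alone keeps the accent (é becomes É), but once the accents are dropped from the capitals, as the typewriter habit does, the two sentences become the same string. The helper `strip_accents` is a hypothetical name; it relies only on the standard `unicodedata` module.

```python
import unicodedata

def strip_accents(text):
    """Remove combining diacritics after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

killed = "un homme tué"  # "a man killed"
kills = "un homme tue"   # "a man kills"

# Accented capitals still distinguish the two sentences...
print(killed.upper())  # UN HOMME TUÉ
print(kills.upper())   # UN HOMME TUE

# ...but capitals written without accents make them identical.
print(strip_accents(killed).upper() == kills.upper())  # True
```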
[and please forgive my English mistakes...]
This archive was generated by hypermail 2b28 : Mon May 15 2000 - 01:41:13 PDT