Re: [htdig] Accent problem.


Subject: Re: [htdig] Accent problem.
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Mon May 15 2000 - 12:49:54 PDT


According to "NEPOTE Charles (Neuilly Gestion)":
> I am trying to solve some problems in ht://Dig 3.1.5.
>
> I tested and can reproduce the following:
>
> If:
> -- more than one HTML file contains both words "tué" and "tue";
> -- or an HTML file contains the word "tue" and the HTML file which refers
> to it contains the word "tué" (or the reverse case)
> [example: d0.htm containing "<a href="d1.htm">UN HOMME TUE</a>" and
> d1.htm containing "tué"]
>
> Then a search for "tué" or a search for "tue" will only find the last file
> indexed which contains both "tué" and "tue".
>
> In the file db.wordlist we can see for example :
> tue i:0 [...]
> tue i:1 [...]
> tué i:1 [...]
> tue i:2 [...]
> tué i:2 [...]
>
> (only the file which corresponds to "i:2" will be found).
>
> Can this be solved?
> (Note I have in htdig.conf :
> locale: fr_FR
> )

Yes, the locale seems to be working fine, as accented letters are taken as
part of the words in the word list. I assume the entries from db.wordlist
are as you find them after running htmerge. It's odd, but the sort seems
to treat accented and unaccented letters as equivalent, and I wonder if
that isn't throwing off htmerge's creation of the db.words.db database.
Otherwise, it seems all the "tué" entries should come after the "tue"
entries. Either that, or the latter database is corrupted, and so isn't
working right.
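
To make that hypothesis concrete, here is a minimal Python sketch. It is
not htmerge's actual code: fold_accents and build_index are hypothetical
helpers, and the "one record per run of identical words, later runs
overwrite earlier ones" merge is only an assumption about how an
interleaved word list could lose entries. The sample data mirrors the
db.wordlist excerpt quoted above.

```python
import unicodedata
from itertools import groupby


def fold_accents(word):
    """Strip combining marks, so that 'tué' compares equal to 'tue'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(sorted_entries):
    """Toy merge step: one record per run of identical words, later runs overwrite."""
    index = {}
    for word, group in groupby(sorted_entries, key=lambda entry: entry[0]):
        index[word] = [doc_id for _, doc_id in group]
    return index


# (word, document id) pairs matching the db.wordlist excerpt quoted above.
entries = [("tue", 0), ("tue", 1), ("tué", 1), ("tue", 2), ("tué", 2)]

# Accent-folding order: "tue" and "tué" tie, so the document id decides.
folded_order = sorted(entries, key=lambda entry: (fold_accents(entry[0]), entry[1]))

# Plain code-point order: every "tue" sorts before every "tué".
plain_order = sorted(entries)

folded_index = build_index(folded_order)
plain_index = build_index(plain_order)

print(folded_index)  # {'tue': [2], 'tué': [2]}
print(plain_index)   # {'tue': [0, 1, 2], 'tué': [1, 2]}
```

Under these assumptions the accent-folded order reproduces the reported
symptom (only document 2 is found for either word), while the plain
order keeps every document.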

Does the problem persist even after you regenerate the database from
scratch? (htdig -i; htmerge) You may also want to try setting the locale
in your environment (LC_ALL or LANG) to something other than fr_FR
(e.g. en_US), so that the sort will not do any accent folding, if indeed
that is the problem.
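
One way to check how the system sort orders such entries under a given
locale, before rebuilding anything, is sketched below in Python. It
assumes a POSIX "sort" utility on the PATH and UTF-8 encoded input;
sort_lines is a hypothetical helper, and the sample lines are
abbreviated from the db.wordlist excerpt. The "C" locale is used here
because it always compares raw bytes; whether htmerge's sorting is
affected in the same way is the open question in this thread.

```python
import os
import subprocess


def sort_lines(lines, locale_name):
    """Sort lines with the system `sort` utility under the given LC_ALL value."""
    env = dict(os.environ, LC_ALL=locale_name)
    result = subprocess.run(
        ["sort"],
        input=("\n".join(lines) + "\n").encode("utf-8"),
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    return result.stdout.decode("utf-8").splitlines()


sample = ["tue i:0", "tue i:1", "tué i:1", "tue i:2", "tué i:2"]

# In the "C" locale, sort compares bytes, so "tue" and "tué" are not interleaved.
for line in sort_lines(sample, "C"):
    print(line)
```

Calling sort_lines(sample, "fr_FR") on the affected machine and
comparing the two results would show whether the French locale
interleaves the accented and unaccented entries.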

> <cultural parenthesis>
> At the beginning of automatic typewriters (first half of the century),
> there were no accented uppercase letters such as ÉÈ (the machines were
> Anglo-Saxon) and so accented lowercase disappeared from common
> usage: nowadays, many teachers in France teach that "there is never an
> accent on a lowercase". (In fact there is accented lowercase in all
> newspapers and books printed by professionals who know the rule that there
> must be accented lowercase -- there has been accented lowercase in France
> since the beginning of printing).
> This is a problem as accents carry meaning:
> "un homme tué" : means "a man killed"
> "un homme tue" : means "a man kills".
> How should one understand "UN HOMME TUE" if there is no accented lowercase?
> </cultural parenthesis>.

I believe you mean uppercase where you say lowercase. Uppercase letters
are capitals (majuscules), while lowercase letters are small (minuscules).
Some French teachers in Canada also taught not to put accents on capitals,
but it didn't really catch on. I never realized that convention came about
just because of the difficulty of using accents on typewriters.

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
