SV: [htdig] Foreign chars (Swedish)


Subject: SV: [htdig] Foreign chars (Swedish)
From: Philippe Ramkvist-Henry (phira600@student.liu.se)
Date: Thu Nov 25 1999 - 12:38:44 PST


> Are the hits all capitalized, or do some of them have the lowercase ä?
> Does this problem happen consistently with certain accented letters, and
> not others? Do you have certain uppercase letters appearing in db.wordlist?

By "hits" you mean the actual words from the documents, I guess. Only those
that are supposed to be capitalized are. For example, a search for "ättestupan"
returns 0 hits while a search for "Ättestupan" returns 18. In the documents the word
is always written as "Ättestupan", so this would be natural if the search were case sensitive.
The problem is that "Åsa" and "åsa" give exactly the same hits, even though that word is
also always written as "Åsa". As far as I can test, the problem only exists for "ä"/"Ä".

The db.wordlist only contains lowercase letters.

> > I asked a guy here at the University and he said that there might be
> > complications with "unsigned char" and "char". He gave me the example
> > below. Please answer at a novice level, my C++ and Unix knowledge is very
> > limited.
>
> Good hunch, but given that some accented letters work and some give
> problems, I wouldn't expect that it's a problem with sign extension.
> This seems to point to a problem with the ctype tables for your locale,
> but there could be something else that I'm missing here. Please keep
> us posted.
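
To test the ctype-table theory outside of ht://Dig, something like the sketch
below might help. It is not ht://Dig code. It assumes the documents are in
ISO-8859-1 (where "Ä" is byte 0xC4 and "ä" is 0xE4), and the function names
as_ctype_arg and check_lowercase are made up for illustration. The locale name
you pass in (for example "sv_SE.ISO8859-1") is also an assumption, since locale
names differ between systems.

```cpp
#include <cctype>
#include <clocale>
#include <string>

// Hypothetical helper: the <cctype> functions expect a value in 0..255 (or EOF).
// Where plain char is signed, a byte like 0xC4 is negative, so convert it to
// unsigned char before using it as an argument.
int as_ctype_arg(char c) {
    return static_cast<unsigned char>(c);
}

// Hypothetical helper: asks the ctype table of the named locale whether
// tolower(upper) == lower.
// Returns  1 if the table maps upper to lower,
//          0 if it does not,
//         -1 if the locale could not be selected (e.g. not installed).
int check_lowercase(const char* locale_name, char upper, char lower) {
    // Remember the current LC_CTYPE setting so it can be restored afterwards.
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    std::string saved = current ? current : "C";

    if (std::setlocale(LC_CTYPE, locale_name) == nullptr) {
        return -1;
    }
    bool maps = std::tolower(as_ctype_arg(upper)) == as_ctype_arg(lower);

    std::setlocale(LC_CTYPE, saved.c_str());
    return maps ? 1 : 0;
}
```

Calling check_lowercase with the locale that htdig and htsearch run under, once
for the pair '\xC4', '\xE4' ("Ä"/"ä") and once for '\xC5', '\xE5' ("Å"/"å"),
should show whether the table treats the two letters differently. In the plain
"C" locale neither pair is expected to map, because that locale only knows A-Z.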

I'm also looking for a synonym wordlist in Swedish... If anyone has one, please
send me a copy.




This archive was generated by hypermail 2b25 : Thu Nov 25 1999 - 12:42:27 PST