[htdig] Problem with umlauts in HTML documents


Subject: [htdig] Problem with umlauts in HTML documents
From: Jens Moellenhoff (Jens.Moellenhoff@partner.bmw.de)
Date: Tue Nov 30 1999 - 01:06:02 PST


Hello,

Currently we're testing ht://Dig version 3.2. We have managed to index
several directories. We even managed to install the German dictionary
and grammar, so that a search offers several alternative search words.

But now when we search for a German word containing an umlaut
(e.g. "Überfall"), we get no match. We even tried transcribing it as
"Ueberfall", but to no avail. A search for "&Uuml;berfall" also returned
no result, because the search term was split at the ";".

However, when we searched for "berfall" or for 'U"berfall', it found the
document containing the word "Überfall", but it highlighted only the
string "berfall" in the result list.
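
To make the symptom concrete, here is a minimal sketch of what we suspect
is happening. It is only our guess, not ht://Dig's actual code: the
function ascii_tokens and its regular expression are hypothetical and
simply assume a word splitter that accepts nothing but ASCII letters and
digits as word characters.

```python
import re

# Hypothetical splitter: only ASCII letters and digits count as word characters.
# This is an assumption used to illustrate the symptom, not ht://Dig's real parser.
ASCII_WORD = re.compile(r"[A-Za-z0-9]+")


def ascii_tokens(text):
    """Split text into words, treating every other character as a separator."""
    return ASCII_WORD.findall(text)


# The umlaut is dropped, so only the tail of the word survives.
print(ascii_tokens("Überfall"))       # ['berfall']

# The HTML entity form is cut at the "&" and at the ";".
print(ascii_tokens("&Uuml;berfall"))  # ['Uuml', 'berfall']

# The quote acts as a separator as well.
print(ascii_tokens('U"berfall'))      # ['U', 'berfall']
```

Under that assumption, all three spellings leave "berfall" as a separate
word, which would match both the hits and the highlighting we see.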

The most interesting thing is that these difficulties only occurred with
HTML and TXT files. In PDF files all umlauts are recognized: we can index
these files, search for "Überfall", and the search result is displayed
correctly.

We also tried to change the language declaration in the config file
according to the FAQs, using "locale: de_DE.ISO_8859-1", but that didn't
work either.
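
One thing we are not sure about is whether that locale name is known to
our system at all. Assuming ht://Dig hands the value to the C library's
setlocale(), a small check like the following sketch would show which
spellings are accepted. The helper usable_locales is our own, and the
candidate names other than the one from our config are only guesses at
common variants.

```python
import locale


def usable_locales(candidates):
    """Return the candidate locale names that setlocale() accepts on this system."""
    usable = []
    saved = locale.setlocale(locale.LC_CTYPE)  # query and remember the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
                usable.append(name)
            except locale.Error:
                pass  # this spelling is not installed or not recognized
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the original setting
    return usable


# The first name is the one from our config; the others are guessed variants.
print(usable_locales(["de_DE.ISO_8859-1", "de_DE.ISO8859-1", "de_DE", "C"]))
```

If the name from the config file is missing from the printed list, the
setting would presumably have no effect, whatever ht://Dig does with it.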

I am sorry if this has already been described elsewhere; if so, I'd be
very glad if you could point me to that resource.

Kind regards,
Jens Moellenhoff



