[htdig] Problem with umlauts in HTML documents

Subject: [htdig] Problem with umlauts in HTML documents
From: Jens Moellenhoff (Jens.Moellenhoff@partner.bmw.de)
Date: Tue Nov 30 1999 - 01:06:02 PST


Currently we're testing the usage of ht://Dig version 3.2. We have
managed to index several directories. We even managed to install the
German dictionary and grammar, so that it gives several alternative
search words.

But now when we search for a German word containing an umlaut
(e.g. "Überfall"), we get no match. We even tried transcribing it as
"Ueberfall", but to no avail. A search for "&Uuml;berfall" also returned
no results, because the search term was split at the ";".

However, when we searched for "berfall" or for 'U"berfall', it found the
document containing the word "Überfall", but it highlighted only the
string "berfall" in the result list.
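
The symptoms above are consistent with a word splitter that treats only
ASCII letters as word characters. The following is a minimal sketch of
that behaviour in Python, not ht://Dig's actual parser: the regular
expressions and the tokenize() helper are hypothetical stand-ins, and it
assumes the entity form of the query was "&Uuml;berfall".

```python
import html
import re

# Hypothetical ASCII-only splitter: roughly what a C program sees when
# no suitable locale is active and "Ü" is not classified as a letter.
ASCII_WORD = re.compile(r"[A-Za-z0-9]+")

# Stand-in for a locale-aware splitter: matches any Unicode letter or digit.
UNICODE_WORD = re.compile(r"[^\W_]+")


def tokenize(text, pattern):
    """Return the list of words that `pattern` finds in `text`."""
    return pattern.findall(text)


raw = "&Uuml;berfall"          # query typed as an HTML entity
decoded = html.unescape(raw)   # entity resolved to the real character

# The entity is cut at "&" and ";", leaving two unrelated words.
entity_words = tokenize(raw, ASCII_WORD)

# The umlaut is dropped as a separator, so only "berfall" is kept.
ascii_words = tokenize(decoded, ASCII_WORD)

# The quote in 'U"berfall' is a separator as well.
quoted_words = tokenize('U"berfall', ASCII_WORD)

# With umlauts counted as letters, the word stays whole.
unicode_words = tokenize(decoded, UNICODE_WORD)

print(entity_words, ascii_words, quoted_words, unicode_words)
```

Under these assumptions a document containing "Überfall" would be
indexed as "berfall", which would explain why only that fragment is
found and highlighted.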

The most interesting thing is that these difficulties occurred only with
HTML and TXT files. Umlauts in PDF files are recognized. We can index
these files, search for "Überfall", and the search result is displayed.

We also tried to change the language declaration in the config file
according to the FAQs, using "locale: de_DE.ISO_8859-1", but that didn't
work either.
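
One thing that may be worth checking is whether the locale name is known
to the operating system at all, since locale names differ between
platforms. The sketch below assumes that ht://Dig passes the value of
the "locale" attribute to the C library's setlocale(); the helper
first_available_locale() and the list of candidate spellings are
hypothetical and only illustrate the check.

```python
import locale


def first_available_locale(candidates):
    """Return the first name accepted by setlocale(), or None if none is."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
                return name
            except locale.Error:
                continue  # this spelling is not installed on the system
        return None
    finally:
        # Restore whatever was active before the probe.
        locale.setlocale(locale.LC_CTYPE, saved)


# Candidate spellings are assumptions; the valid name depends on the system.
candidates = ["de_DE.ISO_8859-1", "de_DE.ISO8859-1", "de_DE"]
print(first_available_locale(candidates))
```

If this prints None, none of the listed names is installed, and a
setting such as "locale: de_DE.ISO_8859-1" could not take effect on that
machine.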

I am sorry if this has been described elsewhere before, but I'd be very
glad if you could point me to that resource then.

Kind regards,
Jens Moellenhoff


This archive was generated by hypermail 2b25 : Tue Nov 30 1999 - 01:21:31 PST