Re: [htdig] Problem with umlauts in HTML documents


Subject: Re: [htdig] Problem with umlauts in HTML documents
From: Torsten Neuer (tneuer@inwise.de)
Date: Tue Nov 30 1999 - 01:26:52 PST


Jens Moellenhoff wrote:
>
> Hello,
>
> Currently we're testing the usage of ht://Dig version 3.2. We have
> managed to index several directories. We even managed to install the
> German dictionary and grammar, so that it gives several alternative
> search words.
>
> But now when we search for a German word containing a German umlaut
> (e.g. "Überfall"), it gives no match. We even tried to transcribe it as
> "Ueberfall", but to no avail. A search for "&Uuml;berfall" also showed
> no result, because it split the search term at the ";".
>
> However, when we searched for "berfall" or for 'U"berfall', it found the
> document containing the word "Überfall", but it highlighted only the
> string "berfall" in the result list.
>
> The most interesting thing is that these difficulties only occurred with
> HTML and TXT files. In PDF files, all umlauts are recognized. We can index
> these files, search for "Überfall", and the search result is displayed
> correctly.
>
> We also tried to change the language declaration in the config file
> according to the FAQs, using "locale: de_DE.ISO_8859-1", but that didn't
> work either.

The FAQ contains an *example* of using the locale directive. The actual
setting of this directive depends upon the locale database installed on
your system (usually in "/usr/share/locale" or "/usr/lib/locale").

See locale(5) for more information.

For a German system, de_DE.ISO_8859-1 *may* work (if this is the name of
the installed German locale).
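
To see which of these names your system actually accepts, a small check
like the following may help. It is only a sketch in Python, not part of
ht://Dig: it assumes that ht://Dig passes the value of the locale
directive on to the C library's setlocale(), and the helper name
usable_locales and the list of candidate spellings are made up for
illustration.

```python
import locale


def usable_locales(candidates):
    """Return the candidate locale names that setlocale() accepts here."""
    # Remember the current LC_CTYPE setting so it can be restored afterwards.
    saved = locale.setlocale(locale.LC_CTYPE)
    accepted = []
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                # This name is not in the installed locale database.
                continue
            accepted.append(name)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)
    return accepted


if __name__ == "__main__":
    # Hypothetical spellings of a German Latin-1 locale; systems differ.
    names = ["de_DE", "de_DE.ISO8859-1", "de_DE.ISO_8859-1", "de_DE.ISO-8859-1"]
    print(usable_locales(names))
```

Whatever name is printed is a candidate for the locale directive in your
config file. If the list comes back empty, no German locale under any of
these spellings is installed.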
 
> I am sorry if this has been described elsewhere before, but I'd be very
> glad if you could point me to that resource then.

In fact, the FAQ entry "4.10. How do I index documents in other languages?"
does not contain the line "locale: de_DE.ISO_8859-1", but instead says
"locale: de_DE".

hth,
  Torsten

-- 
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606
E-Mail: info@inwise.de            Internet: http://www.inwise.de
