[htdig] Problem with upper case umlauts in HTML documents

Subject: [htdig] Problem with upper case umlauts in HTML documents
From: Manfred Kunicke (kunicke@fz-rossendorf.de)
Date: Thu Dec 16 1999 - 03:27:45 PST


I'm faced with the same problem that Jens Moellenhoff described in his
mail of Tue Nov 30 1999, and I found out that it is a problem of upper
case umlauts at

My ht://Dig 3.1.4 is running on AIX 4.3.2.

For test purposes, start_url points to only one HTML page.

After digging and merging, db.wordlist consists of:
 überfall i:0 l:291 w:709

That is, the umlaut is recognized correctly.

Searching for Überfall gives the result:
 No matches were found for 'Überfall'...

Searching for überfall gives the result:
 Search results for 'überfall'
      (None of the search words were found in the top of this document.)
      http://www.fz-rossendorf.de/FVTK/TEST/htdig/umlaut.html 12/16/99, 28 bytes.

When the umlaut in the indexed HTML page is lower case, searching for
überfall gives the expected result:
     http://www.fz-rossendorf.de/FVTK/TEST/htdig/umlaut.html 12/16/99, 28 bytes
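The mail does not identify the cause, but one hypothesis consistent with these symptoms is that the word is lowercased with Latin-1 rules when it is indexed (db.wordlist holds "überfall"), while the query is lowercased byte-by-byte with ASCII-only rules, as tolower() typically behaves in the default "C" locale. The minimal Python sketch below only illustrates that assumption; the function names are hypothetical and are not ht://Dig code.

```python
# Hypothetical illustration of inconsistent case folding (not ht://Dig code).

def ascii_lower(data: bytes) -> bytes:
    """Lowercase only A-Z; bytes >= 0x80 (e.g. Latin-1 'Ü' = 0xDC) pass through unchanged."""
    return bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in data)

def latin1_lower(data: bytes) -> bytes:
    """Lowercase with ISO-8859-1 semantics, so 'Ü' (0xDC) becomes 'ü' (0xFC)."""
    return data.decode("latin-1").lower().encode("latin-1")

# Assumed: the indexer folds the page text with Latin-1 rules.
indexed = latin1_lower("Überfall".encode("latin-1"))
# Assumed: the search side folds the query with ASCII-only rules.
query_upper = ascii_lower("Überfall".encode("latin-1"))
query_lower = ascii_lower("überfall".encode("latin-1"))

print(indexed.decode("latin-1"))  # überfall
print(query_upper == indexed)     # False: 'Ü' was never folded to 'ü'
print(query_lower == indexed)     # True
```

Under this assumption, the upper case query never matches the stored word, while the lower case query does, which mirrors the two search results reported above.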

Kind regards

