Subject: [htdig] Problem with upper case umlauts in HTML documents
From: Manfred Kunicke (kunicke@fz-rossendorf.de)
Date: Thu Dec 16 1999 - 03:27:45 PST
Hello,
I'm facing the same problem that Jens Moellenhoff described in his mail
of Tue Nov 30 1999, and I have found that it is a problem with upper-case
umlauts at search time.
My ht://Dig 3.1.4 is running on AIX 4.3.2.
For testing purposes, start_url points to only one HTML page:
<HTML>
Überfall
</HTML>
After digging and merging, db.wordlist contains:
überfall i:0 l:291 w:709
i.e. the umlaut is recognized correctly and stored in lower case.
Searching for Überfall
gives the result:
No matches were found for 'Überfall'...
Searching for überfall
gives the result:
Search results for 'überfall'
...
[umlaut.html]
(None of the search words were found in the top of this document.)
http://www.fz-rossendorf.de/FVTK/TEST/htdig/umlaut.html 12/16/99,
28 bytes.
In the lower-case umlaut case, the indexed HTML page is
<HTML>
überfall
</HTML>
Searching for überfall
gives the expected result:
...
[umlaut.html]
überfall
http://www.fz-rossendorf.de/FVTK/TEST/htdig/umlaut.html 12/16/99,
28 bytes
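
The three observations above are consistent with one possible explanation,
sketched below in Python. This is only a hypothetical model, not ht://Dig's
actual code: it assumes that the indexing side folds case with umlaut support
while the search side folds case for ASCII letters only. The function names
(ascii_lower, build_wordlist, search) are made up for illustration.

```python
# Hypothetical model of the reported behaviour. NOT ht://Dig's actual code.
# Assumption: indexing lowercases umlauts, searching lowercases A-Z only.


def ascii_lower(text: str) -> str:
    """Lowercase only A-Z; non-ASCII letters such as 'Ü' stay unchanged."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def build_wordlist(page_text: str) -> set:
    """Indexing side: assumed to fold case including umlauts."""
    return {word.lower() for word in page_text.split()}


def search(query: str, page_text: str) -> str:
    """Search side: assumed to fold case for ASCII letters only."""
    folded_query = ascii_lower(query)
    if folded_query not in build_wordlist(page_text):
        return "no match"
    # The excerpt is assumed to be located with the same ASCII-only folding.
    if folded_query in ascii_lower(page_text):
        return "match with excerpt"
    return "match without excerpt"


# The three page/query combinations from the report above.
for page, query in [
    ("Überfall", "Überfall"),
    ("Überfall", "überfall"),
    ("überfall", "überfall"),
]:
    print(f"page={page!r} query={query!r}: {search(query, page)}")
```

Under these assumptions the first combination finds nothing, the second
finds the page but no excerpt, and the third finds both, which is the
pattern reported above. Whether ht://Dig really behaves this way would
have to be checked in its sources.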
Kind regards
m.kunicke