[htdig] Problem with upper case umlauts in HTML documents


Subject: [htdig] Problem with upper case umlauts in HTML documents
From: Manfred Kunicke (kunicke@fz-rossendorf.de)
Date: Thu Dec 16 1999 - 03:27:45 PST


Hello,

I'm facing the same problem that Jens Moellenhoff described in his mail of
Tue Nov 30 1999, and I found that it is a problem with upper-case umlauts
when searching.

My ht://Dig 3.1.4 installation is running on AIX 4.3.2.

For testing purposes, start_url points to only one HTML page:
 <HTML>
 &Uuml;berfall
 </HTML>

After digging and merging, db.wordlist contains:
 überfall i:0 l:291 w:709

i.e. the umlaut is recognized correctly and stored in lower case.
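
As an illustration only, the word-list entry above is consistent with decoding
the HTML entity and then lowercasing the word. The following Python sketch
shows that normalization; the function name `normalize` is hypothetical and
nothing here is taken from the ht://Dig source.

```python
import html


def normalize(token: str) -> str:
    """Decode HTML entities such as &Uuml; and lowercase the result.

    Hypothetical helper for illustration; not ht://Dig code.
    """
    return html.unescape(token).lower()


# Both the upper-case and the lower-case entity end up as the same word.
print(normalize("&Uuml;berfall"))
print(normalize("&uuml;berfall"))
```

Both calls yield the same lower-case word, which matches the single entry seen
in db.wordlist.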

Searching for Überfall
gives the result:
 No matches were found for 'Überfall'...

Searching for überfall
gives the result:
 Search results for 'überfall'
 ...
 [umlaut.html]
      (None of the search words were found in the top of this document.)
      http://www.fz-rossendorf.de/FVTK/TEST/htdig/umlaut.html 12/16/99, 28 bytes.

In the case of a lower-case umlaut, the dug HTML page is
 <HTML>
 &uuml;berfall
 </HTML>
Searching for überfall
gives the expected result:
...
[umlaut.html]
     überfall
     http://www.fz-rossendorf.de/FVTK/TEST/htdig/umlaut.html 12/16/99, 28 bytes
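
The search behaviour reported above would be reproduced if the query were
case-folded for ASCII letters only while the word list holds the fully
lowercased form. That is an assumption about the cause, not something
confirmed here. A minimal Python sketch of the asymmetry, with hypothetical
names `ascii_lower` and `search`:

```python
def ascii_lower(word: str) -> str:
    """Lowercase only A-Z; characters such as 'Ü' are left unchanged."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in word)


# Word list as reported after digging: the umlaut is already lower case.
wordlist = {"überfall"}


def search(query: str, fold) -> bool:
    """Return True if the folded query is present in the word list."""
    return fold(query) in wordlist


# ASCII-only folding leaves 'Ü' as is, so the lookup misses the entry.
print(search("Überfall", ascii_lower))
# A query that is already lower case matches either way.
print(search("überfall", ascii_lower))
# Folding that also handles 'Ü' finds the entry.
print(search("Überfall", str.lower))
```

If this assumption holds, the same mismatch could also explain why the excerpt
for the upper-case page reports that no search words were found.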
 

Kind regards
m.kunicke




