Re: [htdig3-dev] Bug#56721: htdig and locale de_DE peculiarities. (fwd)


Subject: Re: [htdig3-dev] Bug#56721: htdig and locale de_DE peculiarities. (fwd)
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Thu Feb 03 2000 - 18:21:05 PST


At 5:21 PM +0100 1/31/00, Gergely Madarasz wrote:
>I use htdig with a locale: de_DE setting. It seems unable to find
>occurrences of words containing non-ascii characters that are part of
>titles, <Hn> or emphasis elements. Say, if i look for "bg" in my
>data, it finds an index.html document that contains the line

This is rather odd. The HTML parser doesn't pay much attention to
emphasis tags like <strong> or <em>, and it doesn't really treat <Hn>
tags any differently when recording words.

However, Marc Pohl <Marc.Pohl@wdr.de> found a problem with the
handling of 8-bit characters. I don't know whether his patch fixes
this particular problem, but it is attached.
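
For anyone unfamiliar with this class of bug, here is a minimal sketch
of one common way 8-bit characters go wrong in C and C++ code. It is
NOT Marc's patch and is not taken from the ht://Dig source; the helper
names is_word_char and word_prefix_length are made up for illustration.
Where plain char is signed, a byte like 0xE4 becomes negative, and
passing a negative value to isalnum() is undefined behaviour, so such
characters can get misclassified. Casting through unsigned char first
avoids that.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not from ht://Dig): returns true if c counts as a
// word character under the current C locale.
bool is_word_char(char c)
{
    // Cast through unsigned char first. Where plain char is signed, a byte
    // such as 0xE4 is negative, and passing a negative value (other than
    // EOF) to isalnum() is undefined behaviour.
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper: counts the leading word characters in s.
std::size_t word_prefix_length(const std::string &s)
{
    std::size_t n = 0;
    while (n < s.size() && is_word_char(s[n]))
        ++n;
    return n;
}
```

Whether a given 8-bit byte is then accepted as a word character still
depends on the locale the program has selected with setlocale(), so
the cast is only one part of getting this right.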

Please let me know if this helps,


-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



