Re: htdig: iso characters?


Iosif Fettich (ifettich@netsoft.ro)
Tue, 24 Nov 1998 16:34:17 +0200 (EET)


> Will it work if we don't use html entities for non-ascii, but use
> real characters instead (i.e. a literal ä used instead of its entity)?
>
> If this is not implemented, please could someone point me to where in the
> code such a conversion would fit? I might just write it myself ;)

Non-ASCII is a headache in itself, as you surely know already... I can send
you a quick-and-ugly patch that should be on its way to a better shape in
the not-so-distant future. Till then, here is what I have working at our
sites:

- HTML texts are in Romanian (change it as you wish), so we have lots of
non-ASCII chars in them (most of them as real chars, with an appropriate
"charset=iso-8859-2" META option)

- before indexing, non-ASCII chars are mapped to plain ASCII text (both
from real chars and from HTML entities)

- the same mapping occurs in search phrases

The only inconvenience so far: users also get pages they don't actually
want (all accented chars 'suddenly' lose their accents when searched).
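
To make the idea concrete, here is a minimal sketch of such a mapping. It is
written in Python for brevity and is not the patch mentioned above nor
anything from the ht://Dig sources; the names fold_to_ascii and SPECIAL are
made up for illustration, and the SPECIAL table is deliberately incomplete.
It assumes the page bytes have already been decoded according to the declared
charset (iso-8859-2 here).

```python
import html
import unicodedata

# Letters that have no "base letter + accent" decomposition in Unicode.
# Hypothetical, illustrative table; extend it for the languages you index.
SPECIAL = str.maketrans({"ß": "ss", "ø": "o", "Ø": "O", "ł": "l", "Ł": "L"})


def fold_to_ascii(text: str) -> str:
    """Map accented characters and HTML entities to unaccented text."""
    # 1. Turn entities such as &acirc; or &#351; into real characters.
    text = html.unescape(text)
    # 2. Replace the few letters that do not decompose.
    text = text.translate(SPECIAL)
    # 3. Split each accented letter into base letter + combining mark,
    #    then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    # Bytes as they would appear in an iso-8859-2 page.
    page_text = b"T\xe2rgu Mure\xba".decode("iso-8859-2")

    # The same function is applied at index time and at search time.
    print(fold_to_ascii(page_text))               # Targu Mures
    print(fold_to_ascii("T&acirc;rgu Mure&#351;"))  # Targu Mures

    # The drawback: distinct words collapse into one index key.
    print(fold_to_ascii("peşte") == fold_to_ascii("peste"))  # True
```

Because the identical function runs over both the indexed words and the
search phrase, a query typed with or without accents finds the same pages.
The last line shows the price: "peşte" and "peste" become indistinguishable.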

Let me know if interested.

Iosif Fettich

-----------------------------------------------------------------------
Iosif Fettich | e-mail: ifettich@netsoft.ro ICQ UIN: 5496730
Mng. Director | phone/fax: +40-(0)65-162614
NetSoft SRL | mail: NetSoft SRL,4300 Tg.Mures,O.P.1-C.P.182,Romania



