Subject: Re[2]: [htdig] A language issue.. Could you give me a favor?
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Wed Mar 22 2000 - 18:59:06 PST


At 9:38 AM +0900 3/23/00, Oskar Bartenstein wrote:
>Boils down to 2 questions (sorry I never looked at the source code):
> - is htdig 8-bit clean?
> - is htdig words and dictionaries sequences of bytes?
>If both is yes, then I would guess the core is ok,
>and we only have to look at how to use it properly.
>Hope I did not overlook a parsing issue.

It is 8-bit clean, but it treats a character as synonymous with 8
bits. Many parts of the code (the String class in particular) assume
that a character is exactly 1 byte and keep going. In many encodings
(multibyte ones in particular), this is *not* the case, and so you're
stuck.
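
To make the byte-versus-character point concrete, here is a small sketch
in Python (not ht://Dig code). The sample strings, the choice of EUC-JP,
and both helper names are assumptions picked purely for illustration.

```python
# Illustration only: sample text, EUC-JP, and helper names are assumptions.

def char_and_byte_lengths(text, encoding):
    """Return (number of characters, number of bytes once encoded)."""
    return len(text), len(text.encode(encoding))

def survives_byte_truncation(text, encoding, nbytes):
    """Cut the encoded text after nbytes bytes, as byte-oriented code
    might, and report whether the result is still valid in that encoding."""
    try:
        text.encode(encoding)[:nbytes].decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

latin_word = "caf\u00e9"             # 4 characters, all present in Latin-1
kanji_word = "\u65e5\u672c\u8a9e"    # 3 characters, 2 bytes each in EUC-JP

print(char_and_byte_lengths(latin_word, "latin-1"))
print(char_and_byte_lengths(kanji_word, "euc_jp"))
print(survives_byte_truncation(latin_word, "latin-1", 3))
print(survives_byte_truncation(kanji_word, "euc_jp", 3))
```

With Latin-1 the two counts agree and any byte offset is a safe place to
cut. With EUC-JP the counts differ, and cutting after 3 bytes lands in
the middle of a character.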

>A correct HTML page includes info about its encoding, therefore
>htdig on the receiving end can convert it to any code it likes.

Yes, provided that it has code to convert from one encoding into
another. :-) This is the crux of the problem. Currently ht://Dig
assumes the host system has working locale support and is getting the
pages in the default encoding of the system. If they're not, it
assumes they are anyway. :-) It makes no attempt to convert character
encodings.
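
As a rough sketch of what such a conversion step could look like (again
in Python, and not something ht://Dig does today): decode each page using
its declared charset and re-encode it into one internal encoding. The
function name to_internal, the choice of UTF-8 as the internal encoding,
and the Latin-1 fallback are all assumptions made for the example.

```python
import codecs

def to_internal(raw, declared_charset, fallback="latin-1"):
    """Hypothetical helper: convert page bytes to UTF-8.

    raw is the page body as bytes; declared_charset is whatever the page
    or HTTP header claimed, or None if nothing was declared.
    """
    charset = fallback
    if declared_charset:
        try:
            codecs.lookup(declared_charset)   # raises LookupError if unknown
            charset = declared_charset
        except LookupError:
            pass                              # unknown name: keep the fallback
    # errors="replace" keeps going on bytes that are invalid in the charset
    return raw.decode(charset, errors="replace").encode("utf-8")

page = "\u65e5\u672c\u8a9e".encode("euc_jp")  # pretend this came off the wire
print(to_internal(page, "EUC-JP"))
print(to_internal(b"caf\xe9", None))
```

Falling back to Latin-1 when the charset is missing or unknown mirrors
the "assume the default" behaviour described above. Whether that is the
right default is a separate question.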

Basically, if you have a Latin-1 encoding for your character set,
you're OK. That's the limit of the current i18n.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



