Re: [htdig3-dev] Search for other language.


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Fri, 30 Apr 1999 10:00:50 -0500 (CDT)


According to Geoff Hutchison:
> On Fri, 30 Apr 1999, Supawat Pusavanno wrote:
>
> > I am going to develop a search engine for my own language (Thai).
>
> That would be fantastic!
>
> > Does it support other languages?
>
> At the moment, it's limited mostly to Roman alphabets, or, more precisely,
> to single-byte character sets.
>
> > If HTDig is designed to support other languages, then
> > where should I start in HTDig?
>
> Actually it's on the projects list I just updated. :-)
> The place to start would be to become familiar with UTF-8 and Unicode. It
> seems like this would be the best way to support all languages. Once we
> have Unicode support in the database, it shouldn't be too hard to localize
> for specific languages. (I hope.)
>
> Unicode information can be found at <http://www.unicode.org/>
> A project providing UTF-8 translation <http://www.whizkidtech.net/i18n/>
>
> I think the *first* place to start in ht://Dig would be changing the
> String.cc class to accept UTF-8 strings.
>
> Others should feel free to correct me. I'm not very familiar with i18n.

Neither am I. The only thing I can think to add is that if you're
fortunate enough to have a language that's supported by one of the
"locales" on your system, and it uses an 8-bit character set, then the
job of supporting your language is a lot easier. You need to set the
locale correctly in your htdig.conf, obtain or build ispell dictionary
and affix files for your language, and configure htdig to use these. See

        http://www.htdig.org/FAQ.html#q4.10

for a bit more information on this. I really don't know about Thai.
My understanding is it uses an alphabet that's more phonetic than
pictographic, unlike Chinese, Japanese and Korean, so it may be that it
fits in an 8-bit character set. Whether it's supported by a standard
locale on your system is another matter, though. With the right locale
support, the search engine will work even without the dictionaries for
your language, but you need the dictionary and affix files to support
the endings algorithm. Without endings, only exact matches of search
words will work.
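
One way to check whether a given locale is installed on a system, and
whether it treats any 8-bit characters as letters, is a small test with
the standard C library. The following is only a rough sketch: the helper
names locale_available and count_high_bit_letters are made up for
illustration, and this is not how ht://Dig itself does its character
handling.

```cpp
#include <cctype>
#include <clocale>
#include <string>

// Hypothetical helper: returns true if the C library accepts `name`
// as an LC_CTYPE locale. The previous locale is restored afterwards.
bool locale_available(const char* name) {
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    std::string saved = current ? current : "C";
    bool ok = std::setlocale(LC_CTYPE, name) != nullptr;
    std::setlocale(LC_CTYPE, saved.c_str());
    return ok;
}

// Hypothetical helper: counts the byte values 0x80..0xFF that the
// locale `name` classifies as letters. Returns -1 if the locale
// is not available on this system.
int count_high_bit_letters(const char* name) {
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    std::string saved = current ? current : "C";
    if (std::setlocale(LC_CTYPE, name) == nullptr) {
        return -1;
    }
    int count = 0;
    for (int c = 0x80; c <= 0xFF; ++c) {
        if (std::isalpha(c)) {
            ++count;
        }
    }
    std::setlocale(LC_CTYPE, saved.c_str());
    return count;
}
```

Calling count_high_bit_letters with the name of a Thai locale would show
whether the system knows that locale and how many 8-bit codes it treats
as letters. The exact locale name varies from system to system, so it
has to be looked up locally rather than assumed.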

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930