[htdig] indexing dem cyrillic letters along w/ latin ones

Max Pyziur (pyz@panix.com)
Tue, 18 May 1999 22:52:54 -0400

Greetings All,

I'm still a newbie to ht://dig. I've installed it both on my home Linux
box (RPMs on RedHat 5.2) and on our server, which runs Solaris 2.6 (yes, I
had to find the necessary libstdc++ library and get a copy of GNU make).
Our website is at http://www.brama.com, and our first uses of ht://dig
can be found at http://www.brama.com/search.html. I'm still in testing
mode and haven't yet tried to index the whole server, just one or two
directories.

Our first problem is that the website is trilingual: more than 50%
English, the rest mostly Ukrainian, with a bit here and there in Russian.
The other problem is that the character set we're using for the Ukrainian
and Russian pages is CP1251, not KOI8 (the Unix guys' and gals'
favorite). This is because CP1251 exists in one form, whereas KOI8 exists
in several (KOI8-R, KOI8-U, KOI8-RU), all overlapping on a core set of
characters but differing on about five or six, which makes using any
variant of KOI8 just a bit unnerving.
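
To make the variant overlap concrete, here is a minimal Python sketch that
lists the byte values two encodings decode differently. It assumes Python
3's bundled koi8_r and koi8_u codecs; KOI8-RU is left out because no
standard-library codec for it is assumed here. The function name
differing_bytes is made up for this illustration.

```python
def differing_bytes(enc_a, enc_b):
    """Return (byte, char_a, char_b) for each byte the two codecs decode differently."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        # errors="replace" keeps the loop going if a codec leaves a byte undefined
        ch_a = raw.decode(enc_a, errors="replace")
        ch_b = raw.decode(enc_b, errors="replace")
        if ch_a != ch_b:
            diffs.append((b, ch_a, ch_b))
    return diffs


if __name__ == "__main__":
    for b, r, u in differing_bytes("koi8_r", "koi8_u"):
        print(f"0x{b:02X}: KOI8-R {r!r}  KOI8-U {u!r}")
```

Run as a script, it prints one line per differing byte; the
Ukrainian-specific letters should appear on the KOI8-U side.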

I've seen the references to the dictionaries available at
http://fmg-www.cs.ucla.edu/fmg-members/geoff/ispell-dictionaries.html and
have picked up the Russian one (it's pretty much a cinch to change Russian
KOI8 to CP1251); however, does anyone know of any Ukrainian dictionaries?
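
For reference, the KOI8-to-CP1251 change can be sketched in a few lines of
Python. This is only an illustration under assumptions, not necessarily how
the dictionary was actually converted: it assumes the source file is
KOI8-R and uses Python 3's bundled koi8_r and cp1251 codecs. The function
names are made up for the example.

```python
def koi8r_to_cp1251(data: bytes) -> bytes:
    """Re-encode KOI8-R bytes as CP1251 by going through Unicode."""
    return data.decode("koi8_r").encode("cp1251")


def convert_file(src_path, dst_path):
    """Convert a whole file; the paths are supplied by the caller."""
    with open(src_path, "rb") as src:
        converted = koi8r_to_cp1251(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(converted)


if __name__ == "__main__":
    sample = "привет".encode("koi8_r")
    # Decode the result as CP1251 to confirm the text survived the round trip
    print(koi8r_to_cp1251(sample).decode("cp1251"))
```

Note that KOI8-R's box-drawing characters have no CP1251 equivalent, so
the encode step raises UnicodeEncodeError if the input contains them; a
plain word list should not.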

Last, does ht://dig have to be compiled separately for each language?
(Clearly a newbie question.)

Thanks in advance!

Max Pyziur

This archive was generated by hypermail 2.0b3 on Tue May 18 1999 - 20:05:21 PDT