Subject: Re: [htdig] indexing dem cyrillic letters along w/ latin ones
From: Max Pyziur (email@example.com)
Date: Sat Dec 09 2000 - 09:30:09 PST
Over a year and a half ago the dialog went thusly:
> According to Max Pyziur:
> >Greetings All,
> >I'm still a newbie to ht://dig. I've installed it both on my home Linux
> >box (RPMs on RedHat 5.2) and on our server (running Solaris 2.6; yes, had
> >to find the necessary libstdc++ library and get a copy of gnu-make; the
> >address of the website is http://www.brama.com; our first uses of ht://dig
> >can be found at http://www.brama.com/search.html). I'm still in testing
> >mode and haven't begun to try and index the whole server, just one or two
> >directories. Our problem is that our website is trilingual - more than 50%
> >English, the rest mostly Ukrainian, with a bit here and there in Russian.
> >The other problem is that the character set we're using for the Ukrainian
> >and Russian language pages is CP1251, not KOI8 (the Unix guys' and gals'
> >favorite). This is because CP1251 exists in one form whereas KOI8 exists
> >in several (KOI8-R, KOI8-U, KOI8-RU), all overlapping on a core set of
> >characters, but differing on about five or six, making use of any variant
> >of KOI8 just a bit unnerving.
> >I've seen the references to dictionaries available at
> >http://fmg-www.cs.ucla.edu/fmg-members/geoff/ispell-dictionaries.html and
> >have picked up the Russian one (pretty much a cinch to change Russian
> >koi8 to cp1251); however, does anyone know of Ukrainian dictionaries?
> Not yet - Ukrainian is not very widely used in countries other than Ukraine
> itself, I think. Maybe you can get some information at the computer science
> or math divisions of Ukrainian universities? At least this would be where I
> would look for this, since there is a good chance that the people there
> could use iSpell for checking TeX documents.
> >Last, do the compilations of ht://dig have to be done separately for each
> >language? (Clearly a newbie question.)
> No. Setting the "locale" directive in the configuration file should be
Sometime around the end of 1999 a Ukrainian dictionary appeared on a server
in Ukraine. It is in the KOI8 encoding. You can find it here:
I downloaded it, wrote a Perl script for converting KOI8 to cp1251 (available
on my website) and used it to convert the dictionary to cp1251.
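
For readers who want to reproduce this kind of conversion, here is a minimal
Python sketch of the same idea. It is not the Perl script mentioned above. It
assumes the dictionary is in KOI8-U specifically (the message only says
"KOI8"), and the function names are hypothetical.

```python
"""Re-encode a KOI8-U word list as CP1251 (illustrative sketch)."""

SOURCE_ENCODING = "koi8_u"   # assumption: the message only says "KOI8"
TARGET_ENCODING = "cp1251"


def koi8u_to_cp1251(data: bytes) -> bytes:
    """Decode KOI8-U bytes to text, then encode that text as CP1251.

    Encoding is strict, so a character with no CP1251 equivalent
    (for example a KOI8 box-drawing symbol) raises UnicodeEncodeError
    instead of being silently replaced.
    """
    text = data.decode(SOURCE_ENCODING)
    return text.encode(TARGET_ENCODING)


def convert_file(src_path: str, dst_path: str) -> None:
    """Convert a whole dictionary file; both paths are supplied by the caller."""
    with open(src_path, "rb") as src:
        converted = koi8u_to_cp1251(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(converted)


def differing_bytes(enc_a: str = "koi8_r", enc_b: str = "koi8_u") -> list:
    """List the high-half byte values that two KOI8 variants decode differently."""
    return [
        b for b in range(0x80, 0x100)
        if bytes([b]).decode(enc_a) != bytes([b]).decode(enc_b)
    ]
```

The helper differing_bytes() is only there to show why the choice of KOI8
variant matters: decoding with the wrong variant turns a handful of Ukrainian
letters into other symbols before the CP1251 step ever runs.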
I'll also make both things available at brama.com for those who might be
I also set up a Ukrainian language locale on my RH6.2 server using the following
command:
localedef -c -f CP1251 -i uk_UA -u mnemonic.ds /usr/share/locale/uk_UA.cp1251
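
A quick way to check that a locale like this is actually visible to programs
is to ask the C library to switch to it. The Python sketch below does that;
the helper name is hypothetical, and the exact locale name to pass depends on
how and where the locale was installed on the system in question.

```python
import locale


def locale_available(name: str) -> bool:
    """Return True if the C library accepts `name` for LC_CTYPE.

    The previous LC_CTYPE setting is restored afterwards, so calling
    this has no lasting effect on the process.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return False
    locale.setlocale(locale.LC_CTYPE, saved)
    return True


if __name__ == "__main__":
    # Assumed name; adjust to match the locale that was generated.
    print(locale_available("uk_UA.cp1251"))
```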
I then put the following lines in my conf files
The funny thing (head scratching) is that I'm not totally convinced that the
dictionary is necessary. There are about 40,000 words in the dictionary, but
case-insensitive searches also work for words that don't occur in it.
I guess this is still one of the things I don't fully understand about the
configuration of htdig.
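
One possible reading of this, offered as an assumption rather than as a
description of htdig's internals: case-insensitive matching only needs to know
which CP1251 bytes are the upper- and lower-case forms of the same letter.
That information comes from the character set or locale, so it would apply to
any word, whether or not it appears in a word list. The Python sketch below
illustrates the idea with hypothetical helper names.

```python
def fold_case_cp1251(raw: bytes) -> bytes:
    """Lower-case CP1251-encoded text using Unicode case rules."""
    return raw.decode("cp1251").lower().encode("cp1251")


def matches_ignoring_case(query: bytes, word: bytes) -> bool:
    """Compare two CP1251 byte strings without regard to letter case.

    No dictionary is consulted: the comparison relies only on the
    case mapping of the individual characters.
    """
    return fold_case_cp1251(query) == fold_case_cp1251(word)
```

For example, matches_ignoring_case("КИЇВ".encode("cp1251"),
"київ".encode("cp1251")) is true even though no word list is involved.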
Anyway, I'm very pleased with the results so far.
> InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
> Waldhofstraße 14 Tel: +49-4101-403605
> D-25474 Ellerbek Fax: +49-4101-403606
> E-Mail: firstname.lastname@example.org Internet: http://www.inwise.de
-- Max Pyziur BRAMA - Gateway Ukraine email@example.com http://www.brama.com/