Subject: Re: [htdig] indexing dem cyrillic letters along w/ latin ones
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Mon Dec 11 2000 - 10:11:32 PST


According to Max Pyziur:
> Sometime around the end of 1999 there was a Ukrainian dictionary which appeared
> on a server in Ukraine. It is in the KOI8 encoding. You can find it here:
> ftp://cad.ntu-kpi.kiev.ua/soft/lingvist/UkrIspell/
> or here:
> http://www.physics.mcgill.ca/WWW/oleh/emacs/ispell.html
>
>
> I downloaded it, wrote a perl script for converting it to cp1251 (available on
> my website) and converted the dictionary to cp1251.
>
> I'll also make both things available at brama.com for those who might be
> interested.
>
> I also set up a Ukrainian-language locale on my RH6.2 server using the following
> command:
> localedef -c -f CP1251 -i uk_UA -u mnemonic.ds /usr/share/locale/uk_UA.cp1251
>
> I then put the following lines in my conf files
> locale: uk_UA.cp1251
> lang_dir: ${common_dir}/ukrainian
> bad_words_list: ${lang_dir}/ukr_badwords
> endings_affix_file: ${lang_dir}/ukrainian.aff
>
> The funny thing (head scratching) is that I'm not totally convinced that the
> dictionary is necessary. I mean there are about 40,000 words in the dictionary,
> but I can use case insensitive search terms for words which don't occur there.
> I guess this is still one of the things which I don't fully understand about the
> configuration of htdig.
>
> Anyway, I'm very pleased with the results so far.

The dictionary and affix file are not needed for exact matches.
Their sole purpose is to implement the "endings" fuzzy match algorithm.
For example, in English, this algorithm expands a search for "blast" to
"(blast or blasted or blasting or blaster or blasts or blasters)". You
need to run "htfuzzy endings" to build the word2root.db and root2word.db
databases from the dictionary and affix file, and you enable the endings
fuzzy match algorithm using the search_algorithm config attribute.

See http://www.htdig.org/attrs.html#search_algorithm
and http://www.htdig.org/htfuzzy.html
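
As a rough sketch of the steps above, the configuration for the Ukrainian
setup might look something like the fragment below. The dictionary file
name (ukrainian.0) and the 0.5 weight are assumptions chosen for
illustration, not values taken from the original message. The
endings_dictionary line is included because htfuzzy needs both a
dictionary and an affix file to build the endings databases.

```
# Hypothetical htdig.conf fragment; adjust paths to your installation.
lang_dir:            ${common_dir}/ukrainian
endings_affix_file:  ${lang_dir}/ukrainian.aff
# Assumed file name for the cp1251-converted dictionary:
endings_dictionary:  ${lang_dir}/ukrainian.0
# Exact matches get full weight; endings expansions count for less.
search_algorithm:    exact:1 endings:0.5
```

With something like this in place, running "htfuzzy -c /path/to/htdig.conf
endings" (the config path is a placeholder) should build word2root.db and
root2word.db. Leaving search_algorithm at "exact:1" keeps searches to
exact matches only, which need neither the dictionary nor the affix file.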

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



