Re: [htdig] re: Problems with iso characters


Subject: Re: [htdig] re: Problems with iso characters
From: Petri Lankoski (kreivi@iki.fi)
Date: Tue Oct 24 2000 - 02:52:13 PDT


Peter Peltonen writes:
> Petri Lankoski wrote:
>
> > > I have bit problems with htdig and iso characters and I can't find
> > > solution from FAQ to my problem. Htdig DB contains 8bit
>
> Here's how I got htdig working in Finnish (with ISO characters, that is):
>
> 1. Configured my htdig.conf:
>
> locale: fi_FI.ISO-8859-1
> lang_dir: /var/lib/htdig/common/finnish
> bad_word_list: ${lang_dir}/bad_words
> endings_affix_file: ${lang_dir}/finnish.aff
> endings_dictionary: ${lang_dir}/finnish.0
> endings_root2word_db: ${lang_dir}/root2word.db
> endings_word2root_db: ${lang_dir}/word2root.db
>
> 2. Hunted the web and finally found a finnish.dict file. I copied the file
> as finnish.0 to the directory I specified in my htdig.conf (I also created
> that directory :). Copied finnish.aff there too. (If you cannot find these
> files, I can send them to you).
>
> 3. I made a list of bad words to the file bad_words
>
> I'm not sure if the machine running htdig has to be configured to be using
> the fi-locale. I don't think so, but I changed that just to be sure.

I followed the instructions above, but htsearch still doesn't find words
with accented characters. As far as I can see, the db contains 8-bit characters.

[12:31] xcalibur /var/lib/htdig/db > /www/cgi-bin/htsearch
Enter value for words: mäyrä
Enter value for format: long
Content-type: text/html

<h1>No matches were found for 'mäyrä'</h1>
<p>
Check the spelling of the search word(s) you used.
If the spelling is correct and you only used one word,
try using one or more similar search words with "<b>Any</b>."
</p>

...

[12:32] xcalibur /var/lib/htdig/db > grep mäyrä db.wordlist
mäyrä i:165 l:561 w:439 a:1
mäyrä i:170 l:64 w:936
mäyrä i:259 l:123 w:877
mäyrä i:260 l:208 w:792
mäyrä i:269 l:263 w:2146 c:7
mäyrä i:270 l:237 w:3902 c:13
mäyrä i:272 l:595 w:405
mäyrä i:405 l:0 w:250895 c:3
mäyrä i:406 l:862 w:138
mäyrä i:418 l:26 w:974
mäyrä i:84 l:742 w:258
mäyrä i:85 l:117 w:883
mäyrä i:90 l:626 w:374
mäyräkoira i:421 l:697 w:303
mäyrälle i:405 l:958 w:42
mäyrältä i:170 l:203 w:797
mäyrän i:269 l:247 w:1050 c:2
mäyrän i:270 l:615 w:695 c:3
mäyrän i:86 l:341 w:1507 c:5
mäyrän i:89 l:667 w:333
mäyrää i:405 l:944 w:56
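
The "8-bit" observation can be double-checked at the byte level, since a terminal may render ISO-8859-1 and UTF-8 data in ways that are easy to confuse. The following is only a rough diagnostic sketch in Python, not part of ht://Dig; the function name `describe_encoding` is made up for illustration, and it assumes the word list is a plain text file that can be read in binary mode.

```python
def describe_encoding(raw: bytes) -> str:
    """Roughly classify how the bytes of a word are encoded.

    Hypothetical helper for illustration; not an ht://Dig API.
    """
    # Pure 7-bit data: no accented characters present at all.
    if all(b < 0x80 for b in raw):
        return "ascii"
    # Valid UTF-8 multi-byte sequences decode cleanly.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # High bytes that are not valid UTF-8: a single-byte
        # 8-bit charset such as ISO-8859-1 is the likely candidate.
        return "8-bit (e.g. ISO-8859-1)"


# The search word from the session above, written with escapes so the
# example does not depend on the encoding of this file.
word = "m\u00e4yr\u00e4"
print(describe_encoding(word.encode("latin-1")))
print(describe_encoding(word.encode("utf-8")))
```

To apply it to real data, one could read `db.wordlist` with `open(path, "rb")` and pass the first whitespace-separated field of each line to `describe_encoding`. If the indexed words and the query bytes turn out to be encoded differently, that alone would explain a failed match.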

/etc/htdig/htdig.conf:
locale: fi_FI
lang_dir: /var/lib/htdig/common/finnish
bad_word_list: ${lang_dir}/bad_words
endings_affix_file: ${lang_dir}/finnish.aff
endings_dictionary: ${lang_dir}/finnish.0
endings_root2word_db: ${lang_dir}/root2word.db
endings_word2root_db: ${lang_dir}/word2root.db

[12:38] xcalibur /var/lib/htdig/db > locale
LANG=fi_FI
LC_CTYPE="fi_FI"
LC_NUMERIC="fi_FI"
LC_TIME="fi_FI"
LC_COLLATE="fi_FI"
LC_MONETARY="fi_FI"
LC_MESSAGES="fi_FI"
LC_ALL=fi_FI
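
Note that the quoted instructions use `locale: fi_FI.ISO-8859-1`, while the configuration and environment shown here use plain `fi_FI`. Whether either name is actually installed on a given machine can be probed with a short Python sketch like the one below. It assumes that ht://Dig hands the `locale` value to the C library's `setlocale()`, which this thread does not confirm; the helper name `locale_available` is hypothetical.

```python
import locale


def locale_available(name: str) -> bool:
    """Return True if the C library accepts `name` for LC_CTYPE.

    Hypothetical helper for illustration; the previous LC_CTYPE
    setting is restored before returning.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)
    return True


# The two locale names that appear in this thread.
for candidate in ("fi_FI.ISO-8859-1", "fi_FI"):
    print(candidate, locale_available(candidate))
```

A name that is rejected here would typically leave a program in the default "C" locale, where bytes above 127 are not treated as letters. That is one possible cause of the symptom, not a confirmed diagnosis.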

The system is Red Hat 6.2 and htdig is htdig-3.1.5-0glibc21.

-- 
  Petri Lankoski		Yeah you wanna go out 'cause it's raining 
  kreivi@iki.fi			and blowing * You can't go out cause your 
  http://www.iki.fi/~kreivi/	roots are showing * dye em black
  PGP: http://www.iki.fi/~kreivi/pgp.txt             type o negative




This archive was generated by hypermail 2b28 : Tue Oct 24 2000 - 02:57:52 PDT