Re: [htdig] re: Problems with iso characters


From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Thu Oct 26 2000 - 11:01:06 PDT


Hmmm, this is a bit of a stumper. I can think of a few possibilities,
which are long-shots, but one might point to a solution:

1) Are you sure the /www/cgi-bin/htsearch binary is the same one as
the /home/httpd/cgi-bin/htsearch binary that the RPM installs? E.g. is
/www a symlink to /home/httpd, or vice-versa, or are they two separate
directories? If they're separate, where did /www/cgi-bin/htsearch come
from, and where is it looking for its config file?

2) Is there any possibility that you're using a different configuration
file for htsearch than for htdig/htmerge? If so, are you sure you have
identical settings for locale, and all the other language-specific
attributes? Straight out of the RPM, all binaries should use
/etc/htdig/htdig.conf by default, but htsearch may look elsewhere if given
a config parameter, a -c option, or a CONFIG_DIR environment variable.
The way you're using htsearch from the command line, the environment
variable would probably be the only possibility.

3) Maybe the locale handling changed subtly from Red Hat 6.0 to 6.2.
Try rebuilding the source RPM (rpm --rebuild htdig-3.1.5-0.src.rpm)
and installing that, in place of the htdig-3.1.5-0glibc21.i386.rpm which
was built on 6.0. I don't remember if I tested locale support on 6.2.
However, the db.wordlist looks like it's built correctly, so I'm not
convinced there's a problem with locale support.

4) Are you sure that the word "mäyrä" which you entered into htsearch
was using the ISO-8859-1 representation for "ä", and not some other
character set? It seems it must be, though, because the grep command
just afterward worked. I did say it was a long-shot.
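
For what it's worth, the character-set question in the last point can be
checked at the byte level. The following is only a rough sketch in Python,
not anything that ships with ht://Dig; the helper names encoded_forms and
guess_encoding are made up for illustration. It shows the bytes for "mäyrä"
in ISO-8859-1 and in UTF-8, and makes a crude guess about which of the two
a given byte string is in (the guess assumes those are the only candidates).

```python
WORD = "m\u00e4yr\u00e4"  # "mäyrä", written with escapes so the file encoding doesn't matter


def encoded_forms(word):
    """Return the byte sequences for `word` in ISO-8859-1 and in UTF-8."""
    return {
        "iso-8859-1": word.encode("iso-8859-1"),  # one byte per character
        "utf-8": word.encode("utf-8"),            # "ä" becomes two bytes
    }


def guess_encoding(raw):
    """Crude guess: 'ascii', 'utf-8' or 'iso-8859-1' (assumed when UTF-8 decoding fails)."""
    if all(b < 0x80 for b in raw):
        return "ascii"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything with high bytes that is not valid UTF-8 is Latin-1.
        return "iso-8859-1"


for name, raw in encoded_forms(WORD).items():
    # Print each encoding name followed by the bytes in hex.
    print(name, " ".join(f"{b:02x}" for b in raw))
```

If the bytes typed at the htsearch prompt and the bytes in db.wordlist are
both the one-byte-per-character form, the character set is not the culprit.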

According to Petri Lankoski:
> Peter Peltonen writes:
> > Petri Lankoski wrote:
> > > > I have a bit of a problem with htdig and ISO characters and I can't find
> > > > a solution to my problem in the FAQ. Htdig DB contains 8bit
> >
> > Here's how I got htdig working in Finnish (with ISO characters, that is):
> >
> > 1. Configured my htdig.conf:
> >
> > locale: fi_FI.ISO-8859-1
> > lang_dir: /var/lib/htdig/common/finnish
> > bad_word_list: ${lang_dir}/bad_words
> > endings_affix_file: ${lang_dir}/finnish.aff
> > endings_dictionary: ${lang_dir}/finnish.0
> > endings_root2word_db: ${lang_dir}/root2word.db
> > endings_word2root_db: ${lang_dir}/word2root.db
> >
> > 2. Hunted the web and finally found a finnish.dict file. I copied the file
> > as finnish.0 to the directory I specified in my htdig.conf (I also created
> > that directory :). Copied finnish.aff there too. (If you cannot find these
> > files, I can send them to you).
> >
> > 3. I put a list of bad words in the file bad_words
> >
> > I'm not sure if the machine running htdig has to be configured to use
> > the fi locale. I don't think so, but I changed that just to be sure.
>
> I tried the instructions above and htsearch still doesn't find
> accented characters. As far as I can see, the db contains 8-bit characters.
>
> [12:31] xcalibur /var/lib/htdig/db > /www/cgi-bin/htsearch
> Enter value for words: mäyrä
> Enter value for format: long
> Content-type: text/html
>
> <h1>No matches were found for 'mäyrä'</h1>
> <p>
> Check the spelling of the search word(s) you used.
> If the spelling is correct and you only used one word,
> try using one or more similar search words with "<b>Any</b>."
> </p>
>
> ...
>
> [12:32] xcalibur /var/lib/htdig/db > grep mäyrä db.wordlist
> mäyrä i:165 l:561 w:439 a:1
> mäyrä i:170 l:64 w:936
> mäyrä i:259 l:123 w:877
> mäyrä i:260 l:208 w:792
> mäyrä i:269 l:263 w:2146 c:7
> mäyrä i:270 l:237 w:3902 c:13
> mäyrä i:272 l:595 w:405
> mäyrä i:405 l:0 w:250895 c:3
> mäyrä i:406 l:862 w:138
> mäyrä i:418 l:26 w:974
> mäyrä i:84 l:742 w:258
> mäyrä i:85 l:117 w:883
> mäyrä i:90 l:626 w:374
> mäyräkoira i:421 l:697 w:303
> mäyrälle i:405 l:958 w:42
> mäyrältä i:170 l:203 w:797
> mäyrän i:269 l:247 w:1050 c:2
> mäyrän i:270 l:615 w:695 c:3
> mäyrän i:86 l:341 w:1507 c:5
> mäyrän i:89 l:667 w:333
> mäyrää i:405 l:944 w:56
>
> /etc/htdig/htdig.conf:
> locale: fi_FI
> lang_dir: /var/lib/htdig/common/finnish
> bad_word_list: ${lang_dir}/bad_words
> endings_affix_file: ${lang_dir}/finnish.aff
> endings_dictionary: ${lang_dir}/finnish.0
> endings_root2word_db: ${lang_dir}/root2word.db
> endings_word2root_db: ${lang_dir}/word2root.db
>
>
> [12:38] xcalibur /var/lib/htdig/db > locale
> LANG=fi_FI
> LC_CTYPE="fi_FI"
> LC_NUMERIC="fi_FI"
> LC_TIME="fi_FI"
> LC_COLLATE="fi_FI"
> LC_MONETARY="fi_FI"
> LC_MESSAGES="fi_FI"
> LC_ALL=fi_FI
>
>
> System is Redhat 6.2 and htdig is htdig-3.1.5-0glibc21
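
Coming back to the question of whether htsearch and htdig/htmerge see the
same language settings: if there turn out to be two config files, they
could be compared mechanically. Below is a minimal sketch in Python, under
some loud assumptions: parse_config and diff_settings are hypothetical
helpers, the parser only understands plain "name: value" lines (no line
continuations, no ${...} expansion), and the two sample texts simply reuse
the two locale values quoted in this thread rather than the contents of any
real second config file.

```python
import os

# The language-specific attributes that appear in the configs quoted above.
LANGUAGE_KEYS = (
    "locale", "lang_dir", "bad_word_list", "endings_affix_file",
    "endings_dictionary", "endings_root2word_db", "endings_word2root_db",
)


def parse_config(text):
    """Parse simple 'name: value' lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        if sep:  # ignore lines that have no colon at all
            settings[name.strip()] = value.strip()
    return settings


def diff_settings(a, b, keys=LANGUAGE_KEYS):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Illustrative inputs only: the two locale values seen in this thread.
indexing = parse_config("locale: fi_FI\nlang_dir: /var/lib/htdig/common/finnish\n")
searching = parse_config("locale: fi_FI.ISO-8859-1\nlang_dir: /var/lib/htdig/common/finnish\n")

print("CONFIG_DIR =", os.environ.get("CONFIG_DIR", "(not set)"))
print(diff_settings(indexing, searching))
```

An empty result from diff_settings would mean the language attributes agree,
at least as far as this simplified parser can tell.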

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
