Re: [htdig] Two languages and accentuated words


Subject: Re: [htdig] Two languages and accentuated words
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Wed Sep 20 2000 - 09:58:49 PDT


According to Manuel Monteiro:
> I don't have a db.wordlist file. For Portuguese I have only these files:
> portugues.aff
> portugues.0
> root2word.db
> word2root.db
> The .db files were created using htfuzzy endings. Am I missing something?

The files you mention are normally in your "common" directory.
The db.wordlist file should be in your "db" directory, as defined by
the database_dir attribute.

> pt_PT was also my first guess. I've also tried pt-PT.ISO8859-1 with the same
> result.
> When I issue the locale -a command on my system I get:
>
> POSIX
> en_US.ISO8859-1
>
> Could this be the problem? It doesn't have pt_PT...

Yes, this would seem to be the problem. Try using en_US or
en_US.ISO8859-1 as the locale setting in your config file and see if
that solves it. The locale category that matters most to htdig is
LC_CTYPE, and its tables are usually the same for all ISO-8859-1 based
locales, so this one may work for you. The "C" and "POSIX" locales
won't, and neither will a locale that isn't defined on your system.
You may also want to look into installing other locales on your system
if you need them for any other purpose.
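
To see which locale names your C library will actually accept for
LC_CTYPE, independently of htdig, something like the following may help.
It is a rough sketch using Python's standard locale module, not anything
htdig itself provides. The helper name locale_available is made up for
this example, and the candidate names are just the ones mentioned in this
thread.

```python
import locale


def locale_available(name):
    """Return True if setlocale() accepts `name` for LC_CTYPE."""
    # Remember the current LC_CTYPE setting so it can be restored.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        # The C library does not know this locale name.
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    for candidate in ("pt_PT", "pt_PT.ISO8859-1", "en_US.ISO8859-1", "C"):
        print(candidate, locale_available(candidate))
```

A name that the C library rejects here is unlikely to work as htdig's
locale setting either. Note that "C" being accepted only means the name is
valid; its LC_CTYPE tables still won't treat accented characters as
letters.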

> > > bad_word_list: ${lang_dir}/bad_words
> > > endings_affix_file: ${lang_dir}/portugues.aff
> > > endings_dictionary: ${lang_dir}/portugues.0
> > > endings_root2word_db: ${lang_dir}/root2word.db
> > > endings_word2root_db: ${lang_dir}/word2root.db
> > Hmm. My first guess was that the pt_PT locale isn't working on your
> > system, but the fact that your query is expanded to
> > '(seminário or seminários)' suggests that the endings algorithm is
> > working correctly, and apparently it's handling accented letters as
> > letters correctly as well. Does the word seminário appear in your
> > db.wordlist file?

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



