Re: [htdig] making other languages work


From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Wed Jun 07 2000 - 10:40:05 PDT


According to Peter Peltonen:
> - Edited htdig.conf:
>
> --snip--
> locale: fi_FI.ISO-8859-1
> lang_dir: /var/lib/htdig/common/finnish
> bad_word_list: ${lang_dir}/bad_words
> endings_affix_file: ${lang_dir}/finnish.aff
> endings_dictionary: ${lang_dir}/finnish.0
> endings_root2word_db: ${lang_dir}/root2word.db
> endings_word2root_db: ${lang_dir}/word2root.db
> --snip--
>
> - Created directory /var/lib/htdig/common/finnish and copied finnish.aff and
> finnish.0 there.
>
> - What next? Where do I get the root2word.db, word2root.db and bad_words
> files? Should I just use the ones in /var/lib/htdig/common?

The root2word.db and word2root.db databases are built from your finnish.0
and finnish.aff files when you run "htfuzzy endings". The rundig
script normally does this automatically if it notices the databases are
missing or outdated, but it checks them against the $COMMONDIR/english.0
file. You may want to change the script to look at $COMMONDIR/finnish.0
instead, and set COMMONDIR to /var/lib/htdig/common/finnish in the script
to match your lang_dir setting.
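
The kind of check involved can be sketched roughly as below. This is only
a minimal sketch, not the actual text of rundig: the function names
needs_rebuild and rebuild_endings_if_stale are made up for illustration,
the paths assume the lang_dir setting quoted above, and the -nt test is a
widely supported shell extension rather than strict POSIX.

```shell
#!/bin/sh
# Assumption: COMMONDIR matches the lang_dir setting from the config file.
COMMONDIR=${COMMONDIR:-/var/lib/htdig/common/finnish}

# Hypothetical helper: succeeds (exit status 0) when the database file $2
# is missing, or is older than the source file $1.
needs_rebuild() {
    [ ! -f "$2" ] || [ "$1" -nt "$2" ]
}

# Hypothetical wrapper: rebuild the endings databases only when
# finnish.0 is newer than word2root.db, or word2root.db is absent.
rebuild_endings_if_stale() {
    if needs_rebuild "$COMMONDIR/finnish.0" "$COMMONDIR/word2root.db"; then
        # Add "-c yourfile.conf" here if you use a non-default config file.
        htfuzzy endings
    fi
}
```

Calling rebuild_endings_if_stale before indexing would then only run
htfuzzy when the Finnish dictionary has actually changed.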

The bad_words file and synonyms file in /var/lib/htdig/common both
contain English words only, so you'd need to create your own Finnish
versions of these, if you feel you need them. The bad_words file is
just for common words that you don't want indexed. You may decide you
don't need it, or you can pick out common words from your db.wordlist
that you'd rather not have indexed, and add them to your bad_words
file. You'll need to reindex for this to take effect.
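
One rough way to find candidates for bad_words is to count how often each
word shows up in db.wordlist. The sketch below assumes that each line of
db.wordlist starts with the word itself, followed by whitespace and other
fields; check your own file before relying on that. The function name
top_words is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the $2 most frequent words in word list $1,
# assuming the word is the first whitespace-separated field on each line.
top_words() {
    awk '{ print $1 }' "$1" |   # keep only the word column
        sort |                  # group identical words together
        uniq -c |               # count the lines for each word
        sort -rn |              # most frequent first
        head -n "$2" |          # keep the top N
        awk '{ print $2 }'      # drop the counts, leaving just the words
}
```

For example, "top_words db.wordlist 50" would print 50 candidates, which
you can review by hand before adding any of them to your bad_words file.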

The synonyms file is for the synonyms fuzzy algorithm, and the distributed
version really just contains alternate spellings and common misspellings
of English words. If you decide you don't need one of these for Finnish,
or don't want to put in the work to create it, you can remove the synonyms
algorithm from your search_algorithm attribute in your config file(s)
for htsearch. If you do create or acquire one, you can put it in your
finnish subdirectory, and set

synonym_dictionary: ${lang_dir}/synonyms
synonym_db: ${lang_dir}/synonyms.db

and create the synonyms.db using "htfuzzy synonyms". Changing COMMONDIR
in rundig as above will also make rundig recreate this database when
needed.

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
