Re: [htdig] Indexing german pages


Subject: Re: [htdig] Indexing german pages
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Wed Jan 03 2001 - 10:11:06 PST


According to Radoy Pavlov:
> I have some questions regarding german language.
> Following the example in FAQ I've made my htdig.conf,
> extracted GermanWords.zip in $COMMON_DIR/german and edited htdig.conf.
> I've done this:
> rerun of rundig
> rerun of htfuzzy endings
> Still htdig can't find any words with umlauts (ä, ö, ü, etc.), although
> I have nearly 30 MB of databases.
> The search page shows that it is searching for the word .. with no
> effect.
>
> My search algorithm:
> search_algorithm: exact:1 endings:0.5 prefix:0.4
>
> Perhaps I need to optimize the algorithm in order to get some matches?
> What is a "correct" algorithm ?

No, the search algorithms are not likely the problem. If you can't even
get an exact match, the problem lies elsewhere, and in this case I'd bet
it's a problem with locales.
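
To illustrate why a locale problem would defeat even an exact match, here
is a small Python sketch. It is not ht://Dig's actual word parser; the two
character classes are hypothetical stand-ins that mimic a plain "C" locale
and an ISO-8859-1 locale. Assuming the indexer splits words at any byte the
locale doesn't class as a letter or digit, a word with an umlaut gets stored
in pieces and can never match as a whole:

```python
import re

# Hypothetical stand-ins for "what counts as a word character".
# ASCII_WORD mimics a "C" locale: only unaccented letters and digits.
ASCII_WORD = re.compile(rb"[A-Za-z0-9]+")
# LATIN1_WORD also accepts the accented letters of ISO-8859-1.
LATIN1_WORD = re.compile(rb"[A-Za-z0-9\xc0-\xd6\xd8-\xf6\xf8-\xff]+")


def split_words(text, pattern):
    """Encode text as ISO-8859-1 and return the byte runs matched as words."""
    return pattern.findall(text.encode("latin-1"))


if __name__ == "__main__":
    sample = "Müller fährt"
    # With the ASCII-only class, each umlaut acts as a word separator.
    print(split_words(sample, ASCII_WORD))
    # With the Latin-1 class, the words survive intact.
    print(split_words(sample, LATIN1_WORD))
```

With the ASCII-only class, "Müller" comes out as "M" and "ller", so a search
for the full word has nothing to match, whatever the search algorithm.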

You didn't mention what system you are running htdig on, or what your
locale setting is. Some systems don't have properly functioning locale
support at all (e.g. many libc-5 based Linux systems), and many don't have
complete locale tables installed.
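
A quick way to see which locale names a given system accepts is to ask the
C library. The sketch below does that from Python. The candidate names
(de_DE and its variants) are only guesses at what a German locale might be
called on your system, and an accepted name does not by itself prove that
the locale tables are complete:

```python
import locale


def locale_available(name):
    """Return True if the C library accepts `name` for LC_CTYPE."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        # The C library rejected the name (locale not installed or unknown).
        return False
    finally:
        # Always restore whatever was in effect before the probe.
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    # Assumed candidate names; adjust to whatever your system lists.
    for candidate in ("C", "de_DE", "de_DE.ISO8859-1", "de_DE.ISO-8859-1"):
        status = "accepted" if locale_available(candidate) else "rejected"
        print(candidate, status)
```

If every German candidate is rejected, the locale data is probably not
installed, and no htdig setting will make up for that.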

See the thread entitled "Portuguese" from May 2000 for more pointers
on locale-related problems:

    http://www.htdig.org/mail/2000/05/index.html#61

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
