Re: htdig: Foreign dictionaries and word stemming ?


Alexander Bergolth (leo@strike.wu-wien.ac.at)
Wed, 20 Jan 1999 14:24:46 +0100 (MEZ)


On Wed, 20 Jan 1999, Stephan Gilbert wrote:

> Has anyone in the community experimented with foreign language
> dictionaries? I downloaded the German "ispell" compatible
> dictionary and affix file for htfuzzy. I ran htfuzzy on it
> but stopped it after 3 days (200 MHz Pentium running Linux 2.0.34).
> The database file grew from approx. 2 MBytes to 6. How can
> you test whether it actually accomplishes something sensible?

I think that the DB formats are not binary compatible between different
architectures, but maybe a dumped version of the German root2word and
word2root DBs will help you.

I have dumped my databases (generating them took several weeks) using
db_dump from the Berkeley DB distribution. They are available at

http://leo.wu-wien.ac.at/htdig/

(or http://strike.wu-wien.ac.at/~leo/htdig/)

The dumped files are gzipped. You should be able to recreate the
databases with:

    gzip -cd root2word.dump.gz | db_load root2word.db
    gzip -cd word2root.dump.gz | db_load word2root.db

I tried that and the resulting database was smaller than the original.
I dumped the result again and the two dumps are identical.
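
The load, re-dump and compare check described above could be scripted
roughly like this. It is only a sketch: it assumes gzip, cmp and the
Berkeley DB tools db_load and db_dump are on the PATH, and the function
names same_dump and roundtrip as well as the ".redump" file suffix are
made up for illustration. Dump headers may differ between Berkeley DB
versions, so a mismatch does not necessarily mean the data is damaged.

```shell
#!/bin/sh
# Sketch only: same_dump and roundtrip are hypothetical helper names.

# same_dump GZ_DUMP PLAIN_DUMP
# Exit status 0 if the decompressed GZ_DUMP is byte-identical to PLAIN_DUMP.
same_dump() {
    gzip -cd "$1" | cmp -s - "$2"
}

# roundtrip NAME   (NAME is e.g. root2word or word2root)
# Rebuilds NAME.db from NAME.dump.gz, dumps it again to NAME.redump,
# and compares the new dump with the original one.
roundtrip() {
    gzip -cd "$1.dump.gz" | db_load "$1.db" &&
    db_dump "$1.db" > "$1.redump" &&
    same_dump "$1.dump.gz" "$1.redump"
}
```

With both dump files in the current directory, something like
"roundtrip root2word && echo OK" would then report whether the rebuilt
database dumps back to the same text.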

Could someone check whether the DBs are OK?

Cheers,
         Leo

-----------------------------------------------------------------------
Alexander (Leo) Bergolth leo@leo.wu-wien.ac.at
WU-Wien - Zentrum fuer Informatikdienste http://leo.wu-wien.ac.at
Info Center
In a world without walls and fences, who needs windows and gates?



