Re: htdig: htfuzzy - endings runs VERY long


Alexander Bergolth (leo@strike.wu-wien.ac.at)
Thu, 26 Nov 1998 16:47:40 +0100 (MEZ)


Hello Frank!

On Thu, 26 Nov 1998, Frank Richter wrote:

> I'm building a database for "ending" search algorithm with a German
> dictionary and rule set. The dictionary has 40794 lines.

Hehe! Have fun!

> I started running (from 3.1.0b2)
> % htfuzzy -v -c htfuzzy-de.conf endings
> yesterday. It did the first 20000 lines in a few minutes, but after about
> 14 hours it is here:
> htfuzzy/endings: words: 27900
>
> I saw the same with 3.0.8b2, using a smaller dictionary (25000 lines), so
> this is probably not a new problem.

I didn't debug htfuzzy, but I ran it with a 76087-word input file and it
took about 3 weeks on a brand-new dual-processor RS/6000 machine to build
the dictionary. (The first 50000 words took a few minutes, then it slowed
down dramatically.)
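
For a rough feel of the numbers in your run, here is a back-of-the-envelope
sketch in Python. It uses only the figures from your message (20000 lines
done quickly, 27900 reached after roughly 14 hours, 40794 lines in total)
and assumes that the slow rate stays constant, which is probably too
optimistic. The function name is made up for illustration; nothing here
comes from htfuzzy itself.

```python
def estimate_remaining_hours(done_fast, done_now, hours_slow, total):
    """Extrapolate the remaining run time from the slow phase only.

    Assumption: the rate observed since the slowdown stays constant,
    which is likely optimistic if the slowdown keeps getting worse.
    """
    slow_rate = (done_now - done_fast) / hours_slow  # words per hour
    remaining = total - done_now                     # words still to process
    return slow_rate, remaining / slow_rate


# Figures taken from the quoted message; the few fast minutes are ignored.
rate, hours_left = estimate_remaining_hours(
    done_fast=20000, done_now=27900, hours_slow=14, total=40794
)
print(f"slow-phase rate: {rate:.0f} words/hour")
print(f"estimated time left: {hours_left:.1f} hours")
```

Under these assumptions the rest would take a bit under a day, but since my
own run took weeks, I would treat that as a lower bound at best.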

I don't know whether the db files are binary compatible, but you can have my
-rw-rw-r-- 1 bergolth edvz 7310336 Aug 21 07:43 root2word.db
-rw-rw-r-- 1 bergolth edvz 13724672 Aug 21 07:43 word2root.db
files and try it with them...

http://strike.wu-wien.ac.at/~leo/htdig/root2word.db
http://strike.wu-wien.ac.at/~leo/htdig/word2root.db

Bye,
      Leo

-----------------------------------------------------------------------
Alexander (Leo) Bergolth leo@leo.wu-wien.ac.at
WU-Wien - Zentrum fuer Informatikdienste http://leo.wu-wien.ac.at
Info Center
In a world without walls and fences, who needs windows and gates?



