René Seindal (email@example.com)
Wed, 17 Sep 1997 21:32:58 +0200
> Date: Wed, 17 Sep 1997 13:27:23 +0200
> From: Till Kinstler <firstname.lastname@example.org>
> I've got a problem with htfuzzy creating the endings-databases
> (words2root and root2words). I'm trying to build these databases
> from a German dictionary containing about 40000 words.
> The first 20500 words are processed quite fast (within a few minutes),
> but then htfuzzy slows down more and more. It is working now for
> 2 days...
> I'm using htdig 3.0.8b2. I've tried to build the databases on 2
> different machines: both running Linux (Kernel 2.0.30), one is a
> Pentium60 with only 16 MB, the other a 486 DX4/100 with 64 megs.
> On both machines there was the same problem, so it doesn't seem
> to depend on too small memory...
I have an htfuzzy run on a Danish dictionary (55,000 words and quite a
few complicated rules) that has now been running for more than a month
(we are patient here). It has used 1821 CPU minutes so far. Its memory
use is no longer growing. It is on a 200 MHz Pentium with 64 MB RAM and
Linux 2.0.29. In the beginning it took up all the CPU; now it uses
almost nothing, but it is often seen hanging in a DiskWait state, which
suggests that the problem might be on the I/O side, maybe in Linux,
maybe in gdbm.
# ps axvw | grep fuzzy
  PID TTY STAT    TIME    PAGEIN TSIZ  DSIZ   RSS LIM %MEM COMMAND
  783   1 D <  1821:40 513318852   74 61129 55572  xx 87.8 /usr/local/htdig/bin/htfuzzy -c htdig.conf -v endings
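The DiskWait observation could be checked a little more systematically by sampling the process state over time. The following is only a minimal sketch, assuming a Linux /proc filesystem whose /proc/<pid>/stat line has the form "pid (command) state ..."; the function names and the sampling parameters are made up for illustration and are not part of htdig.

```python
import time


def parse_state(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so take everything after the last ')'.
    rest = stat_line[stat_line.rfind(")") + 1:]
    return rest.split()[0]


def sample_states(pid, samples=60, interval=1.0):
    """Count how often each state letter is seen over a number of samples."""
    counts = {}
    for _ in range(samples):
        with open(f"/proc/{pid}/stat") as f:
            state = parse_state(f.read())
        counts[state] = counts.get(state, 0) + 1
        time.sleep(interval)
    return counts
```

Calling sample_states(783) for the process shown above would return a dictionary of state letters and counts. A result dominated by "D" (uninterruptible sleep, usually waiting on disk) would support the suspicion that the time goes to I/O rather than computation.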
As the process doesn't cause problems, we have let it run. It is htdig
3.0.8b1, even though we are now using 3.0.8b2 with some local changes to
avoid timeout problems.
It must be because of the way the endings for singular/plural,
indefinite and definite forms, and the genitive are attached to the stem
cumulatively, thereby creating a lot of derived words for each stem. In
Danish some of the rules are very complicated, generating all the
different combinations of the endings, and if I remember right, the same
holds for German. English, on the other hand, is much simpler, having,
e.g., the definite article detached from the stem.
An example, in English and Danish.
                      English        Danish
Stem                  horse          hest
Sing. Indef.          horse          hest
Sing. Def.            the horse      hesten
Plur. Indef.          horses         heste
Plur. Def.            the horses     hestene
Gen. Sing. Indef.     horse's        hests
Gen. Sing. Def.       the horse's    hestens
Gen. Plur. Indef.     horses'        hestes
Gen. Plur. Def.       the horses'    hestenes
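As a rough illustration of how stacked endings multiply, here is a small sketch that builds the Danish forms in the table by taking one suffix from each "slot" in turn. This is not htfuzzy's actual algorithm or rule format; the slot lists and the function name are assumptions read off the table above.

```python
from itertools import product

# Hypothetical suffix slots for a regular Danish noun such as "hest".
# An empty string means the slot adds nothing.
NUMBER_DEFINITENESS = ["", "en", "e", "ene"]  # sing. indef., sing. def., plur. indef., plur. def.
GENITIVE = ["", "s"]


def derived_forms(stem, slots):
    """Return every word built by appending one suffix from each slot, in order."""
    return [stem + "".join(combo) for combo in product(*slots)]


danish = derived_forms("hest", [NUMBER_DEFINITENESS, GENITIVE])
english = derived_forms("horse", [["", "s"]])  # article is a separate word, apostrophes ignored
```

The number of forms is the product of the slot sizes (here 4 x 2 for Danish), so every additional slot of endings multiplies the number of derived words per stem.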
The Danish word generates eight derived forms, the English two or four,
depending on how you count the apostrophe. This is a very regular noun in
-- René Seindal (email@example.com)