Re: htdig: htfuzzy slows down...


René Seindal (seindal@webadm.kb.dk)
Wed, 17 Sep 1997 21:32:58 +0200


> Date: Wed, 17 Sep 1997 13:27:23 +0200
> From: Till Kinstler <till@gg15c1.phil.uni-sb.de>
>
> Hello!
> I've got a problem with htfuzzy creating the endings databases
> (words2root and root2words). I'm trying to build these databases
> from a German dictionary containing about 40000 words.
> The first 20500 words are processed quite fast (within a few minutes),
> but then htfuzzy slows down more and more. It has now been running
> for 2 days...
> I'm using htdig 3.0.8b2. I've tried to build the databases on 2
> different machines, both running Linux (kernel 2.0.30): one is a
> Pentium 60 with only 16 MB, the other a 486 DX4/100 with 64 MB.
> Both machines showed the same problem, so it doesn't seem to be
> caused by too little memory...

I have htfuzzy running on a Danish dictionary (55,000 words and quite a
few complicated rules), and it has now been running for more than a
month (we are patient here). It has used 1821 CPU minutes so far, and
its memory use has stopped growing. It runs on a 200 MHz Pentium with
64 MB RAM and Linux 2.0.29. At first it took all the CPU; now it takes
almost none, but it is often seen hanging in disk-wait state (the D in
the STAT column below), which suggests the problem is on the I/O side,
either in Linux or in gdbm.

# ps axvw | grep fuzzy
  PID TTY STAT TIME PAGEIN TSIZ DSIZ RSS LIM %MEM COMMAND
  783 1 D <1821:40 513318852 74 61129 55572 xx 87.8 /usr/local/htdig/bin/htfuzzy -c htdig.conf -v endings
 
As the process doesn't cause any problems, we have let it run. It is
htdig 3.0.8b1, although we are now using 3.0.8b2 with some local changes
to avoid timeout problems.

It must be due to the way the endings for singular and plural, the
indefinite and definite forms, and the genitive are attached to the stem
cumulatively, creating many derived words for each stem. In Danish some
of the rules are very complicated, generating every combination of the
endings, and, if I remember correctly, the same goes for German. English,
on the other hand, is much simpler; for example, the definite article is
a separate word rather than a suffix on the stem.

An example, in English and Danish:

                        English          Danish
Stem                    horse            hest

Sing. Indef.            horse            hest
Sing. Def.              the horse        hesten

Plur. Indef.            horses           heste
Plur. Def.              the horses       hestene

Gen. Sing. Indef.       horse's          hests
Gen. Sing. Def.         the horse's      hestens

Gen. Plur. Indef.       horses'          hestes
Gen. Plur. Def.         the horses'      hestenes

The Danish word generates eight forms, the English one two or four,
depending on whether you count the forms with an apostrophe. And this is
a very regular noun in Danish.
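To make the combinatorics concrete, here is a small Python sketch of the cumulative suffixing for the regular noun "hest" from the table. It is a hypothetical illustration only, not htfuzzy's own affix-rule format: the function name danish_noun_forms and the three suffix slots (plural -e, definite -en/-ne, genitive -s) are assumptions read off the example above. Each independent optional slot doubles the number of forms, so three slots give 2 * 2 * 2 = 8 words per stem.

```python
from itertools import product


def danish_noun_forms(stem: str) -> list[str]:
    """Generate the forms of a regular Danish noun such as 'hest'.

    Hypothetical sketch: three independent suffix slots (number,
    definiteness, genitive) are applied cumulatively to the stem.
    Not htfuzzy's actual rule format.
    """
    forms = []
    # Each slot is either absent or present: 2 * 2 * 2 = 8 combinations.
    for plural, definite, genitive in product((False, True), repeat=3):
        word = stem
        if plural:
            word += "e"                          # hest -> heste
        if definite:
            word += "ne" if plural else "en"     # heste -> hestene, hest -> hesten
        if genitive:
            word += "s"                          # genitive -s always comes last
        forms.append(word)
    return forms


if __name__ == "__main__":
    for form in danish_noun_forms("hest"):
        print(form)
```

Every extra optional ending multiplies the count again, which is why a dictionary of such nouns yields far more derived words than the same number of English stems.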

-- 
René Seindal (seindal@webadm.kb.dk)



This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:25:05 PST