Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Tue, 26 Jan 1999 11:47:01 -0600 (CST)

According to Alister van Tonder:
> My htmerge job often runs for several DAYS!!
> Even when I kill the job (after several days) it has produced a working
> searchable database!
> This particular job was started at 20h01 on Jan 22nd. The files below
> were created 10 minutes later. In the mean time htmerge continues as a
> job, usually taking all available CPU resources, and continues until I
> eventually have to kill it.
> A directory listing of the ~/htdig/lib/db directory is as follows:
> drwxr-xr-x 2 root root 11264 Jan 24 07:26 .
> drwxr-xr-x 4 root root 1024 Jan 1 10:19 ..
> -rw-r--r-- 1 root root 33153024 Jan 22 20:10 db.docdb
> -rw-rw-r-- 1 root root 740352 Jan 1 11:04 db.docs.index
> -rw-rw-r-- 1 root root 2430976 Jan 2 01:35 db.metaphone.db
> -rw-rw-r-- 1 root root 1686528 Jan 2 01:35 db.soundex.db
> -rw-r--r-- 1 root root 47838678 Jan 22 20:10 db.wordlist
> -rw-r--r-- 1 root root 12288 Jan 22 20:12 db.wordlist.new
> -rw-rw-r-- 1 root root 69552128 Jan 12 01:02 db.words.db
> -rw------- 1 root root 8388368 Jan 22 20:11 sort0795500092
> -rw------- 1 root root 8388371 Jan 22 20:11 sort0795500093
> -rw------- 1 root root 8388365 Jan 22 20:11 sort0795500094
> -rw------- 1 root root 8388309 Jan 22 20:11 sort0795500095
> -rw------- 1 root root 8388340 Jan 22 20:11 sort0795500096
> -rw------- 1 root root 5896925 Jan 22 20:12 sort0795500097

These files suggest to me that htmerge is either hanging altogether, or
crawling through the sorted db.wordlist generated by the sort program
and read by htmerge/words.cc. The sort temporary files add up to exactly
the size of db.wordlist (47838678 bytes), which suggests the sort
completed and the sort program was merging its final output into the
pipe. But db.wordlist.new is tiny in comparison (12288 bytes), so htmerge
didn't get very far into the sorted output.
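
For anyone who wants to check that arithmetic, here is a small Python
sketch. The sizes are copied by hand from the listing quoted above; the
variable names are mine and are not anything htdig itself uses.

```python
# Sizes in bytes, copied from the directory listing quoted above.
sort_temp_sizes = [8388368, 8388371, 8388365, 8388309, 8388340, 5896925]
db_wordlist_size = 47838678
db_wordlist_new_size = 12288

# If sort had finished writing its temporary runs, their total should be
# close to the size of the unsorted input (db.wordlist).
total_temp = sum(sort_temp_sizes)
print("sort temp files total:", total_temp)
print("same as db.wordlist:  ", total_temp == db_wordlist_size)

# How far htmerge got, measured very roughly by output size.
fraction = db_wordlist_new_size / db_wordlist_size
print("db.wordlist.new / db.wordlist: %.4f%%" % (100.0 * fraction))
```

The total matches db.wordlist byte for byte, which is consistent with
sort having consumed all of its input before htmerge stalled.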

Does db.wordlist.new slowly increase in size, or does it stay stuck at
the size shown there? Its modification time (Jan 22 20:12) suggests
nothing has been written to it for some time, if you took the listing
days afterwards.
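
One way to answer that without watching by hand is to sample the file's
size and modification time at intervals. The following is only a sketch:
the function names, the sample count and the interval are arbitrary
choices of mine, and the path in the usage comment is assumed from the
listing above.

```python
import os
import time

def sample_sizes(path, samples=5, interval=60.0):
    """Return (size_in_bytes, mtime) readings taken `interval` seconds apart."""
    readings = []
    for i in range(samples):
        st = os.stat(path)
        readings.append((st.st_size, st.st_mtime))
        if i < samples - 1:
            time.sleep(interval)
    return readings

def is_growing(readings):
    """True if the size increased between the first and the last reading."""
    return readings[-1][0] > readings[0][0]

# Example use (path assumed from the listing above):
#   r = sample_sizes(os.path.expanduser("~/htdig/lib/db/db.wordlist.new"))
#   print("growing" if is_growing(r) else "stuck", r)
```

If the size never changes over several minutes while htmerge is using
all the CPU, that points to a loop rather than slow progress.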

> This job has run 4500 minutes and is causing a heavy (unnecessary) load on
> the system!
> 7954 root 16 0 640 640 472 R 0 99.0 2.0 4509m htmerg

Looks like an infinite loop to me, but the only way to really nail it down
would be to get a core dump of it. Could you kill htmerge with a signal
that gives you a core dump, and get a stack backtrace? If htmerge is
stripped, you may want to recompile it and try the non-stripped version.
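
As a rough illustration of the first step: on Unix systems SIGQUIT's
default action is to terminate the process and write a core file,
provided the core size limit that htmerge inherited when it was started
is not zero. The sketch below uses only Python's standard library; the
function names are mine, and the PID and paths in the comments are
assumptions based on the output quoted above.

```python
import os
import resource
import signal

def core_dumps_enabled():
    """True if this process's soft core-file size limit is non-zero.

    Child processes inherit this limit, so it only tells you about
    programs started from the same environment, not about an htmerge
    that is already running.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft != 0

def request_core_dump(pid):
    """Send SIGQUIT; the default action is to terminate and dump core."""
    os.kill(pid, signal.SIGQUIT)

# Example use (7954 is the PID shown in the quoted top output):
#   request_core_dump(7954)
# Then, with a non-stripped binary (paths are hypothetical):
#   gdb /path/to/htmerge core
#   (gdb) bt
```

If the limit was zero when htmerge was launched, no core file will be
written, and the job would have to be restarted with the limit raised
before trying again.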

> Is a configuration error causing this problem ?

Seems unlikely to me, but without knowing where htmerge is hanging, it's
hard to say.

> My rundig is virtually standard:

Your rundig script looks just like the one from the htdig-3.1.0b1
RPMs I built in September. Are you running on a Red Hat Linux system?
Which version and platform? Are you running a more recent htdig, but
with the old rundig script, or are you still running 3.1.0b1? If you are
still on 3.1.0b1, you may want to try 3.1.0b4 and see if it still has
the same problem.

