Nat Irons (irons@apple.com)
Fri, 11 Jun 1999 18:02:39 -0700

The portions of my site I want to index are mailing list archives: about 128
months' worth, growing at a rate of one per month. I've written a few scripts
to handle digging and merging the site in month-size blocks.

Digging goes fine. I can dig everything, and dump each successive month's
worth of dug results into a unique folder in a staging area. Then the merge
scripts kick in, attempting to serially merge the contents of every directory
in the staging area with the master index. This works for a while, and then
it claims to run out of disk space. I get these results from htmerge:

/: write failed, file system is full
/usr/bin/sort: write error: No space left on device
htmerge: Word sort failed

Once I've seen one of these errors, subsequent merge attempts fail as well,
until I delete the merged databases and start over -- but as near as I can
tell, disk space is not the problem. I can create files cheerfully on the
volume in question; df shows it to be at 59% capacity with a few hundred MB
free.
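
For anyone who wants to repeat the free-space check, here is a minimal sketch
in Python that reports usage for the temp directories as well as the database
volume. It rests on an assumption: that sort writes its temporary files under
$TMPDIR or /tmp (GNU sort does this unless told otherwise with -T), which may
sit on a different filesystem from the databases. The function name
free_space_report and the list of candidate paths are made up for
illustration; none of this comes from htdig itself.

```python
import os
import shutil
import tempfile


def free_space_report(paths):
    """Return {path: (percent_used, free_mb)} for each path that exists."""
    report = {}
    for path in paths:
        if not os.path.isdir(path):
            continue  # skip directories that do not exist on this host
        usage = shutil.disk_usage(path)
        # Rough figure; df may differ slightly because of reserved blocks.
        percent_used = 100.0 * usage.used / usage.total if usage.total else 0.0
        report[path] = (round(percent_used, 1), usage.free // (1024 * 1024))
    return report


if __name__ == "__main__":
    # Candidate locations are assumptions: the usual temp dirs plus the root.
    # Add the directory holding the databases to compare against it.
    candidates = [
        os.environ.get("TMPDIR", "/tmp"),
        "/var/tmp",
        tempfile.gettempdir(),
        "/",
    ]
    # dict.fromkeys drops duplicate paths while keeping their order.
    for path, (pct, free_mb) in free_space_report(dict.fromkeys(candidates)).items():
        print(f"{path}: {pct}% used, {free_mb} MB free")
```

Running this just before and during a merge would show whether some
filesystem other than the one holding the databases is the one filling up.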

My ISP, which hosts the FreeBSD server I'm running on, is stonewalling me.
The only answer I was reasonably hoping to get from them is, "yes, there's a
problem with that disk," and if it were that simple I think I'd have gotten an
answer. Their only other advice has been to use multiple small databases,
which I suppose means they're thinking of some other search engine.

Do phantom out-of-disk-space errors sound familiar to anyone? Is there a
characteristic of htmerge which might cause it to think it was running out of
disk space (such as requiring a fantastic amount of temporary disk space)?
I've had the problem kick in at index sizes from 40MB to about 90MB; I expect
the full index when complete will be about 200-250MB. I'm using htdig 3.1.2.



