Re: [htdig] htmerge transient files?

Subject: Re: [htdig] htmerge transient files?
From: Malcolm Austen (
Date: Thu Jan 13 2000 - 02:38:22 PST

On Tue, 11 Jan 2000, Geoff Hutchison wrote:

+ At 9:43 AM +0000 1/10/00, Malcolm Austen wrote:
+ > - on exit from htmerge there appears to be about 1.5Gb free on the device
+ > - certainly a moment later htfuzzy found space to write >50Mb to it
+ > - I'm not using the same device for temp files - I already had htmerge
+ > fail because sort could not manage with the 700Mb available in
+ > /tmp and so have set:
+ >
+ > setenv TMPDIR /var/spider
+ > # where there is 5Gb available
+ >
+ >Has anyone any thoughts as to what might have transiently soaked up the
+ >space? Could sort have left some space allocated?
+ When you say "on exit," this means that the sort program will have
+ cleaned up all of its temporary files. Out of curiosity, what does
+ your sort manpage say about "sort -T" (if this is on a BSD-like
+ system, it will probably not specify the temp directory).

Sorry Geoff, should have said "RedHat 5.2". It's GNU sort and I believe
the -T is ok ... I had that fail the night before and had added the TMPDIR
assignment to overcome it.
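For illustration, here is a minimal Python sketch of what that setup amounts to when sort is invoked from a script: the scratch directory is passed both as GNU sort's -T option and as TMPDIR, and the script refuses to start if that filesystem is already too full. The function name and the min_free_bytes threshold are hypothetical, not part of htdig.

```python
import os
import shutil
import subprocess


def sort_with_tempdir(infile, outfile, tmpdir, min_free_bytes=0):
    """Sort infile into outfile, keeping sort's scratch files in tmpdir.

    The scratch location is passed twice: as the -T option and as the
    TMPDIR environment variable. GNU sort uses -T when given and falls
    back to TMPDIR, then /tmp, otherwise.
    """
    # Refuse to start if the scratch filesystem is already too full.
    free = shutil.disk_usage(tmpdir).free
    if free < min_free_bytes:
        raise RuntimeError(
            f"only {free} bytes free in {tmpdir}; need {min_free_bytes}"
        )
    # Copy the current environment and override TMPDIR for the child only.
    env = dict(os.environ, TMPDIR=tmpdir)
    # -o writes the result to outfile instead of standard output.
    subprocess.run(["sort", "-T", tmpdir, "-o", outfile, infile],
                   env=env, check=True)
```

A call such as sort_with_tempdir("words.in", "words.out", "/var/spider") (file names hypothetical) moves only sort's scratch space. The input and output files stay wherever they are, so both the unsorted and sorted copies still occupy their own partition while the output is being written.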

+ But beyond that, it doesn't matter much how much is free when it
+ exits--it cleans up lots of temporary files before exiting.

Right, I wasn't sure just how much temporary space would be needed. Are
you saying that the TMPDIR assignment only causes the sort workspace to
move and that htmerge will always use the database directory for transient
files? My reference to the sort files was just that there could be both
the sorted and unsorted versions of the files taking up space in the
partition at some point ... but maybe not and anyway I seem to have got
past the sort stage.

+ So you need to keep an eye out as it's indexing. I wrote some simple
+ scripts for our server that flag sudden drops in disk usage or a full
+ drive. It's usually for users uploading MP3s, but it works well in
+ this situation too.

That sounds rather more sophisticated than the monitoring I've done so far.
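Geoff's scripts were not posted, so the following is only a hypothetical Python sketch of that kind of monitor. It polls the free space on one filesystem and reports either a large fall since the previous sample or free space below a floor. The function names and the default thresholds (a 500 MB drop between samples, a 100 MB floor, a 60-second interval) are arbitrary assumptions.

```python
import shutil
import time


def evaluate(prev_free, free, drop_bytes, min_free_bytes):
    """Return alert messages for one sample of free space (in bytes)."""
    alerts = []
    # A large fall since the previous sample suggests something is
    # filling the disk quickly (e.g. indexer temporary files).
    if prev_free is not None and prev_free - free >= drop_bytes:
        alerts.append(f"free space fell by {prev_free - free} bytes")
    # Independently, warn when the filesystem is close to full.
    if free < min_free_bytes:
        alerts.append(f"only {free} bytes free")
    return alerts


def watch(path, interval=60, drop_bytes=500 * 2**20, min_free_bytes=100 * 2**20):
    """Poll the filesystem holding path forever, printing any alerts."""
    prev_free = None
    while True:
        free = shutil.disk_usage(path).free
        for message in evaluate(prev_free, free, drop_bytes, min_free_bytes):
            print(time.strftime("%Y-%m-%d %H:%M:%S"), path, message, flush=True)
        prev_free = free
        time.sleep(interval)
```

Running watch("/var/spider") in a separate session while the dig and htmerge run would log any transient surge in space use, even one that has been cleaned up by the time htmerge exits.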

+ What was the result when you had more disk space free?

Seems fine now but it does mean I'd have to leave much more slack in the
database partition than I had anticipated. In reality the extended dig
time and the slower database extraction by htsearch may be enough to
convince "the powers that be" that indexing everything in sight/site is
rather OTT and we should only index the nominated list.


This archive was generated by hypermail 2b28 : Thu Jan 13 2000 - 02:54:37 PST