[htdig] htmerge transient files?


Subject: [htdig] htmerge transient files?
From: Malcolm Austen (malcolm.austen@computing-services.oxford.ac.uk)
Date: Mon Jan 10 2000 - 01:43:08 PST


I've been experimenting with our index ... instead of restricting it to
the 138 official servers, I'm using those as the seed (start_urls) but
allowing the run to index anything within .ox.ac.uk (with a few
restrictions to avoid indexing mirror servers!). All my files are rather
larger as a result: where it used to index about 50k files, it now has
148k files to cope with.

This is with 3.1.3 BTW, I will upgrade to 3.1.4 RSN 8-)

I've got htdig to run ok but htmerge has failed on me with:

        DB2 problem...: PANIC: No space left on device
        DB2 problem...: /db/wwwsearch/new/db.words.db: write failed for page 584291

This sounds fair enough but:

 - on exit from htmerge there appears to be about 1.5GB free on the device
 - certainly a moment later htfuzzy found space to write >50MB to it
 - I'm not using the same device for temp files - I already had htmerge
        fail because sort could not manage with the 700MB available in
        /tmp and so have set:

                setenv TMPDIR /var/spider
                # where there is 5GB available
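
One way to double-check that the temp directory really is on a different
device from the databases is to compare the device IDs of the two
directories. This is only a minimal sketch: the helper name same_device is
made up for illustration, and the two paths are simply the ones quoted in
this message.

```python
import os


def same_device(path_a, path_b):
    """Return True if both paths live on the same filesystem (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Paths taken from this message; adjust for your own installation.
    db_dir = "/db/wwwsearch/new"
    tmp_dir = os.environ.get("TMPDIR", "/tmp")
    if os.path.isdir(db_dir) and os.path.isdir(tmp_dir):
        if same_device(db_dir, tmp_dir):
            print("temp files and databases share a device")
        else:
            print("temp files and databases are on different devices")
    else:
        print("one of the directories does not exist on this machine")
```

If the two do turn out to share a device, temp files written by sort would
compete with db.words.db for the same space.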

Has anyone any thoughts as to what might have transiently soaked up the
space? Could sort have left some space allocated?
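
To catch a transient shortage like this, one option is to sample the free
space on each device while the job runs and keep the lowest value seen. The
sketch below is an assumption about how that could be done, not anything
shipped with ht://Dig: the function name run_and_track_free_space and the
script name in the usage line are hypothetical, and the polling interval is
arbitrary. A dip shorter than the interval can still be missed.

```python
import shutil
import subprocess
import sys
import time


def run_and_track_free_space(cmd, paths, interval=5.0):
    """Run cmd and poll free space on each path until it exits.

    Returns (exit_code, {path: lowest_free_bytes_seen}).
    """
    lowest = {p: shutil.disk_usage(p).free for p in paths}
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:
        for p in paths:
            lowest[p] = min(lowest[p], shutil.disk_usage(p).free)
        time.sleep(interval)
    return proc.returncode, lowest


if __name__ == "__main__" and "--" in sys.argv:
    # Usage: watch_space.py DIR [DIR ...] -- COMMAND [ARGS ...]
    split = sys.argv.index("--")
    watch_dirs, command = sys.argv[1:split], sys.argv[split + 1:]
    code, lowest = run_and_track_free_space(command, watch_dirs)
    for path, free in lowest.items():
        print(f"{path}: lowest free space seen {free / 2**20:.0f} MB")
    sys.exit(code)
```

Something like "watch_space.py /db/wwwsearch/new /var/spider -- htmerge"
(with whatever arguments htmerge is normally given) would then show how
close each device came to filling up during the run.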

Tonight's run should have at least an extra 1.5GB free on the device at
that point ... we shall see what happens!

regards,
        Malcolm.
+
| Malcolm Austen, Tel: +44(0) 1865 273216
| Oxford University Computing Services, Fax: +44(0) 1865 273275
| 13 Banbury Road, Email - malcolm.austen@oucs.ox.ac.uk
| Oxford, OX2 6NN, England WWW - http://users.ox.ac.uk/~malcolm/
+



