[htdig] 3.1.5: Completed large index


Subject: [htdig] 3.1.5: Completed large index
From: Peter L. Peres (plp@actcom.co.il)
Date: Mon Oct 16 2000 - 17:33:20 PDT


Hi,

htdig finally completed the run, in slightly less than 24 hours. I admit
that the machine is too weak for this, but that is the point ;-)

In the end, it was not a bug. The database size is almost 400 MB, double
what I had done before, but the run took about 8 times longer than
before. I assume that much more RAM would help.

Here are some numbers:

396336128 Oct 16 20:07 db.docdb
 26429440 Oct 16 20:08 db.docs.index
 42839040 Oct 16 20:47 db.soundex.db
789881030 Oct 16 20:14 db.wordlist
690375680 Oct 16 20:21 db.words.db

htdig: 125568 documents
htmerge: Total documents: 109763
htmerge: Total doc db size (in K): 945914
htfuzzy:Total keys: 92438
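
For anyone who wants to put these figures in perspective, here is a minimal sketch that adds up the file sizes listed above and converts them to MiB. The byte counts and document counts are copied from this message; the names (SIZES, to_mib, summarize) are made up for illustration, and the ratio at the end is only the htmerge count divided by the htdig count, with no claim about why the two differ.

```python
# Sizes in bytes, copied from the directory listing in this message.
SIZES = {
    "db.docdb": 396336128,
    "db.docs.index": 26429440,
    "db.soundex.db": 42839040,
    "db.wordlist": 789881030,
    "db.words.db": 690375680,
}

MIB = 1024 * 1024  # bytes per MiB


def to_mib(n_bytes):
    """Convert a byte count to MiB."""
    return n_bytes / MIB


def summarize(sizes):
    """Return the total size in bytes and one formatted report line per file."""
    total = sum(sizes.values())
    lines = [f"{name:<14} {to_mib(size):8.1f} MiB" for name, size in sizes.items()]
    lines.append(f"{'total':<14} {to_mib(total):8.1f} MiB")
    return total, lines


total_bytes, report = summarize(SIZES)
for line in report:
    print(line)

# Document counts as reported by htdig and htmerge in this message.
HTDIG_DOCS = 125568
HTMERGE_DOCS = 109763
merge_ratio = HTMERGE_DOCS / HTDIG_DOCS
print(f"htmerge count / htdig count: {merge_ratio:.1%}")
```

The same dictionary could be filled from os.stat() on a live database directory instead of hard-coded numbers, if one wanted to track growth between runs.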

The log output from the batch (all programs were run with a single -v) was
about 26 MB.

The job was started about 01:00 on Oct 16.

I am very happy that this has worked out in the end ;-) Next, I'll likely
write a patch to allow quoted strings (esp. the null quoted string) in the
config file. The current dig missed all the README, LICENSE, etc. files
because this feature is missing (IMHO).

Thank you for your help,

Peter

PS: Has anyone seen Adobe PDF files (from Adobe, e.g. the BDF font
specification documents) that cause strange problems in Acrobat Reader
(and htdig)? I have a few, and I am not happy about it.
