Subject: [htdig] db sizes
From: justin (jazoff@toddent.com)
Date: Sun Aug 06 2000 - 19:03:00 PDT

I have htdig running perfectly now. It updates the index without
re-reading all the files :) The only problem I am having is that the db
files are very large. These are the db files for ~600M of archived HTML
mail:

591M db.docdb
591M db.docdb.work
11M db.docs.index
1.3G db.wordlist.work
1.6G db.words.db
4.1G total
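
As a rough check on the numbers above, the sizes can be totaled and
compared with the source data. This is only a sketch: the figures are
the rounded ones from the listing, 1G is assumed to mean 1024M, and the
variable names are made up for illustration.

```python
# Sizes in megabytes, copied from the listing above (already rounded).
# Assumption: 1G = 1024M.
sizes_mb = {
    "db.docdb": 591,
    "db.docdb.work": 591,
    "db.docs.index": 11,
    "db.wordlist.work": 1.3 * 1024,
    "db.words.db": 1.6 * 1024,
}
source_mb = 600  # approximate size of the archived HTML

total_mb = sum(sizes_mb.values())
# Combined size of the two files whose names end in ".work".
work_mb = sum(size for name, size in sizes_mb.items() if name.endswith(".work"))

print(f"total:           {total_mb / 1024:.1f}G")
print(f"ratio to source: {total_mb / source_mb:.1f}x")
print(f".work files:     {work_mb / 1024:.1f}G")
```

With these rounded inputs the db files come to roughly seven times the
size of the data they index, and a little under half of that is in the
two .work files.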

Will changing
search_algorithm: exact:1 synonyms:0.5 endings:0.1
to just exact:1 make the db any smaller?

I also suspect the db files are large not because of htdig but because
of the email itself. I used postal, an SMTP benchmark, to send the 600M
of mail. Postal does not send English words but random ASCII garbage.
Could this be why the db files are so large?
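
The idea behind that question can be sketched in a few lines of Python.
This does not measure htdig itself. It only compares how many distinct
words a word index would have to hold for text drawn from a small
vocabulary versus text made of random strings. The vocabulary size, word
length, and word counts are arbitrary assumptions, and the "english-like"
words are made-up placeholders rather than real English.

```python
import random
import string


def distinct_words(words):
    """Number of distinct entries a word index would need for these words."""
    return len(set(words))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical stand-in for natural-language mail: 50,000 words drawn
# from a small fixed vocabulary of 2,000 placeholder words.
vocabulary = [f"word{i}" for i in range(2000)]
english_like = [rng.choice(vocabulary) for _ in range(50_000)]

# Hypothetical stand-in for postal output: 50,000 random 8-letter strings.
random_like = [
    "".join(rng.choices(string.ascii_lowercase, k=8)) for _ in range(50_000)
]

print("english-like distinct words:", distinct_words(english_like))
print("random distinct words:      ", distinct_words(random_like))
```

The first count can never exceed 2,000, while the second stays close to
50,000, because random strings almost never repeat. If the word database
grows with the number of distinct words, random mail would be close to a
worst case for it.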


Attached is a sample HTML email generated by postal.


