[htdig] db sizes


Subject: [htdig] db sizes
From: justin (jazoff@toddent.com)
Date: Sun Aug 06 2000 - 19:03:00 PDT


I have htdig running perfectly now. It updates the index without
re-reading all the files :) The only problem I am having is that the db
files are very large. These are the db files for ~600M of archived HTML
mail:

591M db.docdb
591M db.docdb.work
11M db.docs.index
1.3G db.wordlist.work
1.6G db.words.db
4.1G total
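
As a quick sanity check on the listing above, here is a small sketch that adds up the sizes and compares them with the source mail. It assumes 1G = 1024M and treats the ~600M figure as exact. Grouping the files by the ".work" suffix is only an illustration, not a statement about which files htdig still needs after indexing.

```python
# Sizes as listed in the message, converted to megabytes (assuming 1G = 1024M).
SIZES_MB = {
    "db.docdb": 591,
    "db.docdb.work": 591,
    "db.docs.index": 11,
    "db.wordlist.work": 1.3 * 1024,
    "db.words.db": 1.6 * 1024,
}
SOURCE_MB = 600  # approximate size of the archived mail, per the message


def total_mb(sizes):
    """Sum a mapping of file name -> size in megabytes."""
    return sum(sizes.values())


def without_work_files(sizes):
    """Drop entries whose names end in '.work' (illustrative grouping only)."""
    return {name: mb for name, mb in sizes.items() if not name.endswith(".work")}


total = total_mb(SIZES_MB)
live = total_mb(without_work_files(SIZES_MB))

print(f"total: {total / 1024:.1f}G ({total / SOURCE_MB:.1f}x the source)")
print(f"without .work files: {live / 1024:.1f}G")
```

The two largest files are the word files, and the whole set comes to roughly seven times the size of the indexed mail.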

Will changing
search_algorithm: exact:1 synonyms:0.5 endings:0.1
to just exact:1 make the db any smaller?

I also suspect the db files are large not because of htdig but because of
the email. I used postal, an SMTP benchmark, to send the 600M of
mail. Postal does not send English words but random ASCII garbage.
Could this be why the db files are so large?
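
One rough way to test the idea is to count distinct words. The sketch below compares random-letter tokens with tokens drawn from a fixed vocabulary. It models neither postal's real output nor htdig's storage format: the token length, vocabulary size, and sample size are made-up parameters. The only assumption is that a word index needs at least one entry per distinct word.

```python
import random
import string


def random_words(n, rng, length=8):
    """Return n tokens of random lowercase letters (stand-in for random garbage)."""
    letters = string.ascii_lowercase
    return ["".join(rng.choice(letters) for _ in range(length)) for _ in range(n)]


def vocabulary_words(n, rng, vocab_size=2000):
    """Return n tokens drawn from a fixed vocabulary (crude stand-in for real text)."""
    vocab = [f"word{i}" for i in range(vocab_size)]
    return [rng.choice(vocab) for _ in range(n)]


rng = random.Random(0)  # fixed seed so repeated runs give the same counts
n = 50_000              # hypothetical number of tokens per sample

unique_random = len(set(random_words(n, rng)))
unique_vocab = len(set(vocabulary_words(n, rng)))

print(f"distinct words, random tokens:    {unique_random}")
print(f"distinct words, fixed vocabulary: {unique_vocab}")
```

With random tokens almost every word is new, so the number of distinct words grows with the amount of mail. With a fixed vocabulary it levels off at the vocabulary size.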

Justin

Attached is a sample html email generated from postal.


msg03003.html




This archive was generated by hypermail 2b28 : Sun Aug 06 2000 - 21:02:51 PDT