Re: htdig: DB2 problem


Jeff Hill (jhill@hronline.com)
Mon, 16 Nov 1998 15:09:03 -0500


Iosif Fettich wrote:
>
> what's the total size of what you're indexing ?

292MB, I believe. I can't remember if the "du" command works exactly
right on Linux; it seems like there used to be a problem. Anyway, "du
-cks" reports "292359 total", so I'll assume that figure is correct.
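
For reference, "du -k" counts in 1024-byte blocks, so the total converts
with simple arithmetic. A minimal Python sketch (the function name is
only illustrative, not part of any tool):

```python
def du_kb_to_mib(kb_blocks):
    """Convert a count of 1024-byte blocks (as printed by "du -k") to MiB."""
    return kb_blocks / 1024

total_kb = 292359  # the "total" line reported by "du -cks"
print(f"{du_kb_to_mib(total_kb):.1f} MiB")
```

That comes out a little under the 292 thousand 1K blocks that I'm
loosely calling "292MB".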

> > This seems larger than it used to be.
>
> Significantly different ? I'm not sure anymore: did you say in the last
> message that you're using 3.1.0b2 ?

I can't remember, but it seems larger by 50MB or so (it could be that
we just keep adding so much). I am, however, running htdig-3.1.0b2,
installed Nov. 6.

> If that gives a clue: indexing here about 5000 html documents
> (approx. 25 MB) generates something like
> -rw-r--r-- 1 root root 7284736 Nov 16 03:05 db.docdb
> -rw-r--r-- 1 root root 550912 Nov 16 03:05 db.docs.index
> -rw-r--r-- 1 root root 9905263 Nov 16 03:05 db.wordlist
> -rw-r--r-- 1 root root 9511936 Nov 16 03:05 db.words.db

So, your dbs are actually slightly larger than your document base? Well,
if htdig didn't fail, I suppose mine might be slightly larger too,
although it should still have enough space.
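
To put a rough number on that, here is a sketch that sums the four file
sizes quoted above and scales the ratio to my document base. Reading
"approx. 25 MB" as 25 * 1024 * 1024 bytes is my assumption, and so is
the idea that database size grows linearly with document size, so treat
the result as a ballpark only:

```python
# File sizes in bytes, copied from the "ls -l" listing quoted above.
db_sizes = {
    "db.docdb": 7284736,
    "db.docs.index": 550912,
    "db.wordlist": 9905263,
    "db.words.db": 9511936,
}

# Assumption: "approx. 25 MB" of documents means 25 MiB.
docs_bytes = 25 * 1024 * 1024

def db_to_docs_ratio(sizes, docs_bytes):
    """Total database bytes divided by total document bytes."""
    return sum(sizes.values()) / docs_bytes

ratio = db_to_docs_ratio(db_sizes, docs_bytes)

# Assumption: linear scaling to my own 292359 KB of documents.
my_docs_kb = 292359
estimate_kb = my_docs_kb * ratio

print(f"ratio: {ratio:.2f}")
print(f"estimated database size: {estimate_kb:.0f} KB")
```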

Am I right in assuming that running "htdig -i -v -s" isn't creating a
temporary set of databases and then writing them to the db directory?
Because if it did, I'd need over 500MB free on the hard disk, and I
don't have that much space free.
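
Either way, checking the free space before a run is cheap. A sketch
using Python's standard shutil.disk_usage; the directory and the 500MB
threshold are placeholders, not anything htdig itself reports:

```python
import shutil

def has_room(free_bytes, needed_bytes):
    """True if the free space covers the estimated requirement."""
    return free_bytes >= needed_bytes

db_dir = "."  # placeholder: substitute the htdig database directory
needed = 500 * 1024 * 1024  # the "over 500MB" worst case mentioned above

usage = shutil.disk_usage(db_dir)  # named tuple: total, used, free (bytes)
print(f"free: {usage.free // (1024 * 1024)} MiB")
print("enough room" if has_room(usage.free, needed) else "not enough room")
```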

Any ideas appreciated.

> It's true, with a badwords list where I put in all meaningless words I was
> able to spot using contrib/wordfreq/. That almost halved database size.
 
I'll have to take a look at that, thanks.

Jeff H.

********* HR On-Line: The Network for Workplace Issues ********
** Ph:416-604-7251 -- Fax:416-604-4708 ** http://www.hronline.com **


