Subject: Re: [htdig] 3.1.5 -- Wordlist files / space occupancy.
From: Geoff Hutchison
Date: Tue Nov 07 2000 - 21:30:17 PST

At 7:14 PM -0500 11/7/00, wrote:
>Do the .wordlist files, created by htdig, serve any useful purpose once they
>have been input to htmerge?

They are used in two cases:
1) They are used by htmerge to merge databases.
2) They are used (if available) by htdig for "update" runs.

>If the database created by htmerge is later merged with another database, is
>it necessary to read the .wordlist files at this time? (I suspect not, since
>the information ought to be in the .words.db).

In the 3.1 code it's actually much easier to read the .wordlist files
than the words DB, because of the format of the words DB.

>More-or-less-related, why is the reported database size, at the end of the
>htmerge stats, significantly higher than the sum of (space occupied by)
>.words.db, .docdb, .docs.index, and .wordlist?

That figure is the sum of the document sizes (including markup), not
of the database files. How your databases compare to it will vary
considerably, especially if you have a large max_head_size and so
store almost all of each document's text as its excerpt.
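To see the gap on your own install, a minimal Python sketch like the following totals the on-disk size of the database files so you can set it beside the htmerge figure. It assumes the default database_base of "<database_dir>/db" (so file names like db.words.db); adjust the base to match your htdig.conf. The helper names, and the (url, size) input to total_document_bytes, are hypothetical illustrations, not part of htdig.

```python
import os
from typing import Dict, Iterable, Tuple

# Assumption: database_base is "<database_dir>/db", giving db.words.db, db.docdb, etc.
DB_SUFFIXES = (".words.db", ".docdb", ".docs.index", ".wordlist")


def on_disk_size(database_dir: str, base: str = "db") -> Dict[str, int]:
    """Return the size in bytes of each expected database file that exists."""
    sizes: Dict[str, int] = {}
    for suffix in DB_SUFFIXES:
        name = base + suffix
        path = os.path.join(database_dir, name)
        if os.path.exists(path):
            sizes[name] = os.path.getsize(path)
    return sizes


def total_document_bytes(documents: Iterable[Tuple[str, int]]) -> int:
    """Sum raw document sizes (markup included), the kind of total htmerge reports.

    Hypothetical input: an iterable of (url, size_in_bytes) pairs.
    """
    return sum(size for _url, size in documents)


if __name__ == "__main__":
    import sys

    db_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    files = on_disk_size(db_dir)
    for name, size in sorted(files.items()):
        print(f"{name:20} {size:>12}")
    print(f"{'total on disk':20} {sum(files.values()):>12}")
```

Run it as "python dbsize.py /path/to/database_dir". With a large max_head_size the .docdb total moves closer to the document-size figure, since more of each document is kept as its excerpt.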

-Geoff Hutchison
Williams Students Online


This archive was generated by hypermail 2b28 : Tue Nov 07 2000 - 21:44:34 PST