Re: [htdig] 3.1.5 -- Wordlist files / space occupancy.


Subject: Re: [htdig] 3.1.5 -- Wordlist files / space occupancy.
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Tue Nov 07 2000 - 21:30:17 PST


At 7:14 PM -0500 11/7/00, Sphboc@aol.com wrote:
>Do the .wordlist files, created by htdig, serve any useful purpose once they
>have been input to htmerge?

Yes, in two cases:
1) htmerge reads them when merging databases.
2) htdig reads them (if available) for "update" runs.

>If the database created by htmerge is later merged with another database, is
>it necessary to read the .wordlist files at this time? (I suspect not, since
>the information ought to be in the .words.db).

Yes, they are read at merge time. In the 3.1 code it's actually much
easier to read the .wordlist files than to pull the same information
back out of the words DB, because of the format of the words DB.

>More-or-less-related, why is the reported database size, at the end of the
>htmerge stats, significantly higher than the sum of (space occupied by)
>.words.db, .docdb, .docs.index, and .wordlist?

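The figure in the htmerge stats is the sum of the sizes of the
indexed documents (including markup), not the size of the database
files on disk. The size of your databases relative to that figure will
vary considerably, especially if you have a large max_head_length and
so store almost all of each document as its excerpt.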
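
If you want to compare the two numbers yourself, here is a minimal
sketch (not part of ht://Dig) that adds up the on-disk size of the
database files. The function name on_disk_size and the DB_SUFFIXES
tuple are just illustrative; the suffixes are the ones from your
question, and the actual file names depend on your configuration.

```python
import os

# Suffixes taken from the question above (assumption: your database
# files end with these; adjust to match your own setup).
DB_SUFFIXES = (".words.db", ".docdb", ".docs.index", ".wordlist")


def on_disk_size(db_dir, suffixes=DB_SUFFIXES):
    """Return the total size in bytes of the regular files in db_dir
    whose names end with one of the given suffixes."""
    total = 0
    for name in os.listdir(db_dir):
        path = os.path.join(db_dir, name)
        # str.endswith accepts a tuple and matches any of its entries.
        if os.path.isfile(path) and name.endswith(suffixes):
            total += os.path.getsize(path)
    return total
```

Calling on_disk_size() on your database directory gives the space
actually occupied, which you can set against the document-size total
that htmerge prints.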

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



