Subject: Re: [htdig] Databases -- Read-access modules. (3.1.5)
From: Geoff Hutchison (firstname.lastname@example.org)
Date: Tue Mar 21 2000 - 11:00:11 PST
On Tue, 21 Mar 2000 Sphboc@aol.com wrote:
> Presumably, these are in some fairly-standard database format; if I could
> determine what this is, and obtain field lists, it would be a major step
You'll be *much* happier parsing db.wordlist for the word database, which
is an ASCII file. You'll also be much happier using the -t flag for htdig
and parsing the resulting db.docs text file.
Both files have records separated by \n characters and fields separated by
tabs, with a label before each field (label:field).
The wordlist format is:
word <tab> i:DocID <tab> l:location <tab> w:weight <tab> c:count <tab> a:anchor
Note that count and anchor are optional and are dropped if they're the
The fields in db.docs are a bit more complex, but if you're willing to
read the source, they're in DocumentDB.cc under "CreateSearchDB", with the
key fields being the DocID and the URL (the first two).
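
As a rough illustration of the wordlist layout given above, here is a minimal
Python sketch that splits one db.wordlist record into named fields. The
function name, the dictionary keys, and the sample record are made up for this
example; values are left as strings because the field types aren't spelled out
here, and the optional c: and a: fields are simply absent when they were
dropped.

```python
# Hypothetical helper, not part of ht://Dig: parse one db.wordlist record
# of the form  word <tab> i:DocID <tab> l:location <tab> w:weight ...

# Friendly names for the single-letter labels (the names are our own choice).
LABELS = {"i": "docid", "l": "location", "w": "weight",
          "c": "count", "a": "anchor"}

def parse_wordlist_line(line):
    """Return a dict for one tab-separated, label:field record."""
    parts = line.rstrip("\n").split("\t")
    record = {"word": parts[0]}          # first field is the bare word
    for field in parts[1:]:
        label, sep, value = field.partition(":")
        if not sep:
            continue                     # skip anything not in label:field form
        record[LABELS.get(label, label)] = value
    return record

# Made-up sample record; count and anchor are omitted, as the format allows.
sample = "example\ti:42\tl:17\tw:3\n"
print(parse_wordlist_line(sample))
```

To read a whole file, call the function on each line of db.wordlist; the same
split-on-tab, split-on-colon approach should carry over to the db.docs text
file, though its first field and its labels differ.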
Williams Students Online