Re: [htdig] Databases -- Read-access modules. (3.1.5)

From: Geoff Hutchison
Date: Tue Mar 21 2000 - 11:00:11 PST

On Tue, 21 Mar 2000 wrote:

> db.words.db
> db.docdb
> Presumably, these are in some fairly-standard database format; if I could
> determine what this is, and obtain field lists, it would be a major step
> forward.

You'll be *much* happier parsing db.wordlist for the word database, which
is an ASCII file. For the document database, you'll also be much happier
running htdig with the -t flag and parsing the resulting text file.

Both files have one record per line (records are separated by \n
characters), with fields separated by tabs and a label before each field
(label:field).

The wordlist format is:
word <tab> i:DocID <tab> l:location <tab> w:weight <tab> c:count <tab> a:anchor

Note that count and anchor are optional and are dropped if they're the
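
As a rough illustration of the wordlist layout above, here is a minimal
Python sketch that splits one line into its word and labeled fields. The
function name parse_wordlist_line is made up for this example, values are
kept as strings (I'm not assuming anything about their types), and fields
that are absent from a line are simply left out of the result:

```python
def parse_wordlist_line(line):
    """Parse one wordlist record: word<TAB>i:DocID<TAB>l:location<TAB>..."""
    fields = line.rstrip("\n").split("\t")
    # The first field is the bare word; it carries no label.
    record = {"word": fields[0]}
    for field in fields[1:]:
        # Every other field looks like "label:value", e.g. "i:12".
        label, sep, value = field.partition(":")
        if not sep:
            continue  # no colon found: not a labeled field, so skip it
        record[label] = value
    return record


if __name__ == "__main__":
    # Hypothetical sample line, not taken from a real db.wordlist file.
    print(parse_wordlist_line("apple\ti:12\tl:340\tw:3\n"))
```

Since c: and a: may be missing, look them up with record.get("c") and
record.get("a") rather than indexing directly.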

The fields in the are a bit more complex, but if you're willing to
read the source, they're in under "CreateSearchDB" with the
key fields being the DocID and the URL (the first two).

-Geoff Hutchison
Williams Students Online
