Subject: Re: [htdig] htmerge/rundig functionality & disk space usage
From: Geoff Hutchison (firstname.lastname@example.org)
Date: Wed Dec 15 1999 - 15:54:46 PST
On Wed, 15 Dec 1999, Susan Alderman wrote:
> htdig -c main.conf (main webserver goes into database)
> htmerge -c main.conf (indexing of main webserver data)
> htdig -c sub1.conf (dig of webserver sub1)
> htmerge -m main.conf -c sub1.conf (merge sub1 into main database)
You have this last line backwards: -m names the config whose database gets
merged in, and -c names the database that receives it. You want:
htmerge -m sub1.conf -c main.conf (merge sub1 into main database)
> htmerge -c main.conf [Is sub1 still in the main database?]
> htdig -c sub2.conf
> htmerge -m main.conf -c sub2.conf [Now I have 3 servers in
> the main index, right?]
Correct, but see above as far as syntax.
> space required for the merging. Am I out to lunch here - is there
> something I'm missing?
You may want to *really* index all three webservers separately, something
like:
htdig -c sub1.conf
htdig -c sub2.conf
htdig -c sub3.conf
htmerge -m sub1.conf -c main.conf
htmerge -m sub2.conf -c main.conf
htmerge -m sub3.conf -c main.conf
Though you'll still run into the problem of doing the sorting on the main
DB. (No way to get around that really with the current code.)
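If you want to script that sequence, here is a minimal sketch. The function name dig_and_merge and the RUN variable are made up for illustration; RUN defaults to "echo", so the commands are only printed until you set RUN to the empty string.

```shell
#!/bin/sh
# Sketch only: dig each sub-server into its own database, then merge each
# one into the main database. Nothing is executed while RUN is "echo".
RUN=${RUN-echo}

# dig_and_merge MAIN_CONF SUB_CONF...   (hypothetical helper, not part of ht://Dig)
dig_and_merge() {
    main=$1
    shift
    # Dig every sub-server separately.
    for conf in "$@"; do
        $RUN htdig -c "$conf" || return 1
    done
    # Merge each sub database into main: -m is the source, -c the target.
    for conf in "$@"; do
        $RUN htmerge -m "$conf" -c "$main" || return 1
    done
}
```

Calling dig_and_merge main.conf sub1.conf sub2.conf sub3.conf prints the six commands above in the same order. The sort on the main DB still happens on every merge, so this saves typing, not disk space.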
> BUT - does this mean that no one can run a search query while I'm reindexing?
No. The db.wordlist file (or the respective .work version) is only needed
if you're doing an update. It's not used for searching.
> Let me see if I've got this straight: htdig creates db.docdb, &
> db.docdb.work. htmerge creates db.docs.index, db.wordlist.work and
> db.words.db. htsearch uses db.docdb, db.docs.index, and db.words.db.
> If I want to have my indices searchable at the same time as I'm
> running htdig/htmerge, I'll need working copies of the (three,
> assuming I have that right) databases that htdig/htmerge create that
> htsearch uses. (Including this info on the page
> http://www.htdig.org/howitworks.html would be very helpful.)
Correct. (.work files are created by htdig/htmerge with the -a flag and
are never used by htsearch.)
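One way to put freshly built .work files live is to drop the suffix once htdig -a and htmerge -a have both finished. This is only a sketch under that assumption; promote_work_files is a made-up helper, and the database directory you pass it is whatever your database_dir points at.

```shell
#!/bin/sh
# Sketch only: promote *.work files in a database directory by renaming
# each one to the same name without the .work suffix.
# promote_work_files DBDIR   (hypothetical helper, not part of ht://Dig)
promote_work_files() {
    dbdir=$1
    for f in "$dbdir"/*.work; do
        # If there are no .work files the glob stays literal; skip it.
        [ -e "$f" ] || continue
        mv -f "$f" "${f%.work}"
    done
}

# Intended use (paths are placeholders):
#   htdig -a -c main.conf && htmerge -a -c main.conf \
#       && promote_work_files /path/to/database_dir
```

htsearch keeps reading the old files until the renames happen, so searches stay available for the whole dig and merge. The cost is disk space for two copies of each database.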
> Also, this means no running htfuzzy, right? If I don't run htfuzzy, I
> don't get the (VERY NICE) feature of ending expansion? I've had a
> look, and the databases formed by htfuzzy (db.metaphone.db and
> db.soundex.db) are some of the smaller ones - does this really gain me
> that much?
There are a variety of fuzzy algorithms. The synonyms and endings databases
only need to be created once (htfuzzy stashes them in common/). Soundex and
metaphone have to be rebuilt after every reindex, and htfuzzy puts them in
the database directory. As to how useful they are, your mileage may vary. I
think most people use synonyms and endings because they don't require much
disk space or time.
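As a rough sketch of that split, the two kinds of htfuzzy run could be wrapped like this. build_fuzzy and RUN are made-up names; RUN defaults to "echo", so the commands are printed rather than run until you set RUN to the empty string.

```shell
#!/bin/sh
# Sketch only: separate the one-time fuzzy databases from the ones that
# must be rebuilt after each reindex. Nothing runs while RUN is "echo".
RUN=${RUN-echo}

# build_fuzzy CONF once|reindex   (hypothetical helper, not part of ht://Dig)
build_fuzzy() {
    conf=$1
    mode=$2
    case $mode in
        # Built once; they do not depend on the indexed documents.
        once)    $RUN htfuzzy -c "$conf" endings synonyms ;;
        # Built from the word database, so rerun after every reindex.
        reindex) $RUN htfuzzy -c "$conf" soundex metaphone ;;
        *)       echo "usage: build_fuzzy CONF once|reindex" >&2
                 return 2 ;;
    esac
}
```

If you only care about endings and synonyms, build_fuzzy main.conf once is the only call you need, and nothing extra has to run after a reindex.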