Re: [htdig] htmerge/rundig functionality & disk space usage


From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Wed Dec 15 1999 - 15:54:46 PST


On Wed, 15 Dec 1999, Susan Alderman wrote:

> htdig -c main.conf (main webserver goes into database)
> htmerge -c main.conf (indexing of main webserver data)
> htdig -c sub1.conf (dig of webserver sub1)
> htmerge -m main.conf -c sub1.conf (merge sub1 into main database)

You have this last line backwards. You want:

htmerge -m sub1.conf -c main.conf (merge sub1 into main database)

> htmerge -c main.conf [Is sub1 still in the main database?]

Yes.

> htdig -c sub2.conf
> htmerge -m main.conf -c sub2.conf [Now I have 3 servers in
> the main index, right?]

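Correct, but see above for the syntax: -m takes the config of the database
being merged in, and -c the config of the database it is merged into.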

> space required for the merging. Am I out to lunch here - is there
> something I'm missing?

You may want to *really* index all three webservers separately, something
like this:

htdig -c sub1.conf
htdig -c sub2.conf
htdig -c sub3.conf
htmerge -m sub1.conf -c main.conf
htmerge -m sub2.conf -c main.conf
htmerge -m sub3.conf -c main.conf

Though each merge still has to sort the main DB, so you'll still run into
the space problem there. (There's really no way around that with the
current code.)
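
As a rough sketch, the sequence above could be generated by a small shell
helper. The function name plan_merge is hypothetical (it is not part of
ht://Dig), and the config file names are just the ones from this thread. It
only prints the commands, so nothing runs until you pipe the output to sh.

```shell
#!/bin/sh
# Dry-run sketch: print the dig/merge sequence for a set of sub-server configs.
# plan_merge is a hypothetical helper, not an ht://Dig tool.
plan_merge() {
    main_conf=$1
    shift
    # Dig every sub-server into its own database first.
    for conf in "$@"; do
        echo "htdig -c $conf"
    done
    # Then merge each one into the main database:
    # -m names the database being merged in, -c the destination.
    for conf in "$@"; do
        echo "htmerge -m $conf -c $main_conf"
    done
}

plan_merge main.conf sub1.conf sub2.conf sub3.conf
```

To actually run the commands rather than print them, something like
"plan_merge main.conf sub1.conf sub2.conf sub3.conf | sh" should do.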

> BUT - does this mean that no one can run a search query while I'm reindexing?

No. The db.wordlist file (or the respective .work version) is only needed
if you're doing an update. It's not used for searching.

> Let me see if I've got this straight: htdig creates db.docdb, &
> db.docdb.work. htmerge creates db.docs.index, db.wordlist.work and
> db.words.db. htsearch uses db.docdb, db.docs.index, and db.words.db.
> If I want to have my indices searchable at the same time as I'm
> running htdig/htmerge, I'll need working copies of the (three,
> assuming I have that right) databases that htdig/htmerge create that
> htsearch uses. (Including this info on the page
> http://www.htdig.org/howitworks.html would be very helpful.)

Correct. (.work files are created by htdig/htmerge with the -a flag and
are never used by htsearch.)
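
One way to put the working copies live once a reindex has finished is to
rename each .work file over its live counterpart. This is only a sketch: the
helper name promote_work_files and the database directory path are
hypothetical, and it assumes every file ending in .work in that directory is
a finished working copy from "htdig -a" and "htmerge -a".

```shell
#!/bin/sh
# Sketch: after "htdig -a" and "htmerge -a" finish, move each *.work file
# over the file of the same name without the .work suffix.
# promote_work_files and its directory argument are hypothetical.
promote_work_files() {
    dbdir=$1
    for f in "$dbdir"/*.work; do
        # Skip the unexpanded glob when no .work files exist.
        [ -e "$f" ] || continue
        mv -f "$f" "${f%.work}"
    done
}

# Example usage (the path is a placeholder):
#   htdig -a -c main.conf
#   htmerge -a -c main.conf
#   promote_work_files /path/to/db
```

Searches keep using the old files until the renames happen, so the window in
which the live databases are changing stays short.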

> Also, this means no running htfuzzy, right? If I don't run htfuzzy, I
> don't get the (VERY NICE) feature of ending expansion? I've had a
> look, and the databases formed by htfuzzy (db.metaphone.db and
> db.soundex.db) are some of the smaller ones - does this really gain me
> that much?

There are a variety of fuzzy algorithms. Synonyms and endings only require
creating the databases once (htfuzzy stashes them in common/). Soundex and
metaphone require rerunning htfuzzy after every reindex, and it puts those
databases in the database directory. As to how useful they are, your mileage
may vary. I think most people use synonyms and endings because they don't
require much disk space or time.
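
To make the once-versus-every-reindex split concrete, here is a small
sketch. The helper fuzzy_cmd and its "once"/"reindex" modes are hypothetical
names. It only prints the htfuzzy command line for each case, using
main.conf from earlier in this thread as the config.

```shell
#!/bin/sh
# Dry-run sketch: print the htfuzzy command for either the one-time
# algorithms or the ones that need rebuilding after each reindex.
# fuzzy_cmd and its mode names are hypothetical.
fuzzy_cmd() {
    conf=$1
    mode=$2
    case "$mode" in
        once)    echo "htfuzzy -c $conf endings synonyms" ;;
        reindex) echo "htfuzzy -c $conf soundex metaphone" ;;
        *)       echo "usage: fuzzy_cmd CONF once|reindex" >&2; return 1 ;;
    esac
}

fuzzy_cmd main.conf once
fuzzy_cmd main.conf reindex
```

If you only use endings and synonyms, the "reindex" command can be left out
of your update script entirely.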

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
