htdig: Problems with using htdig -a


Geoff Hutchison (Geoffrey.R.Hutchison@williams.edu)
Thu, 17 Sep 1998 23:47:47 -0400


Hi,

I consider the following a bug, since it's not documented. Fortunately
there's an easy workaround.

I normally run the dig with the switch -a to use alternate files (allowing
others to search as I'm digging). Usually I don't use the switch -i, so it
should do an "update" dig and index only the changed or new files (which
should be a small subset of the 50,000 pages). Then the script moves the
files into place at the end of the run.

However, when using "-a" I wasn't seeing an update of the database.
Essentially htdig looks at the db.docdb.work file and finds it empty. So it
"updates" the empty db by doing a full initial dig. :-(

Here's an example solution. (Alternatively, you can skip the initial cp
commands and change the first two mv commands to cp, so that the .work
copies are still in place for the next run.)

BASEDIR=/opt/htdig
cp $BASEDIR/db/db.wordlist $BASEDIR/db/db.wordlist.work
cp $BASEDIR/db/db.docdb $BASEDIR/db/db.docdb.work
$BASEDIR/bin/htdig -a -s
$BASEDIR/bin/htmerge -a -s
mv $BASEDIR/db/db.wordlist.work $BASEDIR/db/db.wordlist
mv $BASEDIR/db/db.docdb.work $BASEDIR/db/db.docdb
mv $BASEDIR/db/db.docs.index.work $BASEDIR/db/db.docs.index
mv $BASEDIR/db/db.words.db.work $BASEDIR/db/db.words.db

This changed a 1 hr. 30 min. dig into a 15 min dig, even counting the
shuffling of files. Faster is better. :-)
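
For anyone adapting the script, here is a slightly more defensive sketch of
the same shuffle. It is not part of htdig: the function names
seed_work_files and publish_work_files are made up for illustration, and the
file names are simply the ones used in the script above. It skips files that
do not exist yet, so on a brand new install htdig would still do its full
initial dig.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, file names follow the
# script in this message.

# Copy the live databases to their .work names so that "htdig -a" starts
# from existing data. Files that do not exist yet are skipped.
seed_work_files() {
    dbdir=$1
    for name in db.wordlist db.docdb; do
        if [ -f "$dbdir/$name" ]; then
            cp -p "$dbdir/$name" "$dbdir/$name.work"
        fi
    done
}

# Move whichever .work files exist into place once the dig and merge
# have finished.
publish_work_files() {
    dbdir=$1
    for name in db.wordlist db.docdb db.docs.index db.words.db; do
        if [ -f "$dbdir/$name.work" ]; then
            mv "$dbdir/$name.work" "$dbdir/$name"
        fi
    done
}

# Intended use, assuming the same layout as above:
#   seed_work_files /opt/htdig/db
#   /opt/htdig/bin/htdig -a -s
#   /opt/htdig/bin/htmerge -a -s
#   publish_work_files /opt/htdig/db
```

Looping over the file names keeps the list in one place, which may help if a
different htdig version uses a different set of database files.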

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



