Re: htdig: non-destructive updates


Colin Viebrock (cmv@privateworld.com)
Wed, 17 Jun 1998 13:19:02 -0400


Thus spoke Michael Graff (at 08:18 PM 6/16/98 -0700) ...
>Is there any reasonable way to do this?
>
> (1) index a group of sites.
> (2) nightly, re-index them, updating the current database as
> needed.
> (3) keep searches up and running while the indexing is taking
> place.
> (4) not keep 2 copies of the database. It is already large.

Here is the script I use. Geoff Hutchison sent it to me, and I made some
changes of my own to make it easy to configure. I don't think #2 above is
possible. But you can build a work copy of the index and then, when it has
finished re-indexing, replace the current one with the work copy. So you
will have 2 copies, but only during the dig/merge stage.
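
The work-copy idea can be sketched as a small shell function. This is a hypothetical helper, not part of the script below: swap_work_files is an illustrative name, and it assumes the live files and their .work copies sit in one directory on one filesystem, so each mv is a single rename and a search sees either the old file or the new one, never a partly written one. The renames are not atomic as a group, so a search that runs mid-swap could briefly pair an old file with a new one.

```shell
#!/bin/sh
# Hypothetical helper: promote finished .work files to their live names.
# Assumes every file is in one directory on one filesystem, so each mv
# is a plain rename that readers see as all-or-nothing per file.
swap_work_files() {
    dir=$1
    shift
    # First pass: refuse to swap unless every .work file exists and is
    # non-empty, so a run that died halfway never replaces a good index.
    for f in "$@"; do
        if [ ! -s "$dir/$f.work" ]; then
            echo "swap_work_files: missing or empty $dir/$f.work" >&2
            return 1
        fi
    done
    # Second pass: rename each .work file over its live counterpart.
    for f in "$@"; do
        mv -f "$dir/$f.work" "$dir/$f" || return 1
    done
}

# Example call, using the file names from rundig.sh:
# swap_work_files /export/htdig/db db.docdb db.docs.index db.words.gdbm
```

Checking all the .work files before renaming any of them keeps the swap from half-happening when one of the phases left a file missing.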

---- start rundig.sh file ------------

#! /bin/sh

if [ "$1" = "-v" ]; then
    verbose="-v"
fi

# This is the directory where htdig lives
BASEDIR=/export/htdig

# This is the db dir
DBDIR=$BASEDIR/db/

# This is the name of a temporary report file
REPORT=/tmp/htdig.report

# This is who gets the report
REPORT_DEST="webmaster@yourdomain.com"
export REPORT_DEST

# This is the subject line of the report
SUBJECT="cron: htdig report for domain"

# This is the name of the conf file to use
CONF=htdig.conf

# This is the directory htdig will use for temporary sort files
TMPDIR=/tmp
export TMPDIR

# This is the PATH used by this script. Change it if you have problems
# with not finding wc or grep.
PATH=/usr/local/bin:/usr/bin:/bin

##### Dig phase
STARTTIME=`date`
echo Start time: $STARTTIME
echo rundig: Start time: $STARTTIME > $REPORT
$BASEDIR/bin/htdig $verbose -s -a -c $BASEDIR/conf/$CONF >> $REPORT
TIME=`date`
echo Done Digging: $TIME
echo rundig: Done Digging: $TIME >> $REPORT

##### Merge Phase
$BASEDIR/bin/htmerge $verbose -s -a -c $BASEDIR/conf/$CONF >> $REPORT
TIME=`date`
echo Done Merging: $TIME
echo rundig: Done Merging: $TIME >> $REPORT

##### Cleanup Phase
# To enable htnotify or the soundex search, uncomment the following lines
# $BASEDIR/bin/htnotify $verbose >>$REPORT
# $BASEDIR/bin/htfuzzy $verbose soundex

# Remove db.wordlist
rm $DBDIR/db.wordlist
mv $DBDIR/db.docdb.work $DBDIR/db.docdb
mv $DBDIR/db.docs.index.work $DBDIR/db.docs.index
mv $DBDIR/db.words.gdbm.work $DBDIR/db.words.gdbm
END=`date`
echo End time: $END
echo rundig: End time: $END >> $REPORT
echo

# Grab the important statistics from the report file.
# The summary lines begin with htdig:, htmerge: or rundig:
fgrep "htdig:" $REPORT
echo
fgrep "htmerge:" $REPORT
echo
fgrep "rundig:" $REPORT
echo

# Read from stdin so wc prints only the count, not the file name as well
WC=`wc -l < $REPORT`
echo Total lines in $REPORT: $WC

# Send out the report ...
mail -s "$SUBJECT - $STARTTIME" $REPORT_DEST < $REPORT

# ... and clean up
rm $REPORT
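
One weakness of the script as posted is that the mv lines in the Cleanup Phase run even if htdig or htmerge failed, so a broken run could replace the live index with incomplete .work files. Below is a minimal sketch of a guard. It assumes htdig and htmerge return a non-zero exit status on failure (check that against your version before relying on it). run_update is a hypothetical name, and DIG and MERGE are placeholders that default to the no-op true command so the sketch runs anywhere.

```shell
#!/bin/sh
# Sketch: only promote the .work files if both phases succeed.
# DIG and MERGE are placeholders (default: the no-op "true" command);
# in real use, set them to the htdig and htmerge command lines.
DIG=${DIG:-true}
MERGE=${MERGE:-true}

run_update() {
    dbdir=$1
    # $DIG and $MERGE are left unquoted on purpose so a command line
    # with options splits into words (paths must not contain spaces).
    if ! $DIG; then
        echo "rundig: htdig failed, live index left untouched" >&2
        return 1
    fi
    if ! $MERGE; then
        echo "rundig: htmerge failed, live index left untouched" >&2
        return 1
    fi
    # Both phases succeeded: rename each .work file over its live name.
    for f in db.docdb db.docs.index db.words.gdbm; do
        mv -f "$dbdir/$f.work" "$dbdir/$f" || return 1
    done
}
```

In rundig.sh this could stand in for the separate dig, merge and mv steps, for example with DIG="$BASEDIR/bin/htdig $verbose -s -a -c $BASEDIR/conf/$CONF", a matching MERGE, and then run_update $DBDIR >> $REPORT. The redirection goes on the function call because a >> inside the DIG string would not be parsed as a redirection.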

.........................................................................
Colin Viebrock Creative Director - Private World Communications
cmv@privateworld.com 331 - 67 Mowat Avenue
http://www.privateworld.com Toronto, Ontario, CANADA, M6K 3E3
ICQ: 11386088

                                                           Life is cheap,
                                        but the accessories can kill you.
----------------------------------------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig-request@sdsu.edu containing the single word "unsubscribe" in
the body of the message.