Re: [htdig] HTMERGE doesn't remove URL

Subject: Re: [htdig] HTMERGE doesn't remove URL
From: Gilles Detillieux
Date: Fri Oct 20 2000 - 07:59:08 PDT

According to Reich, Stefan:
> I'm just setting up a multiple database scenario for htdig 3.1.5.
> Each site gets its own database. In addition I want to merge all the
> databases into one collection database.
> So far everything works. Now I encountered the following problem:
> If pages are removed from a site, the documents get removed from the
> corresponding slave database, but the htmerge leaves them in the collection
> database.
> I don't want to rebuild the collection from scratch each time I merge an
> updated slave in (if I do so, everything works fine).
> So should the merging also remove URLs and if yes, any idea what could be
> wrong in my config?
> These are the steps I do:
> htdig -a -c slave
> htmerge -a -s -c slave
> --> stats tells me document is removed
> htmerge -a -s -m slave -c collect
> --> stats tells me document is not removed
> cd /collect
> cp db.docdb
> cp
> cp db.words.db
> Any hints are appreciated

htmerge -m is not designed for deleting records, other than duplicate
or outdated ones. When a document is removed from the slave database,
there is no longer any record of it, nor any record of its deletion.
So, when the slave database is merged into the collect one, all of its
records are added to collect, but if collect already has a record of the
deleted document, there is nothing in slave to tell htmerge to remove
that document from collect. Your only option is to rebuild collect from
scratch, from its constituent parts.
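
To illustrate the point, here is a toy model in Python. It is not
htmerge's actual code, just a sketch under the assumption that each
database can be pictured as a mapping from URL to record and that a
merge only adds or updates records. The names merge, rebuild, site_a
and site_b, and the example URLs, are made up for the illustration.

```python
def merge(collect, slave):
    """Add-or-update merge: copies every slave record into collect.

    Nothing in the slave says "this URL was deleted", so records that
    exist only in collect are left untouched.
    """
    merged = dict(collect)
    merged.update(slave)
    return merged


def rebuild(slaves):
    """Rebuild collect from scratch by merging every slave into an empty db."""
    collect = {}
    for slave in slaves:
        collect = merge(collect, slave)
    return collect


# Two hypothetical slave databases, keyed by URL.
site_a = {"http://a.example/1": "rec", "http://a.example/2": "rec"}
site_b = {"http://b.example/1": "rec"}
collect = rebuild([site_a, site_b])

# A page disappears from site A; its slave database simply loses the record.
del site_a["http://a.example/2"]

stale = merge(collect, site_a)     # incremental merge into the old collect
fresh = rebuild([site_a, site_b])  # rebuild from the constituent parts

print("http://a.example/2" in stale)
print("http://a.example/2" in fresh)
```

In this model the incremental merge still carries the removed URL,
because the updated slave has nothing that refers to it, while the
rebuild only contains what the slaves currently hold.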

Gilles R. Detillieux              E-mail: <>
Spinal Cord Research Centre       WWW:
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


This archive was generated by hypermail 2b28 : Fri Oct 20 2000 - 08:04:27 PDT