AW: [htdig] HTMERGE doesn't remove URL


Subject: AW: [htdig] HTMERGE doesn't remove URL
From: Reich, Stefan (Stefan.Reich@dgn-service.de)
Date: Fri Oct 20 2000 - 08:04:07 PDT


Thanks a lot. Good to know that I can stop searching for the mistake ;-)

Anyway, what does "outdated record" mean? Is there a way to get rid of, say,
records that have not been reinserted for some days?

-----Original Message-----
From: Gilles Detillieux [mailto:grdetil@scrc.umanitoba.ca]
Sent: Friday, October 20, 2000 16:59
To: Stefan.Reich@dgn-service.de
Cc: htdig@htdig.org
Subject: Re: [htdig] HTMERGE doesn't remove URL

According to Reich, Stefan:
> I'm just setting up a multiple database scenario for htdig 3.1.5.
>
> Each site gets its own database. In addition I want to merge all the
> databases into one collection database.
>
> So far everything works. Now I have encountered the following problem:
>
> If pages are removed from a site, the documents get removed from the
> corresponding slave database, but the htmerge leaves them in the
> collection database.
>
> I don't want to rebuild the collection from scratch each time I merge an
> updated slave in (if I do so, everything works fine).
>
> So should merging also remove URLs and, if so, any idea what could be
> wrong in my config?
>
> These are the steps I do:
>
> htdig -a -c slave
> htmerge -a -s -c slave
> --> stats tells me document is removed
> htmerge -a -s -m slave -c collect
> --> stats tells me document is not removed
> cd /collect
> cp db.docdb.work db.docdb
> cp db.docs.index.work db.docs.index
> cp db.words.db.work db.words.db
>
> Any hints are appreciated

htmerge -m is not designed for deleting records, other than duplicate
or outdated ones. When a document is removed from the slave database,
there is no longer any record of it, nor any record of its deletion.
So, when the slave database is merged into the collect one, all of its
records are added to collect, but if collect already has a record of the
deleted document, there is nothing in slave to tell htmerge to remove
that document from collect. Your only option is to rebuild collect from
scratch, from its constituent parts.
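
To illustrate why the merge cannot drop the deleted page, here is a rough
toy model in Python. It is only a sketch under loud assumptions: each
database is treated as a plain dict mapping URL to record, the URLs and the
helper names merge_into and rebuild are made up for the example, and none of
this is htmerge's real code or on-disk format.

```python
# Toy model only: each database is a dict mapping URL -> record.
# Assumption: this is NOT how htmerge stores or processes data internally.

def merge_into(collect, slave):
    """Add or overwrite every slave record in collect; never delete anything."""
    collect.update(slave)
    return collect


def rebuild(slaves):
    """Rebuild the collection from scratch out of all slave databases."""
    collect = {}
    for slave in slaves:
        merge_into(collect, slave)
    return collect


# Hypothetical URLs, for illustration only.
slave = {"http://site/a.html": "A v1", "http://site/b.html": "B v1"}
collect = rebuild([slave])

# b.html is removed from the site. After re-digging, slave simply has no
# record of it, and no record of its deletion either.
slave = {"http://site/a.html": "A v2"}

# Incremental merge: a.html is refreshed, but b.html survives in collect.
merge_into(collect, slave)
print(sorted(collect))

# Rebuilding from the constituent parts is what drops the stale record.
collect = rebuild([slave])
print(sorted(collect))
```

In this model the incremental merge can only see what slave still contains,
so the stale b.html entry stays until collect is rebuilt from its parts.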

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930




This archive was generated by hypermail 2b28 : Fri Oct 20 2000 - 08:10:49 PDT