Re: [htdig] Does htmerge remove URL from database ?


Subject: Re: [htdig] Does htmerge remove URL from database ?
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Wed Nov 22 2000 - 10:35:28 PST


According to Olivier Korn:
> 3. Once a week, htdig is called on each site with "htdig -i -c site1.conf"
> then "htdig -i -c site2.conf", (and so on.)
>
> 4. After all the sites have been htdigged, I run htmerge in sequence in
> order to merge all the small databases into one.
> First call is "htmerge -c site1.conf", subsequent calls are "htmerge -c
> site1.conf -m site2.conf", "htmerge -c site1.conf -m site3.conf", (and so on.)
...
> 2. Now let's hear the amazing part of my story. If I do a "htmerge -c
> site5.conf" (notice there is no -m this time.) and if I htsearch -c
> site5.conf with "rénovation tourisme" my document is said to be found !
> Said in another way, the document was indexed but was certainly ripped out
> when merging with another database.

After each separate htdig -i -c site#.conf, I think you should run a
separate htmerge -c site#.conf on that same site, not just on the first
one, before you merge everything together with -m. Try that and see if
it solves the problem. As far as I know, the intention was that these
extra merges should not be necessary, but this has come up before, and
I suspect there's a problem with merging multiple DBs that haven't
already been cleaned up by a simple htmerge.
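
For illustration, the sequence suggested above could be scripted roughly
like this. It's only a sketch, not a tested fix: the SITES list, the RUN
dry-run variable and the rebuild_all function name are made up for the
example. Only the "htdig -i -c", "htmerge -c" and "htmerge -c ... -m"
invocations come from this thread.

```shell
#!/bin/sh
# Sketch of the weekly rebuild. RUN=echo (the default here) only prints the
# commands; set RUN= (empty) in the environment to actually execute them.
RUN="${RUN-echo}"

# Hypothetical list of per-site config files; the first one receives the merge.
SITES="site1.conf site2.conf site3.conf"

rebuild_all() {
    # Pass 1: dig each site from scratch, then clean up its own database
    # with a plain htmerge before any cross-site merging happens.
    for conf in $SITES; do
        $RUN htdig -i -c "$conf" || return 1
        $RUN htmerge -c "$conf" || return 1
    done

    # Pass 2: merge every other site's database into the first one.
    first=${SITES%% *}
    for conf in $SITES; do
        [ "$conf" = "$first" ] && continue
        $RUN htmerge -c "$first" -m "$conf" || return 1
    done
}

rebuild_all
```

Run as is, it just prints the commands in the order they would be
executed, so you can check the sequence before letting it touch any
databases.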

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



