Re: [htdig] Does htmerge remove URL from database ?


Subject: Re: [htdig] Does htmerge remove URL from database ?
From: Olivier Korn (olivier.korn@enseignant.org)
Date: Thu Nov 23 2000 - 05:21:15 PST


At 12:35 22/11/2000 -0600, Gilles Detillieux wrote:
> > 4. After all the sites have been htdigged, I run htmerge in sequence in
> > order to merge all the small databases into one.
> > The first call is "htmerge -c site1.conf"; subsequent calls are "htmerge -c
> > site1.conf -m site2.conf", "htmerge -c site1.conf -m site3.conf" (and
> > so on).
>...
> > 2. Now for the amazing part of my story. If I run "htmerge -c
> > site5.conf" (notice there is no -m this time) and then run htsearch -c
> > site5.conf with "rénovation tourisme", my document is found!
> > In other words, the document was indexed but must have been dropped
> > when merging with another database.
>
>I think after each separate htdig -i -c site#.conf you should run a
>separate htmerge -c site#.conf, not just on the first site, before you
>merge everything together. Try that and see if it solves the problem.
>I think the intention was that these extra merges should not have been
>necessary, but this has come up before, and I think there's a problem
>with merging multiple DBs when they haven't already been cleaned up by
>a simple htmerge.

I tried it and it didn't solve the problem. BTW, I don't think that these
extra merges are necessary either.

Now I run:
htmerge -c site#.conf
then
htmerge -c site1.conf -m site#.conf (with # > 1)
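
To spell the sequence out, here is a dry-run sketch in shell. It only prints the htmerge calls rather than running them. The function name merge_plan and the site*.conf file names are placeholders for illustration, not part of ht://Dig; the only htmerge options used are the -c and -m shown above.

```shell
# merge_plan MAIN OTHER...: print the htmerge calls, one per line (dry run).
# MAIN is the config of the database that receives the merges (site1.conf here).
merge_plan() {
    main=$1
    shift
    # Step 1: a plain htmerge on every database, including the main one.
    for conf in "$main" "$@"; do
        echo "htmerge -c $conf"
    done
    # Step 2: merge each remaining database into the main one.
    for conf in "$@"; do
        echo "htmerge -c $main -m $conf"
    done
}

merge_plan site1.conf site2.conf site3.conf
```

Piping the output to sh would actually execute the calls, assuming htmerge is on the PATH and the config files exist.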

If I then run
htsearch -c site5.conf with words="rénovation tourisme", it finds the
document (ranked first).
But if I run
htsearch -c site1.conf with the same words, it returns the "nomatch" document.

Some of the web hosts are case-sensitive and some are not. Could that be the
source of my problem?
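
One way to test this guess, sketched under assumptions: given a plain list of the indexed URLs, one per line (how that list is obtained is left open here), the filter below prints the lowercased form of every URL that occurs in more than one spelling differing only by case. The name case_collisions is hypothetical, and this says nothing about how htmerge itself compares URLs.

```shell
# case_collisions: read URLs on stdin, one per line, and print the lowercased
# form of each URL that appears in two or more spellings differing only by case.
case_collisions() {
    # Drop exact duplicates first so only case variants can collide.
    LC_ALL=C sort -u |
        tr '[:upper:]' '[:lower:]' |
        LC_ALL=C sort |
        uniq -d
}

printf '%s\n' \
    'http://a.example/Page.html' \
    'http://a.example/page.html' \
    'http://b.example/index.html' | case_collisions
```

If this prints nothing for the real URL list, case differences between hosts are probably not what makes the document disappear.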

What are the rules for htmerge? When does it actually remove URLs from the
database?

--
Olivier Korn
Strasbourg, France.



