Re: [htdig] Does htmerge remove URL from database ?


Subject: Re: [htdig] Does htmerge remove URL from database ?
From: David Adams (D.J.Adams@soton.ac.uk)
Date: Mon Nov 27 2000 - 01:30:11 PST


I found that the extra runs of htmerge were necessary when I was merging two
runs of htdig. Unless I ran both databases through htmerge before merging
them I was getting

Deleted, invalid:

against some pages in the htmerge run. Compared to the time required to run
htdig, the extra htmerge runs are trivial, so you have little to lose by
including them.

Use the -v option with both htdig and htmerge and see whether you get any
messages about the pages that don't appear in the final index.
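
For what it's worth, here is a rough sketch of the order of runs I am
describing, written as a small Python helper that only builds the command
lines and does not execute anything. The function name merge_plan and the
site1.conf/site2.conf file names are made up for illustration; the only
options used are the ones mentioned in this thread (-c, -m and -v). Treat it
as an outline of the sequence under those assumptions, not as a tested
indexing script.

```python
def merge_plan(main_conf, other_confs, verbose=False):
    """Return the command lines, in order, for digging and merging sites.

    Hypothetical helper: it only builds argument lists. Options used are
    those mentioned in the thread: -c (config file), -m (config of the
    database to merge in) and -v (verbose output).
    """
    v = ["-v"] if verbose else []
    all_confs = [main_conf] + list(other_confs)
    plan = []
    # 1. Dig every site into its own database.
    for conf in all_confs:
        plan.append(["htdig", *v, "-c", conf])
    # 2. Run htmerge on each database by itself (the "extra" runs).
    for conf in all_confs:
        plan.append(["htmerge", *v, "-c", conf])
    # 3. Merge each remaining database into the main one.
    for conf in other_confs:
        plan.append(["htmerge", *v, "-c", main_conf, "-m", conf])
    return plan


if __name__ == "__main__":
    # Print the plan for two sites so it can be reviewed before running.
    for cmd in merge_plan("site1.conf", ["site2.conf"], verbose=True):
        print(" ".join(cmd))
```

Each list in the returned plan could be run by hand or handed to
subprocess.run; with verbose=True the -v output can then be checked for the
"Deleted, invalid:" lines.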

----- Original Message -----
From: "Geoff Hutchison" <ghutchis@wso.williams.edu>
To: "Olivier Korn" <olivier.korn@enseignant.org>
Cc: "Gilles Detillieux" <grdetil@scrc.umanitoba.ca>; <htdig@htdig.org>
Sent: Sunday, November 26, 2000 4:07 AM
Subject: Re: [htdig] Does htmerge remove URL from database ?

> At 2:21 PM +0100 11/23/00, Olivier Korn wrote:
> >I tried it and it didn't solve the problem. BTW, I don't think that
> >these extra merges are necessary either.
>
> No, they should not be at all necessary unless there's truly
> something horrific wrong with the merging code--it only uses the
> files directly output from htdig. (My idea was that it would be
> faster if you didn't need to run htmerge on intermediate DB.)
>
> >Now, I run :
> >htmerge -c site#.conf
> >then
> >htmerge -c site1.conf -m site#.conf (with # > 1)
> >
> >If I then run
> >htsearch -c site5.conf with words="rénovation tourisme", it finds
> >the document (in first place.)
> >But if I do
> >htsearch -c site1.conf with the same words, it returns the "nomatch"
> >document.
> >
> >Some of the web hosts are case sensitive and some are not. Could it
> >be the source of my problem ?
>
> I wouldn't think so. But you have to be pretty careful that the URL
> encodings are shared between your site.conf files. Personally, I make
> up a "main.conf," include that in the other files and only set the
> start_url and a minimal number of things in the individual site.conf
> files. In particular, it makes it easy to change something in all
> config files at once.
>
> --
> -Geoff Hutchison
> Williams Students Online
> http://wso.williams.edu/
>
> ------------------------------------
> To unsubscribe from the htdig mailing list, send a message to
> htdig-unsubscribe@htdig.org
> You will receive a message to confirm this.
> List archives: <http://www.htdig.org/mail/menu.html>
> FAQ: <http://www.htdig.org/FAQ.html>
>
>



This archive was generated by hypermail 2b28 : Mon Nov 27 2000 - 01:38:52 PST