[htdig] Htdig/Htmerge -- When pre-existing databases are involved.


Subject: [htdig] Htdig/Htmerge -- When pre-existing databases are involved.
From: Sphboc@aol.com
Date: Mon Mar 20 2000 - 07:59:20 PST


I have installed release 3.1.5*. Using an approach that indexes one URL at a
time, and always re-initializes that URL's files before indexing, I have
been able to combine all the relevant domains (some 20; call them domain01
through domain20) into one searchable database; searches appear to be
returning valid results.

To date, during the indexing and merging of domain(i):
A. Htdig has been limited to domain(i), and has used -i.
B. An initial run of htmerge has read only the outputs of
     (A), creating a database for domain(i) only.
C. A subsequent htmerge has specified the output of (B) as
     the -m operand, and merged this into a common database.
     At start of merge, the common database does not contain
     any documents from domain(i).
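
For reference, steps A through C for one domain can be sketched as the command
sequence below. This is a minimal sketch, not my actual script: the config
file names (domain01.conf ... domain20.conf, common.conf) are hypothetical,
and it assumes the 3.1.x command-line conventions in which -c names a config
file and htmerge's -m names the config file of the databases to be merged in.
Check the htdig and htmerge man pages for your installation before relying on it.

```python
import subprocess
from typing import List

# Hypothetical name of the config file describing the combined database.
COMMON_CONF = "common.conf"


def commands_for_domain(domain_conf: str, common_conf: str = COMMON_CONF) -> List[List[str]]:
    """Return the command lines for steps A, B and C for one domain's config file."""
    return [
        # A. dig only this domain, starting from empty databases (-i)
        ["htdig", "-i", "-c", domain_conf],
        # B. build this domain's own searchable database
        ["htmerge", "-c", domain_conf],
        # C. merge this domain's databases into the common ones
        ["htmerge", "-c", common_conf, "-m", domain_conf],
    ]


def run_domain(domain_conf: str, dry_run: bool = True) -> None:
    """Print (dry run) or execute the three steps, stopping at the first failure."""
    for cmd in commands_for_domain(domain_conf):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    # domain01.conf ... domain20.conf are assumed names, one per domain.
    for i in range(1, 21):
        run_domain(f"domain{i:02d}.conf")
```

Running with dry_run=True only prints the commands, which makes it easy to
check the order of operations before touching any databases.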

This much seems to operate as intended. Each output of (B) is searchable
(using the applicable config option) as to its own domain; the final result
of (C) is searchable as to all of the domains.

What I have not been able to find in the online documentation is any
coverage of what happens when:
  1. I run A and B above without the -i option, and the pre-existing files
     already contain documents from domain(i).
  2. I run C above, and both sets of files contain documents from domain(i).

I'm somewhat concerned that outdated documents, which are no longer on the
site at all, may remain in my databases if I don't re-initialize. It
appears, however, that continuing with my initial approach will be rather
inconvenient when the real intent is to replace a pre-existing domain.

Steven P Haver/602-242-9708

*With considerable assistance from Geoff Hutchison and Gilles Detillieux; my
thanks to both.




This archive was generated by hypermail 2b28 : Mon Mar 20 2000 - 06:57:21 PST