[htdig3-dev] File usage (htdig, htmerge, htsearch)


Subject: [htdig3-dev] File usage (htdig, htmerge, htsearch)
From: Sphboc@aol.com
Date: Thu Mar 02 2000 - 15:41:04 PST


I've just installed htdig. In our situation, we'll need to index multiple
domains in such a way that htsearch can access a "combined" version (so
that a keyword search will locate results from any of the domains).

I was hoping that I could use only one database and "htdig" one (or
relatively few) URLs at a given time, and thus "stagger" the process of
re-indexing the database.

At least as I've been running it, however, htdig appears to be "re-checking"
every URL which is already in the database, presumably to determine whether
any have changed. I can see the rationale for this, but it will result in a
substantial (and quite possibly unacceptable) increase in workload.

Is there any way to prevent this re-checking behavior?

Whether or not there is, I have been unable to locate any clear documentation
concerning file handling. Specifically:
A. Which data-input files are mandatory, and which optional, for each of the
three components?
B. Which data files do htdig and htmerge create and/or update?

What I think I want to develop is an approach under which htdig is executed
against partial databases (each containing results from relatively few
domains), and htmerge is used to merge the results from each of the partial
databases into a combined database.
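
To make the idea concrete, here is a rough sketch of the staggered schedule
I have in mind. It only plans the commands; it does not run them. The config
file names are hypothetical, and it assumes that -c selects a config file for
both htdig and htmerge and that htmerge's -m option merges the databases named
by another config into the -c one. Whether a repeated merge replaces stale
entries in the combined database is something I have not verified.

```python
from typing import List

# Hypothetical layout: one config file per group of domains, plus one
# config for the combined database that htsearch would read.
PARTIAL_CONFIGS = ["group_a.conf", "group_b.conf", "group_c.conf"]
COMBINED_CONFIG = "combined.conf"


def group_for_run(run_number: int, configs: List[str]) -> str:
    """Pick the one partial config to re-index on this run (round-robin)."""
    return configs[run_number % len(configs)]


def plan_run(run_number: int) -> List[List[str]]:
    """Return the commands for one staggered run, as argv lists.

    Assumption: -c selects the config file, and htmerge -m merges the
    databases named by another config into the -c one.
    """
    part = group_for_run(run_number, PARTIAL_CONFIGS)
    return [
        ["htdig", "-c", part],                           # re-dig one group only
        ["htmerge", "-c", part],                         # rebuild that group's index
        ["htmerge", "-c", COMBINED_CONFIG, "-m", part],  # fold it into the combined db
    ]


if __name__ == "__main__":
    # Print the plan for the first run instead of executing anything.
    for cmd in plan_run(0):
        print(" ".join(cmd))
```

Each run would then touch only one group's URLs, which is the workload
reduction I am after.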

If there's an FAQ, or equivalent, which covers this, please let me know.

Steven P Haver




This archive was generated by hypermail 2b28 : Thu Mar 02 2000 - 15:45:51 PST