Subject: Re: [htdig] Multiple database (patch)
From: Rajendra Inamdar (email@example.com)
Date: Tue Feb 08 2000 - 15:26:38 PST
Here is a patch which supports searches through multiple databases
without having to merge the databases. The patch also exposes the
search algorithm selection so that the user can select a search policy.
The attached tar file contains:
- New files Collection.h and Collection.cc, which go in the htsearch/ directory.
- Patch file multidb.patch
- Config file examples.
The patch is based on the htdig-3.2.0b1-dev-013000 snapshot. The changes
affect only the htsearch program. If only one database is involved, htsearch will
behave as before. In the following discussion, collection and database are
used interchangeably.
For each database, there is a corresponding config file. In a typical config
file, there are several settings which are not database specific. It is recommended
that they be put in a shared config file, which gets included from the respective
database config files.
New Configuration Attributes:
collection_names
description: A whitespace-separated list of databases. Each named database
must have its own configuration file with the same name.
collection_names: htdig_mail htdig_bugs
This attribute should be put in the shared config file, although it can be
put in all config files. A config file must exist for each named database.
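As a rough illustration of how a collection_names value could be turned into
per-database config files, here is a minimal C++ sketch. It is not code from the
patch: splitCollectionNames and configPathFor are hypothetical helpers, and the
".conf" suffix and directory layout are assumptions.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split a whitespace-separated attribute value into individual database names.
std::vector<std::string> splitCollectionNames(const std::string& value) {
    std::vector<std::string> names;
    std::istringstream in(value);
    std::string name;
    while (in >> name) {
        names.push_back(name);
    }
    return names;
}

// Hypothetical helper: map a database name to the path of its config file.
// The ".conf" suffix and the directory layout are assumptions for this sketch.
std::string configPathFor(const std::string& configDir, const std::string& name) {
    assert(!name.empty());  // every database must be named
    return configDir + "/" + name + ".conf";
}
```

With the example value above, splitCollectionNames would yield htdig_mail and
htdig_bugs, each of which then needs its own config file.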
A second new attribute defines a set of canned search policies. It is used to
emit a list of search policies from which the user can select during a search.
Each policy is described by a pair of strings: the first is the external name of
the policy and the second is the internal name. The internal policies are defined
in the config file and follow the syntax and semantics of search_algorithm.
If the user selects a search policy, it is used as the search_algorithm for the
search.
Substring alg_substring \
alg_substring: exact:1 substring:0.5
In the above example, alg_substring and alg_exact are arbitrary policy names.
They must not conflict with any predefined HtDig attributes. The policy list
attribute and its associated policy attributes should be put in the shared config file.
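To make the pairing concrete, the following C++ sketch parses a policy list into
(external name, internal name) pairs and parses a policy value such as
"exact:1 substring:0.5" into per-algorithm weights. It is an illustration only:
parsePolicyPairs and parseAlgorithmWeights are hypothetical names, not functions
from the patch, and the sketch assumes that backslash line continuations have
already been joined by the config reader.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Read alternating tokens as (external name, internal name) pairs.
// A trailing unpaired token is ignored.
std::vector<std::pair<std::string, std::string>>
parsePolicyPairs(const std::string& value) {
    std::vector<std::pair<std::string, std::string>> pairs;
    std::istringstream in(value);
    std::string external, internal;
    while (in >> external >> internal) {
        pairs.emplace_back(external, internal);
    }
    return pairs;
}

// Parse a value like "exact:1 substring:0.5" into algorithm -> weight.
// Tokens without a colon are skipped; std::stod throws on a non-numeric weight.
std::map<std::string, double> parseAlgorithmWeights(const std::string& value) {
    std::map<std::string, double> weights;
    std::istringstream in(value);
    std::string token;
    while (in >> token) {
        std::string::size_type colon = token.find(':');
        if (colon == std::string::npos) {
            continue;
        }
        assert(colon + 1 <= token.size());  // substr below stays in range
        weights[token.substr(0, colon)] = std::stod(token.substr(colon + 1));
    }
    return weights;
}
```

The external name is what the user sees in the selection list; the internal name
is the key under which the weighted algorithm list is looked up.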
How does it work (very briefly)
I have tried to minimize structural changes to the code. Basically, the user
selects a bunch of databases to search, which translates to multiple values for the
config parameter passed to htsearch.
For each selected database, a Collection object is built, which captures the search
context for that database. Each database is independently searched. The collection
captures the result list for that database.
Display.cc is modified to build the aggregate result list.
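The per-database search followed by aggregation can be pictured with the C++
sketch below. It is not the code from Collection.cc or Display.cc: Match,
CollectionSketch and aggregateResults are hypothetical names, and ordering the
merged list by descending score is an assumption, since the ordering used by the
patch is not described here.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One hit from one database (hypothetical type for this sketch).
struct Match {
    std::string url;
    double score;
    std::string collection;  // filled in during aggregation
};

// Stand-in for the per-database search context and its result list.
struct CollectionSketch {
    std::string name;
    std::vector<Match> results;
};

// Merge the independently produced result lists into one list.
// Assumption: the aggregate list is ordered by descending score.
std::vector<Match> aggregateResults(const std::vector<CollectionSketch>& collections) {
    std::vector<Match> all;
    for (const CollectionSketch& c : collections) {
        assert(!c.name.empty());  // each collection must be named
        for (Match m : c.results) {
            m.collection = c.name;  // remember which database the hit came from
            all.push_back(m);
        }
    }
    // stable_sort keeps the per-database order for hits with equal scores.
    std::stable_sort(all.begin(), all.end(),
                     [](const Match& a, const Match& b) { return a.score > b.score; });
    return all;
}
```

Tagging each hit with its collection name keeps the source database available
when the merged list is displayed.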
Known issues:
1. The LOGICAL_WORDS variable reflects only the last database searched, so it
is sometimes misleading when displayed.
2. When no database is selected, the default htdig.conf is used. If the intent
is that no results be displayed in that case, it is necessary to create a
default database which is essentially empty.
Hope this helps.