Re: [htdig] Multiple database (patch)


Subject: Re: [htdig] Multiple database (patch)
From: Rajendra Inamdar (inamdar@beasys.com)
Date: Tue Feb 08 2000 - 15:26:38 PST


Hello,

Here is a patch which supports searches through multiple databases
without having to merge the databases. The patch also exposes the
search algorithm selection so that the user can select a search policy.

The attached tar file contains:

- New files Collection.h and Collection.cc, which go in the htsearch/ directory.
- Patch file multidb.patch
- Config file examples.

The patch is based on the htdig-3.2.0b1-dev-013000 snapshot. The changes
affect only the htsearch program. If only one database is involved, htsearch
behaves as before. In the following discussion, "collection" and "database"
are used interchangeably.

For each database, there is a corresponding config file. In a typical config
file, there are several settings which are not database specific. It is
recommended that they be put in a shared config file, which gets included
from the respective database config files.

New Configuration Attributes:

collection_names
    type:
         string list
    used by:
         htsearch
    default:
         none
    description:
         A whitespace-separated list of databases. Each named database MUST
         have its own configuration file with the same name.
    example:
         collection_names: htdig_mail htdig_bugs

    This attribute should be put in the shared config file, although it can be
    replicated in all config files. The config file corresponding to each
    listed database must exist.
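
    As a rough illustration of the rule above (not code from the patch), the
    following Python sketch splits a collection_names value and reports which
    databases lack a config file. The helper names, the config directory
    argument and the ".conf" suffix are assumptions made for this example.

```python
import os


def collection_config_paths(collection_names, config_dir):
    """Map each name in a whitespace-separated list to a config file path.

    Assumption: the config file for database NAME is config_dir/NAME.conf.
    """
    names = collection_names.split()
    return {name: os.path.join(config_dir, name + ".conf") for name in names}


def missing_configs(paths):
    """Return the database names whose config file does not exist."""
    return [name for name, path in paths.items() if not os.path.isfile(path)]


if __name__ == "__main__":
    # Hypothetical directory; adjust to wherever the config files live.
    paths = collection_config_paths("htdig_mail htdig_bugs", "/etc/htdig")
    for name in missing_configs(paths):
        print("no config file for database:", name)
```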

search_policies
    type:
        string list
    used by:
        htsearch
    default:
        none
    description:
        This attribute defines a set of canned search policies. It is used by
        htsearch to emit a list of search policies which the user can select
        during search. Each policy is described by a pair of strings. The
        first is the external name of the policy and the second is the
        internal name of the policy. The internal policies are attributes
        defined in the config file, which follow the syntax and semantics of
        the search_algorithm attribute. If a search policy is selected by the
        user, it is used as the search_algorithm for the search session.
    example:
        search_policies: \
            Substring alg_substring \
            Exact alg_exact
        alg_substring: exact:1 substring:0.5
        alg_exact: exact:1

    In the above example, alg_substring and alg_exact are arbitrary policy
    names. They must not conflict with any predefined HtDig attributes. The
    search_policies and associated policy attributes should be put in the
    shared config file.
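
    To make the pairing concrete, here is a small Python sketch (not the
    patch's C++ code) that reads the example above, splits search_policies
    into (external, internal) pairs, and looks up the algorithm string for a
    selected policy. The parser is deliberately simplified, and the fallback
    to search_algorithm when no policy matches is an assumption.

```python
def parse_config(text):
    """Parse 'name: value' lines, joining lines that end with a backslash."""
    attrs = {}
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("\\"):
            # Continuation: keep collecting until a line without a backslash.
            pending += line[:-1] + " "
            continue
        name, _, value = (pending + line).partition(":")
        attrs[name.strip()] = " ".join(value.split())
        pending = ""
    return attrs


def policy_pairs(attrs):
    """Return [(external_name, internal_name), ...] from search_policies."""
    words = attrs.get("search_policies", "").split()
    return list(zip(words[0::2], words[1::2]))


def resolve_algorithm(attrs, selected_external_name):
    """Return the search_algorithm value for the policy the user selected."""
    for external, internal in policy_pairs(attrs):
        if external == selected_external_name:
            return attrs.get(internal)
    # Assumption: with no matching policy, use the configured default.
    return attrs.get("search_algorithm")


EXAMPLE_CONFIG = """
search_policies: \\
    Substring alg_substring \\
    Exact alg_exact
alg_substring: exact:1 substring:0.5
alg_exact: exact:1
"""

if __name__ == "__main__":
    attrs = parse_config(EXAMPLE_CONFIG)
    print(policy_pairs(attrs))
    print(resolve_algorithm(attrs, "Substring"))
```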

How does it work (very briefly)

I have tried to minimize structural changes to the code. Basically, the user
selects one or more databases to search, which translates to multiple values
for the config parameter passed to htsearch.

For each selected database, a Collection object is built, which captures the
search context for that database. Each database is searched independently. The
Collection object also holds the result list for that database.

Display.cc is modified to build the aggregate result list.
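
The flow described above can be pictured with the following Python sketch. It
is only a toy model, not the C++ in Collection.cc or Display.cc: the Match
class, the in-memory "database", the word-count scoring, and the choice to
order the aggregate list by descending score are all assumptions made for
illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Match:
    url: str
    score: float


@dataclass
class Collection:
    """Search context and result list for one database (toy stand-in)."""
    name: str
    documents: dict  # url -> text; stands in for a real database
    results: list = field(default_factory=list)

    def search(self, word):
        """Search this database only and keep the matches on the object."""
        word = word.lower()
        self.results = []
        for url, text in self.documents.items():
            count = text.lower().split().count(word)
            if count:
                self.results.append(Match(url, float(count)))
        return self.results


def aggregate(collections):
    """Build one result list from all collections, best score first."""
    merged = [(c.name, m) for c in collections for m in c.results]
    merged.sort(key=lambda pair: pair[1].score, reverse=True)
    return merged


if __name__ == "__main__":
    selected = [
        Collection("htdig_mail", {"m1": "patch for the patch", "m2": "hello"}),
        Collection("htdig_bugs", {"b1": "bug in patch"}),
    ]
    for collection in selected:
        collection.search("patch")  # each database is searched independently
    for name, match in aggregate(selected):
        print(name, match.url, match.score)
```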

Caveats

1. The LOGICAL_WORDS variable reflects only the last database searched, so
    the displayed value is sometimes misleading.

2. When no database is selected, the default htdig.conf is used. If no
    results should be displayed in that case, it is necessary to create a
    default database which is essentially empty.

Hope this helps.

/Raj Inamdar
inamdar@beasys.com


multidb.patch.tar



