Subject: Re: [htdig3-dev] htsearch and multiple indexes
From: Gilles Detillieux (firstname.lastname@example.org)
Date: Mon Sep 11 2000 - 13:04:25 PDT
According to abel deuring:
> I think there are a few problems in the current implementation of
> htsearch for using multiple indexes, mainly around the concept to choose
> multiple indexes by simply specifying them all with the "config=..." CGI
> 1) Around line 493 in Display.cc, a QuotedStringList is set up that
> obviously was intended to set the state of HTML form checkboxes to
> "checked". These checkboxes would allow the user to select the indexes
> to be searched. But the checkboxes are nowhere generated.
The code sets a series of template variables, to be used in the follow-up
search forms, e.g. in header.html. When Rajendra submitted his multi-db
support patch, he included example configuration files and a header.html
file, showing how to use this feature. His header.html file contained
the following example:
<td><input type=checkbox name=config value=htdig_mail $(COLLECTION_htdig_mail)>Mail Archives</td>
<td><input type=checkbox name=config value=htdig_bugs $(COLLECTION_htdig_bugs)>HtDig Bugs</td>
It makes the follow-up search forms a little bit more complicated, but it
avoids the need for several other configuration attributes to specify the
names, descriptions, and templates to use for the checkboxes. Of course,
with my latest enhancements to the build_select_lists attribute, you
could set it up to build the list of checkboxes for you.
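To make the mechanism concrete, here is a minimal sketch in Python of how such template variables could work. It is not the actual C++ in Display.cc; the function names collection_vars and expand are made up for illustration, and the rule that unknown variables expand to nothing is an assumption of this sketch.

```python
import re

def collection_vars(collection_names, selected):
    """Map COLLECTION_<name> to "checked" for selected configs, "" otherwise."""
    return {
        "COLLECTION_" + name: ("checked" if name in selected else "")
        for name in collection_names
    }

def expand(template, variables):
    """Replace each $(NAME) with its value; unknown names expand to nothing
    (an assumption of this sketch, not a statement about htsearch)."""
    return re.sub(r"\$\((\w+)\)", lambda m: variables.get(m.group(1), ""), template)

row = ("<input type=checkbox name=config value=htdig_mail "
       "$(COLLECTION_htdig_mail)>Mail Archives")
variables = collection_vars(["htdig_mail", "htdig_bugs"], selected={"htdig_mail"})
print(expand(row, variables))
```

With only htdig_mail selected, the Mail Archives checkbox comes out with the "checked" attribute, while COLLECTION_htdig_bugs expands to an empty string.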
> 2) _If_ such checkboxes are used (something like <input type="checkbox"
> name="config" value="index1">), the details of the HTML output produced
> by htsearch depend on the HTML settings of the config file searched
Yes, that's correct. That's an unfortunate side effect of Rajendra's
multi-db support, and it requires carefully planning out your collection
of config files to avoid discrepancies in parameters that should remain
consistent across all configs in the collection. His example config
files were fairly simple, defining only database_dir and start_url,
and then using an "include" operation to read a common.conf file that
had all the other definitions, including collection_names.
A somewhat more troublesome side effect of Rajendra's approach is that
the LOGICAL_WORDS variable takes on the value appropriate for the last
database in the collection, rather than a value that represents all
fuzzy matches found in all databases. I couldn't tell just by looking
at your code whether your patch addresses this problem.
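The difference between the two behaviours can be sketched as follows. This is a simplified Python illustration, not htsearch code: it treats the fuzzy matches as plain word lists, whereas the real LOGICAL_WORDS value is a formatted string, and merge_logical_words is a hypothetical helper.

```python
def merge_logical_words(per_database_words):
    """Combine the fuzzy-match word lists from every database,
    dropping duplicates and keeping first-seen order."""
    merged = []
    seen = set()
    for words in per_database_words:
        for word in words:
            if word not in seen:
                seen.add(word)
                merged.append(word)
    return merged

# Hypothetical fuzzy matches found in two databases for the same query.
per_db = [["search", "searches"], ["search", "searching"]]

last_only = per_db[-1]                 # what "last database wins" would report
merged = merge_logical_words(per_db)   # a value representing all databases
print(last_only, merged)
```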
> The attached patch to Display.cc, Display.h and htsearch.cc should fix
> these problems:
> - The CGI parameter "config" is only used to specify a kind of a
> "central" config file. From this file, only the display specific
> parameters and "collection_names" are used. Therefore, this config file
> is read in first; in the loop in function main() that searches the
> individual indexes, the config file describing this index is being used;
> the "central" config file is read in again, before the HTML data is
Hmm. Re-reading the main config file again could have unpleasant
side effects. The way htsearch handles many of its input parameters
is to assign them to config attributes of the same or similar names,
overriding the attribute values it got from the config file, and then from
that point on, it uses the values from the config dictionary rather than
going back to the input parameters. It does this also when generating
the template variables for the follow-up search form. If you re-read
the main config file before generating these variables, some of your
input parameter values won't get properly passed on to the template.
I can understand your reasons for wanting to do this, as it imposes a
certain structure on how you have to set up the config files, but the
side-effects of this approach are rather unpleasant. Rajendra's approach
didn't impose any structure, so it could allow the configurer to set up
something sloppy that would yield inconsistent results, but with a bit
of care it could be set up properly without any of these nasty effects.
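The clobbering problem described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: read_config is a made-up stand-in for the real config parser, and the matchesperpage input parameter overriding the matches_per_page attribute is used only as an example of the pattern.

```python
def read_config(text):
    """Parse 'name: value' lines into a dict (toy stand-in for the config parser)."""
    config = {}
    for line in text.splitlines():
        if ":" in line:
            name, value = line.split(":", 1)
            config[name.strip()] = value.strip()
    return config

MAIN_CONF = "matches_per_page: 10\nmatch_method: and\n"

config = read_config(MAIN_CONF)

# The input parameter overrides the attribute read from the file, and from
# here on only the config dictionary is consulted.
cgi_input = {"matchesperpage": "25"}
config["matches_per_page"] = cgi_input["matchesperpage"]
before = config["matches_per_page"]

# Re-reading the main config file silently discards the user's choice.
config.update(read_config(MAIN_CONF))
after = config["matches_per_page"]
print(before, after)
```

Any template variable generated after the second read would carry the file's value rather than the one the user submitted.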
> - Enabling/disabling single indexes is done with a new CGI parameter,
> "collection". Only those indexes are searched which are also listed in
> the config file parameter collection_names.
Again, I can see the value of this, but it would need to be implemented
in such a way as not to clobber the internal config settings obtained
from the input parameters.
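One way to do the filtering without touching the config dictionary is sketched below in Python. The helper collections_to_search is hypothetical, and treating an empty selection as "search everything" is an assumption of this sketch rather than anything in the patch.

```python
def collections_to_search(collection_names, requested):
    """Return the requested collections that collection_names allows,
    in the order they appear in the config attribute."""
    allowed = collection_names.split()
    if not requested:
        # Assumption: no "collection" input means search every index.
        return allowed
    return [name for name in allowed if name in requested]

config = {"collection_names": "index1 index2 index3"}
chosen = collections_to_search(config["collection_names"], ["index3", "bogus", "index1"])
print(chosen)
```

The function returns a new list and leaves config untouched, so values already set from the input parameters survive.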
> - The HTML template files now may contain the variable
> $(COLLECTION_PARMS). This variable is expanded into a list of checkboxes
> to allow the user the selection of individual indexes.
> I chose to not "assemble" the value of $(COLLECTION_PARMS) in
> Display::setVariables, but instead to modify Display::outputVariable, so
> that the HTML text for the check boxes is produced "on the fly", when
> $(COLLECTION_PARMS) is parsed. With this approach, I also removed the
> special treatment of $(HTSEARCH_RESULTS) in a wrapper template file.
This seems to introduce a lot of unnecessary complication. I'm not
sure why COLLECTION_PARMS could not be pre-assembled just as the
select lists are. I imagine it was because you wanted to be able to
use templates to do this, and templates don't allow for the assembly
of other variables. Are templates really necessary for this purpose,
though, or are they overkill? In my build_select_lists enhancements,
I've allowed for inclusion of strings to prepend and append to each
input tag, so that you could, for example, build tables of input tags,
as Rajendra did in his header.html file.
> Some things are
> - The new method Display::displayCollectionList can expand the variables
> $(COLLECTION_NAME) (the name of a config file) and $(COLLECTION_TEXT).
> The latter is intended to be a more descriptive text than the config
> file name to be displayed beside an index selection checkbox, but the
> value for this variable is at present nowhere prepared. I'm not sure,
> what is better: to use only one config file parameter, something like:
> collection_names: index1 index2 ...
> collection_descripction: index1 "some gemeral stuff" \
> index2 "some exotic data" ....
> or to use a separate parameter for each index:
> collection_names: index1 index2
> collection_description_index1: some general stuff
> collection_description_index2: some exotic data
Definitely the first case, typos notwithstanding. This is consistent
with how htsearch deals with several other list-based attributes.
Note that any config attributes you add to the package should now be
defined, and documented, in htcommon/default.cc, so you want to avoid
attribute names that vary with context, as in your second example above.
How would you document these?
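For illustration, a single list-based attribute of name/description pairs could be parsed along these lines. This is a Python sketch only: shlex.split stands in for the quoted string list handling in htsearch, and both the attribute name collection_description and the helper parse_descriptions are hypothetical.

```python
import shlex

def parse_descriptions(value):
    """Split a quoted string list into {collection name: description}."""
    tokens = shlex.split(value)
    if len(tokens) % 2:
        raise ValueError("expected name/description pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))

# Hypothetical value of a single collection_description attribute.
value = 'index1 "some general stuff" index2 "some exotic data"'
descriptions = parse_descriptions(value)
print(descriptions)
```

A single attribute like this has one fixed name to define and document, which is the point of preferring the first form.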
> - While the user can select the indexes to be searched on a result page,
> it would be fine to have these checkboxes on search.html too. Well, I
> know, how to write an HTML page by hand :), but I wonder if it's worth to
> modify htsearch, so that it can display the first search page, with
> checkboxes, if the template contains $COLLECTION_PARMS. Or to write a
> little perl script for this job. That would make it easier for the admin
> to get a consistent "entry page" and search results.
Yes, I thought it would be quite useful myself if htsearch built an
initial search form from a template, e.g. common/search.html, in the
case where it was called with no query string. That way all the stuff
you set up for select lists, checkboxes, and so on would be configured
the same way for the initial search form as for all the follow-up search
forms in common/*.html.
> - if an index is not accessible, htsearch simply bails out. Perhaps it
> would be better to only notify the user, that [s]he did not get results
> from that index.
Yes, although that falls under the category of configuration error rather
than user error. A properly configured system shouldn't run into problems
with inaccessible databases. Mind you, with a multi-database setup, the
likelihood of one of these databases being inaccessible does increase,
even if it is due to a configuration error, so it would be a good idea
to handle it more gracefully.
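A rough sketch of what "more gracefully" could mean, in Python rather than the actual htsearch C++: search_all and fake_search are made-up names, and using OSError to signal an unreadable database is an assumption of this sketch.

```python
def search_all(collections, search_one):
    """Search each collection; record the ones that fail instead of aborting."""
    results, unavailable = [], []
    for name in collections:
        try:
            results.extend(search_one(name))
        except OSError:
            unavailable.append(name)
    return results, unavailable

def fake_search(name):
    """Stand-in for a per-index search; index2 pretends to be inaccessible."""
    if name == "index2":
        raise OSError("cannot open database")
    return [name + "/doc1"]

results, unavailable = search_all(["index1", "index2", "index3"], fake_search)
print(results, unavailable)
```

The list of unavailable indexes could then be shown to the user alongside the results from the indexes that did respond.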
--
Gilles R. Detillieux              E-mail: <email@example.com>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930