[htdig] Looking for an elegant solution

Ivan Trundle (ivan.trundle@alia.org.au)
Sat, 22 May 1999 12:50:20 +1000


I'm looking for an elegant solution for our smallish web site, and one
which I have not found through searching the FAQs or past messages. Here
it is in a nutshell:

I am currently ht://digging very nicely, thanks, an Apache-served
Solaris box, with a total of around 5000 defined HTML documents (all of
which are your garden-variety web pages - nothing unusual in them at
all, and rarely are the pages more than a few screenfuls). I'm
digging/merging the entire site into one database, and my search page
currently returns results on a nicely-defined template.

I offer a site search page that spans the site (which is what I want for
the site's "main" search tool), and returns results that match the
general look and feel of the bulk of the pages across the site.

However, an increasing number of custom subdirectories are being
developed with a different look and feel (conference pages, individual
user pages, etc). The owners of these subdirectories want a search tool
customised for their own requirements - and they only require a subset
of the existing htdig database (this is important - the main site search
tool must return results from the entire site, not just the 'non-custom'
pages, whilst the custom directory's search tool only needs to return a
subset restricted to that directory).

We're looking at roughly 10 to 20 subdirectories (though this may double
in time), each with roughly 20 to 150 pages. Data in these pages is
nothing unusual and not too large. Disk space for the entire site is not
an issue (not yet, anyway!).

In essence: the custom subdirectory owners all want different layouts for
their own search pages, want searches confined to their own individual
directories, and want results presented in customised layouts. The main site
search tool needs to encompass the lot.

So here is the question: Is it better (from both a performance and
management perspective - and recognising the slight overlap generated)
to create different databases for each directory, and different config
files and templates OR, is it better to use a single database and use
different config files for each, with varying exclude_urls values? Or is
there another way that I have missed altogether?

In the future I may have to consider different search algorithms (some
may want fuzzier results, for example), but this is not so important
given the significantly smaller size of their data. It is unlikely that
these subdirectories would require a different config in terms of the
data dug (wordlists, punctuation, etc).

Hope I'm making sense here, and that someone can shed some light on my problem.

TIA, Ivan

Ivan Trundle   Manager, Systems and Publishing
Australian Library and Information Association
ivan.trundle@alia.org.au       www.alia.org.au
telephone +61 2 6285 1877  fax +61 2 6282 2249
PO Box E441   Kingston   ACT 2604    AUSTRALIA

This archive was generated by hypermail 2.0b3 on Fri May 21 1999 - 19:02:16 PDT