BOUNCE htdig: Admin request
Wed, 20 Jan 1999 07:30:29 -0800 (PST)

Date: Wed, 20 Jan 1999 09:30:24 -0600
From: Brett Baugh <>
Organization: Saper Media Group
To: HT Dig <>
Subject: Re: htdig: Foreign dictionaries and word stemming

I can't help with the German part of it, but...

Stephan Gilbert wrote:

> The other problem especially in german is assembled words. A partial
> match would be very helpful in absence of a word stemming algorithm.
> Are there any plans to add partial matches ?

Two of the algorithms you can list in the "search_algorithm" setting
in your .conf file for that site do partial matching: 'substring' and
'prefix'. Thus you might end up with something like:

search_algorithm: exact:1 substring:0.9 prefix:0.9 synonyms:0.25

...where the numbers are the weights to give each type of match.
Anyway, defining 'prefix' searching lets you use "*" at the end of a
keyword; it matches the _beginnings_ of words in the database. Thus
you can search for "query*" but not "*query*". 'substring' searches
take a lot longer, though, because they check every part of each word
in the database for a match, kind of like doing a 'grep' on it.
Substring matching doesn't use wildcards; it is always applied if it's
defined in your .conf file. So just putting "query" in the keywords
field of a search form effectively does "*query*" instead, all the
time, and the results are weighted according to your
'search_algorithm' settings.
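
To make the difference concrete, here's a rough sketch in Python. This
is NOT htdig's code; the word list and the function names
(prefix_match, substring_match) are made up for illustration, and it
only mimics the matching behaviour described above.

```python
# Illustration only: a stand-in for the words in the database.
WORDS = ["query", "querying", "subquery", "requery", "quest"]


def prefix_match(keyword, words):
    """'prefix' style: a trailing "*" matches the beginnings of words."""
    if not keyword.endswith("*"):
        return []
    stem = keyword[:-1]
    # A leading "*" is not treated as a wildcard, so "*query*" finds nothing.
    return [w for w in words if w.startswith(stem)]


def substring_match(keyword, words):
    """'substring' style: the keyword may appear anywhere in a word."""
    # Every word is scanned, grep-fashion, which is why this is slower.
    return [w for w in words if keyword in w]


print(prefix_match("query*", WORDS))    # ['query', 'querying']
print(prefix_match("*query*", WORDS))   # []
print(substring_match("query", WORDS))  # ['query', 'querying', 'subquery', 'requery']
```

Note that the substring version has to look inside every word, while
the prefix version only compares the start of each one.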

Perhaps a good enhancement for htdig would be for htsearch to use a
"substring" type search if it sees any wildcards at all in the
keywords, drop the whole "prefix" idea altogether, and not try to do a
substring search if there aren't any wildcards in the keywords. Just
my 0.0338346 DM worth...
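
Something like the following is what I have in mind, sketched in
Python. It is only a guess at how such a rule could behave, not
anything htsearch does today; the function name "search" and the word
list are hypothetical, and I'm borrowing Python's fnmatch module to
handle the "*" patterns (it also treats "?" and "[...]" as special,
which this sketch ignores).

```python
from fnmatch import fnmatchcase

# Illustration only: a stand-in for the words in the database.
WORDS = ["query", "querying", "subquery", "requery", "quest"]


def search(keyword, words):
    """Use wildcard matching only when the keyword contains a "*"."""
    if "*" in keyword:
        # Wildcards present: the "*" may sit anywhere in the pattern.
        return [w for w in words if fnmatchcase(w, keyword)]
    # No wildcards: plain exact match, no substring scan at all.
    return [w for w in words if w == keyword]


print(search("query", WORDS))    # ['query']
print(search("query*", WORDS))   # ['query', 'querying']
print(search("*query*", WORDS))  # ['query', 'querying', 'subquery', 'requery']
```

That way "query*" still covers what 'prefix' does now, and the slow
scan only happens when somebody actually asks for it.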

-Brett Baugh
Systems Administrator, Saper Media Group

This archive was generated by hypermail 2.0b3 on Wed Jan 20 1999 - 08:37:47 PST