Date: Wed, 20 Jan 1999 09:30:24 -0600
From: Brett Baugh <firstname.lastname@example.org>
Organization: Saper Media Group
X-Mailer: Mozilla 4.5 [en] (X11; I; Linux 2.0.35 i686)
To: HT Dig <email@example.com>
Subject: Re: htdig: Foreign dictionaries and word stemming
I can't help with the German part of it, but...
Stephan Gilbert wrote:
> The other problem especially in german is assembled words. A partial
> match would be very helpful in absence of a word stemming algorithm.
> Are there any plans to add partial matches ?
Two of the algorithms you can list in the "search_algorithm" setting
in your .conf file for that site do partial matching: 'substring' and
'prefix'. Thus you might end up with something like:
search_algorithm: exact:1 substring:0.9 prefix:0.9 synonyms:0.25
...where the numbers are the weight to give each type of match.
Anyway, defining 'prefix' searching actually lets you use "*" at the
end of a keyword; it matches the _beginnings_ of words in the database.
Thus you could search for "query*" but not "*query*". 'substring'
searches take a lot longer, though, because they check every part of
each word in the database for a match, kind of like doing a 'grep' on
it. 'substring' doesn't use wildcards, though; it is always applied if
it's defined in your .conf file. Thus just putting "query" in the
keywords field of a search form actually does "*query*" instead, every
time, and weights the results according to your 'search_algorithm'
settings.
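
To make the difference concrete for the compound-word case, here is a
rough Python sketch of the two matching rules as described above. It is
not htdig's code; the word list and both function names are made up for
illustration, and weighting is left out entirely.

```python
# Illustrative only: NOT htdig's implementation, just the matching rules
# described in this message applied to a small hypothetical word list.
WORDS = ["haus", "hausaufgabe", "krankenhaus", "rathaus", "maus"]


def prefix_match(keyword, words):
    """Treat a trailing "*" as "match the beginnings of words"."""
    if not keyword.endswith("*"):
        return []  # this sketch only handles the wildcard form
    stem = keyword[:-1]
    return [w for w in words if w.startswith(stem)]


def substring_match(keyword, words):
    """Match the keyword anywhere inside a word, like a grep over the list."""
    return [w for w in words if keyword in w]


prefix_hits = prefix_match("haus*", WORDS)
substring_hits = substring_match("haus", WORDS)

print(prefix_hits)     # words that start with "haus"
print(substring_hits)  # words that contain "haus" anywhere
```

The prefix form never finds "krankenhaus" or "rathaus", which is why
'substring' is the one that helps with assembled words, at the cost of
scanning every word.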
Perhaps a good enhancement for htdig would be for htsearch to use a
"substring" type search if it sees any wildcards at all in the
keywords, drop the whole "prefix" idea altogether, and not try to do a
substring search if there aren't any wildcards in the keywords. Just
my 0.0338346 DM worth...
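
A minimal sketch of that idea in Python, purely to show the intended
behaviour; nothing here is existing htsearch code. The function name
search() and the word list are hypothetical, and the wildcard matching
is borrowed from Python's standard fnmatch module rather than anything
htdig provides.

```python
from fnmatch import fnmatchcase

# Hypothetical word list standing in for the word database.
WORDS = ["haus", "hausaufgabe", "krankenhaus", "rathaus", "maus"]


def search(keyword, words):
    """Use pattern matching only when the keyword contains a wildcard.

    Assumption: "*" is the only wildcard users type. fnmatchcase also
    gives "?" and "[...]" special meaning once a "*" is present, which a
    real implementation would have to decide about.
    """
    if "*" in keyword:
        # Wildcards present: "haus*", "*haus" and "*haus*" all work.
        return [w for w in words if fnmatchcase(w, keyword)]
    # No wildcards: plain exact match, no scan for substrings.
    return [w for w in words if w == keyword]


print(search("haus", WORDS))
print(search("haus*", WORDS))
print(search("*haus*", WORDS))
```

With this rule the slow scan only happens when the user asks for it by
typing a "*", and a separate 'prefix' algorithm is no longer needed.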
Systems Administrator, Saper Media Group