BOUNCE htdig: Admin request


owner-htdig@sdsu.edu
Wed, 20 Jan 1999 07:30:29 -0800 (PST)


>From andrew@contigo.com Wed Jan 20 07:30:26 1999
Received: from sapermedia.com (root@sapermedia.com [208.12.162.66])
        by sdsu.edu (8.8.7/8.8.7) with ESMTP id HAA14988
        for <htdig@sdsu.edu>; Wed, 20 Jan 1999 07:30:26 -0800 (PST)
Received: from sapermedia.com (root@brett.sapermedia.com [208.24.205.132])
        by sapermedia.com (8.9.1a/8.9.0) with ESMTP id JAA10334
        for <htdig@sdsu.edu>; Wed, 20 Jan 1999 09:30:24 -0600
Sender: root@sapermedia.com
Message-ID: <36A5F690.73274CFB@sapermedia.com>
Date: Wed, 20 Jan 1999 09:30:24 -0600
From: Brett Baugh <brett@sapermedia.com>
Organization: Saper Media Group
X-Mailer: Mozilla 4.5 [en] (X11; I; Linux 2.0.35 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: HT Dig <htdig@sdsu.edu>
Subject: Re: htdig: Foreign dictionaries and word stemming
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

I can't help with the German part of it, but...

Stephan Gilbert wrote:

> The other problem especially in german is assembled words. A partial
> match would be very helpful in absence of a word stemming algorithm.
> Are there any plans to add partial matches ?

Two of the search algorithms you can list in the
"search_algorithm" setting in your .conf file for that site do partial
matching: 'substring' and 'prefix'. Thus you might end up with
something like:

search_algorithm: exact:1 substring:0.9 prefix:0.9 synonyms:0.25

...where the numbers are the weights to give each type of match.
Anyway, defining 'prefix' searching actually lets you use "*" in
keywords; it matches the _beginnings_ of words in the database. Thus
you could search for "query*" but not "*query*". 'substring' searches
take a lot longer, though, because they check every part of each word
in the database for a match, kind of like doing a 'grep' on it.
'substring' doesn't use wildcards, though; it is always applied if
it's defined in your .conf file. Thus just putting "query" in the
keywords field of a search form actually does "*query*" instead... all
the time... and weights the results according to your
'search_algorithm' settings.
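
To make the difference between the two match types concrete, here is a
rough Python sketch. It is not htdig's code: the function name
match_words, the WEIGHTS table, and the rule of keeping the highest
weight when several algorithms hit the same word are all assumptions
made for illustration only.

```python
# Hypothetical illustration of 'prefix' vs. 'substring' matching.
# Nothing here is taken from htdig itself; names and the "keep the
# highest weight" rule are assumptions.
WEIGHTS = {"exact": 1.0, "substring": 0.9, "prefix": 0.9}


def match_words(keyword, words, weights=WEIGHTS):
    """Return {word: weight} for every word that the keyword matches."""
    results = {}

    def record(word, algorithm):
        weight = weights.get(algorithm, 0)
        if weight > results.get(word, 0):
            results[word] = weight

    is_prefix_query = keyword.endswith("*")
    stem = keyword.rstrip("*")
    for word in words:
        if is_prefix_query:
            # "query*": only the beginning of each word is compared.
            if "prefix" in weights and word.startswith(stem):
                record(word, "prefix")
            continue
        if "exact" in weights and word == keyword:
            record(word, "exact")
        # No wildcard needed: every position in the word is checked,
        # which is why this is the slow, grep-like case.
        if "substring" in weights and keyword in word:
            record(word, "substring")
    return results
```

With the word list ["query", "queryable", "subquery", "quest"], a
keyword of "query*" picks up only "query" and "queryable", while a
plain "query" also picks up "subquery" through the substring match.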

Perhaps a good enhancement for htdig would be for htsearch to use a
"substring" type search if it sees any wildcards at all in the
keywords, drop the whole "prefix" idea altogether, and not try to do a
substring search if there aren't any wildcards in the keywords. Just
my 0.0338346 DM worth...
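
A minimal sketch of that suggested behaviour, again in Python and again
purely hypothetical (proposed_match is an invented name, and htsearch
does not work this way as far as this message describes): a keyword
containing "*" is treated as a wildcard pattern anywhere in the word,
and a keyword without one is matched exactly.

```python
import re


def proposed_match(keyword, words):
    """Hypothetical: wildcard search only when the keyword contains '*'."""
    if "*" not in keyword:
        # No wildcards: plain exact match, no substring scan at all.
        return [w for w in words if w == keyword]
    # Turn each "*" into ".*" and escape everything else literally.
    pattern = ".*".join(re.escape(part) for part in keyword.split("*"))
    regex = re.compile(pattern)
    return [w for w in words if regex.fullmatch(w)]
```

Under this scheme "query*" would behave like today's prefix search,
"*query*" like today's substring search, and plain "query" would skip
the expensive scan entirely.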

-Brett Baugh
Systems Administrator, Saper Media Group


