Re: Prefix algorithm and other tweaks


Esa Ahola (esa@cyclone.mindspring.com)
Thu, 11 Dec 1997 18:14:29 -0500 (EST)


Haven't heard back from you; that's quite okay, just wanted to make sure
mail was not getting lost in one direction or another.

I discovered that the prefix algorithm is pretty overbearing in complex
queries unless there is a way to request it explicitly for specific words.
I did a quick hack that uses a trailing '*' to indicate prefix matching, e.g.

    foo or bar*

My test page mentioned below now uses that syntax.
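
To make the syntax concrete, here is a rough sketch in Python of how a
query could be split into exact and prefix terms. It is only an
illustration, not the actual patch to ht://Dig: the names parse_term and
parse_query are made up, and the operator list (and, or, not) is an
assumption.

```python
def parse_term(word):
    """Return (term, is_prefix) for a single query word.

    A trailing '*' requests prefix matching. A bare '*' is left
    alone, since it carries no prefix to match on.
    """
    if len(word) > 1 and word.endswith("*"):
        return word[:-1], True
    return word, False


def parse_query(query, operators=("and", "or", "not")):
    """Split a query into (kind, text) pairs.

    kind is "op" for a boolean operator, "prefix" for a word that
    ended in '*', and "exact" for everything else.
    The operator list is an assumption made for this sketch.
    """
    parsed = []
    for word in query.split():
        if word.lower() in operators:
            parsed.append(("op", word.lower()))
        else:
            term, is_prefix = parse_term(word)
            parsed.append(("prefix" if is_prefix else "exact", term))
    return parsed
```

With this, parse_query("foo or bar*") marks "foo" as an exact term and
"bar" as a prefix term, so only the second word is expanded.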

Do you think this is a worthwhile enhancement to ht://Dig?

-- 
Esa Ahola
esa@cyclone.mindspring.com

---------- Forwarded message ----------
Date: Fri, 28 Nov 1997 01:50:33 -0500 (EST)
From: Esa Ahola <esa@cyclone.mindspring.com>
To: Andrew Scherpbier <andrew@contigo.com>
Subject: Re: Prefix algorithm and other tweaks

> 1. Yank GDBM and substitute Berkeley DB in Btree mode. Random index
> and sorted index in one!

This was easier than I thought, and I don't even speak C++. Kudos aplenty to your exceptionally clear code!

I have implemented a prototype "prefix" fuzzy algorithm. Works wonders so far in limited testing; see

http://mercedes.mindspring.com/mercedes/archives/prefix.html

It seems that additional configuration variables are in order, such as a maximum number of prefix matches and a minimum prefix length (one- or two-character prefixes will be rather hopeless with large databases).
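
As a sketch of why a sorted index makes this cheap, and of how the two
limits could apply, here is a small Python example. A sorted list with
binary search stands in for the Btree; this is not the prototype's code,
and the names prefix_matches, min_prefix_length and max_matches, as well
as their default values, are hypothetical.

```python
import bisect


def prefix_matches(sorted_words, prefix, min_prefix_length=3, max_matches=1000):
    """Return words in sorted_words that start with prefix.

    sorted_words must already be sorted; it stands in for a sorted
    (Btree-style) word index. Both limits are hypothetical settings:
    prefixes shorter than min_prefix_length are refused, and the scan
    stops once max_matches words have been collected.
    """
    if len(prefix) < min_prefix_length:
        return []
    matches = []
    # Seek to the first word >= prefix, then walk forward in order.
    start = bisect.bisect_left(sorted_words, prefix)
    for i in range(start, len(sorted_words)):
        word = sorted_words[i]
        # In sorted order all matches are adjacent, so the first
        # non-matching word ends the scan.
        if not word.startswith(prefix) or len(matches) >= max_matches:
            break
        matches.append(word)
    return matches
```

The point of the sorted index is that all words sharing a prefix sit next
to each other, so the lookup is one seek followed by a short sequential
walk. The two limits bound how long that walk can get.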

This is exciting; I think prefix matching is by far the most useful fuzzy algorithm.

-- 
Esa Ahola
esa@cyclone.mindspring.com



This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:25:24 PST