Prefix algorithm and other tweaks


Esa Ahola (esa@cyclone.mindspring.com)
Wed, 26 Nov 1997 14:36:37 -0500 (EST)


On Mon, 17 Nov 1997, Andrew Scherpbier wrote:

> I'll write a prefix algorithm that will match the start of words and
> will be much faster. (Any volunteers to write this?)

I think I volunteered a year or so ago. :-\ I can think of two relatively
quick approaches:

1. Yank GDBM and substitute Berkeley DB in Btree mode. Random index
   and sorted index in one!

2. Binary search db.wordlist to implement prefix matching. Crude
   but surprisingly effective.
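
To make the second approach concrete, here is a minimal sketch in Python. It
assumes the words are already available as a sorted in-memory list; the actual
on-disk layout of db.wordlist is not modeled here, and the name
prefix_matches is made up for illustration.

```python
from bisect import bisect_left

def prefix_matches(sorted_words, prefix):
    """Return every word in sorted_words that starts with prefix.

    sorted_words must be sorted in ascending order (assumption: plain
    Python string ordering, not necessarily what db.wordlist uses).
    """
    # Binary search for the first word that is >= prefix. All words
    # sharing the prefix sit in one contiguous run starting there.
    i = bisect_left(sorted_words, prefix)
    matches = []
    # Walk forward until a word no longer starts with the prefix.
    while i < len(sorted_words) and sorted_words[i].startswith(prefix):
        matches.append(sorted_words[i])
        i += 1
    return matches

words = sorted(["search", "seat", "sea", "second", "apple", "zebra"])
print(prefix_matches(words, "sea"))
```

The lookup costs one binary search plus one step per matching word, which is
why a sorted list can be enough for prefix matching.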

If I gather up the courage, which approach would you rather see?

By the way, I have made two small additions to htsearch and htdig to
support my own particular needs:

- Added a PLUS_WORDS variable that contains the original search words
  with whitespace replaced by '+'. I use this in the output template
  to pass the wordlist to a CGI which highlights the search terms.

- Added a "local_url_preprocessor" configuration directive, used in
  conjunction with the local_urls directive to filter HTML files
  through some program before indexing them. I use it to remove
  the "next article" and "previous article" hyperlinks from the
  mailing list archive messages, since they generate false positives.
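
Roughly, the two additions behave like the Python sketch below. This is not
the actual htsearch/htdig code: the function names, the regular expression,
and the exact link text in the archive pages are assumptions made for
illustration, and how the preprocessor program receives and returns the file
is not shown.

```python
import re

def plus_words(query):
    """Join the whitespace-separated search words with '+'."""
    # split() with no argument collapses any run of whitespace.
    return "+".join(query.split())

# Hypothetical pattern: an <a> element whose entire text is
# "next article" or "previous article" (case-insensitive).
NAV_LINK = re.compile(
    r"<a\b[^>]*>\s*(?:next|previous)\s+article\s*</a>",
    re.IGNORECASE,
)

def strip_nav_links(html):
    """Remove the navigation hyperlinks so they are not indexed."""
    return NAV_LINK.sub("", html)

print(plus_words("prefix  matching algorithm"))
print(strip_nav_links('<p>Body</p><a href="0002.html">Next article</a>'))
```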

Do you think these would be useful for the general populace? Each
amounts to about 10 lines of code, or less.

-- 
Esa Ahola
esa@cyclone.mindspring.com



This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:25:13 PST