Esa Ahola (firstname.lastname@example.org)
Wed, 26 Nov 1997 14:36:37 -0500 (EST)
On Mon, 17 Nov 1997, Andrew Scherpbier wrote:
> I'll write a prefix algorithm that will match the start of words and
> will be much faster. (Any volunteers to write this?)
I think I volunteered a year or so ago. :-\ I can think of two relatively
simple approaches:
1. Yank GDBM and substitute Berkeley DB in Btree mode. Random index
and sorted index in one!
2. Binary search db.wordlist to implement prefix matching. Crude
but surprisingly effective.
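
To illustrate approach 2, here is a minimal sketch in Python (not htdig code). It assumes the word list is already sorted and has been read into memory as a plain list of words; the real db.wordlist layout may differ, and the function name prefix_matches is made up for this example. The idea is to binary search for the first entry that is not less than the prefix, then walk forward while entries still start with it.

```python
from bisect import bisect_left


def prefix_matches(sorted_words, prefix):
    """Return every word in sorted_words that starts with prefix.

    sorted_words must be sorted in ascending order (assumption: this
    stands in for a sorted word list, one word per entry).
    """
    # Binary search: index of the first word >= prefix.
    start = bisect_left(sorted_words, prefix)
    end = start
    # All words sharing the prefix sit next to each other in sorted
    # order, so scan forward until the prefix no longer matches.
    while end < len(sorted_words) and sorted_words[end].startswith(prefix):
        end += 1
    return sorted_words[start:end]


# Hypothetical sample data, for illustration only.
words = sorted(["search", "seal", "sea", "seat", "second", "table"])
print(prefix_matches(words, "sea"))
```

The lookup costs one binary search plus one step per matching word, which is why even this crude version should hold up on a large word list.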
If I gather up the courage, which approach would you rather see?
By the way, I have made two small additions to htsearch and htdig to
support my own particular needs:
- Added PLUS_WORDS variable that contains the original search words
with whitespace replaced with '+'. I use this in the output template
to pass the wordlist to a CGI which highlights the search terms.
- Added a "local_url_preprocessor" configuration directive, used in
conjunction with the local_urls directive to filter html files
with some program before indexing them. I use it to remove the
"next article" and "previous article" hyperlinks, which generate
false positives, from the mailing list archive messages.
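
For the curious, both additions boil down to something like the sketch below. It is in Python purely for illustration (the actual changes are in the htsearch and htdig sources), and the function names, the regular expression, and the sample link markup are all assumptions made up for this example. In particular, the exact link text in the archive pages and the way the preprocessor program is invoked are not shown here.

```python
import re


def plus_words(query):
    """Join the search words with '+', as PLUS_WORDS is described.

    Note: runs of whitespace collapse to a single '+', which is an
    assumption about the desired behaviour.
    """
    return "+".join(query.split())


# Hypothetical pattern: an anchor tag whose text is "next article" or
# "previous article". Real archive pages may use different markup.
NAV_LINK = re.compile(
    r"<a\b[^>]*>\s*(?:next|previous) article\s*</a>",
    re.IGNORECASE,
)


def strip_nav_links(html):
    """Remove navigation hyperlinks so they are not indexed."""
    return NAV_LINK.sub("", html)


print(plus_words("prefix  matching speed"))
print(strip_nav_links('<p>Body</p><a href="0042.html">Next article</a>'))
```

A filter like strip_nav_links would be wrapped in a small program that takes the HTML file and hands back the cleaned text to the indexer.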
Do you think these would be useful for the general populace? Each
amounts to about 10 lines of code, or less.
-- Esa Ahola email@example.com