[htdig] Re: accents mapping


Subject: [htdig] Re: accents mapping
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Wed Feb 23 2000 - 16:28:16 PST


At 4:35 PM -0600 2/23/00, Gilles Detillieux wrote:
>Consider the soundex and metaphone analogy I brought up earlier.
>Any "sound" may have many possible letters or letter combinations to
>produce them. When applied to long words, you'd have even more possible
>words than for your "éphémère" example above. But soundex and metaphone
>don't generate ALL possible words. They look at all the words that have
>been indexed, and record all the canonical forms of these words only,
>so that when you look up a given word, it will also search for other
>words that it knows are in the index that have a similar sound.

Yeah, I think you're right that on-the-fly fuzzy matching isn't going to be
very fast. Of course, the problem with something based on the soundex
or metaphone algorithms is that you have to be sure to run htfuzzy
periodically so its database keeps up with the index, but the lookups
would be pretty fast.
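
As a rough illustration of that trade-off (a Python sketch, not ht://Dig's
actual C++ code), the precomputed approach boils down to mapping each
canonical key to the indexed words that share it. The names canonical_key,
build_fuzzy_db and lookup are hypothetical, and accent stripping via Unicode
decomposition is only assumed here as one plausible key function:

```python
import unicodedata
from collections import defaultdict


def canonical_key(word):
    """Hypothetical key function: lowercase, then drop combining accents."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_fuzzy_db(indexed_words):
    """Offline step (the role htfuzzy plays): group indexed words by key.

    Only words that are really in the index are recorded, so no attempt is
    made to generate every possible accented spelling.
    """
    db = defaultdict(set)
    for word in indexed_words:
        db[canonical_key(word)].add(word)
    return db


def lookup(db, query):
    """Search-time step: one key computation plus one dictionary lookup."""
    return sorted(db.get(canonical_key(query), ()))


# The database must be rebuilt whenever the index changes.
indexed = ["éphémère", "ephemere", "élève", "eleve", "index"]
db = build_fuzzy_db(indexed)
matches = lookup(db, "ephémère")  # both indexed spellings share one key
```

The cost moves to the offline build, which is why the database goes stale
if it is not regenerated after the index is updated.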

But to echo what Gilles said, you really don't want to be messing
around in WordList or the parser, especially if you don't know what
you're doing. I think the Fuzzy class is pretty self-explanatory, and
almost anyone could write a Fuzzy subclass. For the Soundex and
Metaphone variety, the method that matters is generateKey(). For the
Speling and Substring variety, it is getWords().
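
To make the two varieties concrete, here is a hedged Python sketch. It is
not the real Fuzzy interface from ht://Dig (which is C++); the class names
KeyFuzzy, SoundexLike and SubstringLike are hypothetical, and the key
function is a simplified Soundex-style code written for this example:

```python
# Consonant groups used by Soundex-style codes.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


class KeyFuzzy:
    """Key-based variety: a subclass only has to supply generate_key()."""

    def __init__(self, indexed_words):
        self.db = {}
        for word in indexed_words:
            self.db.setdefault(self.generate_key(word), []).append(word)

    def generate_key(self, word):
        raise NotImplementedError

    def get_words(self, query):
        # Generic lookup: every indexed word sharing the query's key.
        return self.db.get(self.generate_key(query), [])


class SoundexLike(KeyFuzzy):
    def generate_key(self, word):
        letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
        if not letters:
            return ""
        key = letters[0].upper()
        prev = CODES.get(letters[0], "")
        for ch in letters[1:]:
            if ch in "hw":
                continue  # h and w do not separate equal codes
            code = CODES.get(ch, "")
            if code and code != prev:
                key += code
            prev = code  # vowels reset prev, so repeats across them count
        return (key + "000")[:4]


class SubstringLike:
    """Word-generating variety: the work happens in get_words() itself."""

    def __init__(self, indexed_words):
        self.words = list(indexed_words)

    def get_words(self, query):
        q = query.lower()
        return [w for w in self.words if q in w.lower()]


names = SoundexLike(["Robert", "Rupert", "Rubin"])
similar = names.get_words("Ruppert")
partial = SubstringLike(["search", "researcher", "index"]).get_words("sear")
```

In the key-based case a new algorithm only changes generate_key(); in the
other case there is no key at all, so get_words() has to scan or generate
candidate words directly.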

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



