[htdig] patch for Accents fuzzy algorithm


Subject: [htdig] patch for Accents fuzzy algorithm
From: Robert Marchand (robert.marchand@UMontreal.CA)
Date: Tue May 02 2000 - 06:24:08 PDT


Hi,

   I've found a small bug in the Accents fuzzy algorithm. It occurs
if you 'fake' accents in a word that has none, for example if you search
for the word 'fùzzy' (or 'situatión' in my case).

This happens because, in order to have a smaller database, I saved only
words that contain accents in the accents database. I found the bug when
I tried to search for Spanish words in our database. Because the database
had no accented Spanish versions of the words, I didn't find them, even
though there were unaccented 'homonyms' in French.

So, here is a patch that does two things: it removes the 'key' from the
list of words in the accents database, and then adds it to the search
words whether or not it exists. In practice, this means that the
'banalized' (unaccented) version is always searched.
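
To make the two changes concrete, here is a minimal, self-contained sketch of the behavior. It is not the htdig code: `banalize`, `AccentDb`, `lookupOnly` and the small folding table are hypothetical stand-ins for `generateKey`, the accents database and `Fuzzy::getWords`, and it compares strings exactly where the patch uses the case-insensitive `mystrcasecmp`.

```cpp
#include <map>
#include <string>
#include <vector>

using Word = std::u32string;
using AccentDb = std::map<Word, std::vector<Word>>;  // key -> accented forms

// Hypothetical stand-in for Accents::generateKey(): folds a few accented
// letters to their base letter. The real algorithm covers more characters.
Word banalize(const Word &word)
{
    static const std::map<char32_t, char32_t> fold = {
        {U'\u00E0', U'a'}, {U'\u00E8', U'e'}, {U'\u00E9', U'e'},
        {U'\u00F3', U'o'}, {U'\u00F9', U'u'},
    };
    Word key;
    for (char32_t c : word) {
        auto it = fold.find(c);
        key += (it == fold.end()) ? c : it->second;
    }
    return key;
}

// Index time: store a word only when it differs from its key, mirroring
// the early return the patch adds to addWord().
void addWord(AccentDb &db, const Word &word)
{
    Word key = banalize(word);
    if (key == word)
        return;
    db[key].push_back(word);
}

// Search time without the patch (assumed behavior): only the words the
// database holds for the key are returned.
std::vector<Word> lookupOnly(const AccentDb &db, const Word &word)
{
    std::vector<Word> words;
    auto it = db.find(banalize(word));
    if (it != db.end())
        words = it->second;
    return words;
}

// Search time with the patch: the key itself is also searched whenever it
// differs from the word that was typed.
std::vector<Word> getWords(const AccentDb &db, const Word &word)
{
    std::vector<Word> words = lookupOnly(db, word);
    Word key = banalize(word);
    if (key != word)
        words.push_back(key);
    return words;
}
```

With a database built from the unaccented word 'situation', `lookupOnly` returns nothing for 'situatión', while `getWords` returns 'situation', which is the case described above.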

Another way to do it would be to store the banalized version for all
words, even the unaccented ones, but that would mean a bigger database.
I don't really know which is best!
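
The size trade-off between the two approaches can be illustrated with a small sketch. Everything here is hypothetical (`foldAccents`, `build`, `storedWords` and the `storeAll` flag are names made up for the illustration, and the folding table is deliberately tiny); it only counts how many words each strategy would store.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Word = std::u32string;
using AccentDb = std::map<Word, std::vector<Word>>;  // key -> stored words

// Hypothetical, deliberately tiny stand-in for generateKey().
Word foldAccents(const Word &word)
{
    Word key;
    for (char32_t c : word) {
        if (c == U'\u00E8' || c == U'\u00E9')
            c = U'e';
        else if (c == U'\u00F3')
            c = U'o';
        key += c;
    }
    return key;
}

// Builds the database. With storeAll == false only words that differ from
// their key are stored (small database, key added at search time).
// With storeAll == true every word is stored under its key.
AccentDb build(const std::vector<Word> &words, bool storeAll)
{
    AccentDb db;
    for (const Word &w : words) {
        Word key = foldAccents(w);
        if (storeAll || key != w)
            db[key].push_back(w);
    }
    return db;
}

// Total number of words held in the database.
std::size_t storedWords(const AccentDb &db)
{
    std::size_t n = 0;
    for (const auto &entry : db)
        n += entry.second.size();
    return n;
}
```

For the four words 'situation', 'fuzzy', 'élève' and 'eleve', the first strategy stores one word and the second stores four, so on a mostly unaccented corpus the second database grows roughly with the size of the whole word list.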

By the way, we have our search engine online
(http://www.umontreal.ca/moteur_rech/index.html) and we would like to
express our appreciation of ht://Dig as a search engine. I think two
points drove our decision: its price :-), and the documentation,
especially this list and the level of response and help one can find
here. Thanks to the makers of ht://Dig and the helpers on this list.

----

Here is the patch (for htdig version 3.1.5). You must be in the htfuzzy directory to apply it. It is to be applied after the last patch posted by Gilles Detillieux.

*** Accents.h.orig Mon May 1 13:56:03 2000
--- Accents.h Mon May 1 13:57:14 2000
*************** public:
*** 22,33 ****
      Accents();
      virtual ~Accents();

-     virtual int writeDB(Configuration &config);
-
      virtual void generateKey(char *word, String &key);

      virtual void addWord(char *word);
!
  private:
  };

--- 22,33 ----
      Accents();
      virtual ~Accents();

      virtual void generateKey(char *word, String &key);

      virtual void addWord(char *word);
!
!     virtual void getWords(char *word, List &words);
!
  private:
  };

*** Accents.cc.orig Fri Mar 3 10:34:21 2000
--- Accents.cc Mon May 1 14:01:20 2000
*************** Accents::~Accents()
*** 78,86 ****
--- 78,88 ----
  {
  }

+ /* Obsolete */
  //*****************************************************************************
  // int Accents::writeDB(Configuration &config)
  //
+ /*
  int
  Accents::writeDB(Configuration &config)
  {
*************** Accents::writeDB(Configuration &config)
*** 125,132 ****
      }
      return OK;
  }

-
  //*****************************************************************************
  // void Accents::generateKey(char *word, String &key)
  //
--- 127,134 ----
      }
      return OK;
  }
+ */

  //*****************************************************************************
  // void Accents::generateKey(char *word, String &key)
  //
*************** Accents::generateKey(char *word, String
*** 149,155 ****
      }
  }

-
  //*****************************************************************************
  // void Accents::addWord(char *word)
  //
--- 151,156 ----
*************** Accents::addWord(char *word)
*** 164,169 ****
--- 165,174 ----
      String key;
      generateKey(word, key);

+     //Do not add fuzzyKey as a word, will be add at search time.
+     if (mystrcasecmp(word, key.get()) == 0)
+         return;
+
      String *s = (String *) dict->Find(key);
      if (s)
      {
*************** Accents::addWord(char *word)
*** 176,181 ****
--- 181,201 ----
      }
  }

+ //*****************************************************************************
+ // void Accents::getWords(char *word, List &words)
+ //
+ void
+ Accents::getWords(char *word, List &words)
+ {
+     if (!word || !*word)
+         return;
+     Fuzzy::getWords(word, words);
+     // fuzzy key itself is always search.
+     String fuzzyKey;
+     generateKey(word, fuzzyKey);
+     if (mystrcasecmp(fuzzyKey.get(), word) != 0)
+         words.Add(new String(fuzzyKey));
+ }

-------
Robert Marchand             tél: 343-6111 poste 5210
DiTER-SDI                   e-mail: marchanr@diter.umontreal.ca
Université de Montréal      Montréal, Canada



