Re: [htdig] A Suggestion on Accents


From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Mon May 15 2000 - 06:09:56 PDT


At 12:34 PM +0100 5/15/00, D.J.Adams@soton.ac.uk wrote:
>Rather than a fuzzy accents search method, why not make the htdig database
>accent independent? After all, it is case independent already!
>For example:
>
>Garçon -> Garçon -> garçon -> garcon

I would make the analogy to word suffixes rather than to case. ht://Dig
handles endings with a fuzzy algorithm applied at search time rather than
with a general stemming step during indexing. IMHO, this makes searches a
bit more precise because the alternatives get less weight than what the
user actually entered. (Remember the old maxim "the customer is always
right"?)
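
To illustrate the weighting idea, here is a minimal sketch and not the actual ht://Dig code. The names expand_query_word, WeightedWord and variant_weight are hypothetical, and the variants are assumed to be supplied by some fuzzy algorithm (endings, accents, and so on). The word the user typed keeps full weight and every alternative gets a lower one:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical type for illustration: a search term plus its ranking weight.
typedef std::pair<std::string, double> WeightedWord;

// Hypothetical helper: expand one query word into weighted alternatives.
// The exact word keeps weight 1.0; each distinct variant gets
// variant_weight, which the caller is expected to choose below 1.0.
std::vector<WeightedWord> expand_query_word(
    const std::string &word,
    const std::vector<std::string> &variants,
    double variant_weight)
{
    std::vector<WeightedWord> result;
    result.push_back(WeightedWord(word, 1.0));
    for (std::vector<std::string>::size_type i = 0; i < variants.size(); ++i) {
        // Skip a variant identical to what the user typed so that it is
        // not counted twice.
        if (variants[i] != word)
            result.push_back(WeightedWord(variants[i], variant_weight));
    }
    return result;
}
```

With this shape, a query for "garçon" with the variant "garcon" and a variant weight of 0.5 still matches unaccented documents, but ranks exact matches first.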

Besides, there are some situations where the unaccented word and the
accented word do *not* mean the same thing.

(BTW, the 3.2 code isn't completely case independent. It stores a
flag when the word is capitalized. My feeling is that user queries
with capitals should return capitals preferentially.)

All that said, it would be possible to patch the code in WordList.cc
and remove accents before storing the word.
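
As a rough sketch of what such a patch might call, assuming words are stored as single-byte ISO-8859-1 strings: the function names below are hypothetical and do not exist in WordList.cc, and the mapping table is an illustrative choice (ligatures and letters such as ß, Æ, Ð and Þ are left unchanged).

```cpp
#include <string>

// Hypothetical helper, not part of ht://Dig: map one ISO-8859-1 byte to
// its unaccented ASCII base letter, or return it unchanged.
unsigned char strip_accent_latin1(unsigned char c)
{
    // One entry per code point 0xC0..0xFF; a space means "leave as is".
    static const char table[] =
        "AAAAAA CEEEEIIII"   // 0xC0-0xCF
        " NOOOOO OUUUUY  "   // 0xD0-0xDF
        "aaaaaa ceeeeiiii"   // 0xE0-0xEF
        " nooooo ouuuuy y";  // 0xF0-0xFF
    if (c < 0xC0)
        return c;
    char base = table[c - 0xC0];
    return base == ' ' ? c : static_cast<unsigned char>(base);
}

// Hypothetical helper: return a copy of word with accents removed,
// suitable for calling just before the word is stored.
std::string strip_accents_latin1(const std::string &word)
{
    std::string out;
    out.reserve(word.size());
    for (std::string::size_type i = 0; i < word.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(word[i]);
        out += static_cast<char>(strip_accent_latin1(c));
    }
    return out;
}
```

If the index is stripped this way, the same function has to be applied to query words too, otherwise an accented query would never match the stored form. Multi-byte encodings such as UTF-8 would need a different approach.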

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



