Subject: Re: [htdig] A Suggestion on Accents
Date: Tue May 16 2000 - 01:09:14 PDT
> >Rather than a fuzzy accents search method, why not make the htdig database
> >accent independent? After all, it is case independent already!
> >For example:
> >Garçon -> Garçon -> garçon -> garcon
> I would make the analogy to word suffixes rather than to case. Word
> endings are handled by a fuzzy algorithm at search time rather than
> by a general stemming step during indexing. IMHO, this makes searches
> a bit more precise because the alternatives get less weight than what
> the user actually entered. (Remember the old maxim "the customer is
> always right"?)
> Besides, there are some situations where the unaccented word and the
> accented word do *not* mean the same thing.
Yes, and when I search for 'garçon', am I looking for a waiter or a schoolboy?
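To make the weighting argument concrete, here is a minimal sketch of query-time expansion in which the word the user typed keeps full weight and every alternative (an unaccented form, say) gets a reduced one. The names WeightedTerm and expand_query are hypothetical and do not come from the ht://Dig source, and the 0.5 used in the usage note is an arbitrary illustration, not a value ht://Dig actually uses.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical type: a search term paired with its relative weight.
typedef std::pair<std::string, double> WeightedTerm;

// Hypothetical query-time expansion (not ht://Dig code): the word as typed
// gets weight 1.0, each distinct alternative gets the lower alt_weight.
std::vector<WeightedTerm> expand_query(const std::string &word,
                                       const std::vector<std::string> &alternatives,
                                       double alt_weight)
{
    std::vector<WeightedTerm> terms;
    terms.push_back(WeightedTerm(word, 1.0));
    for (std::vector<std::string>::size_type i = 0; i < alternatives.size(); ++i) {
        // Skip an alternative identical to what the user entered, so the
        // exact term is never listed twice or down-weighted.
        if (alternatives[i] != word)
            terms.push_back(WeightedTerm(alternatives[i], alt_weight));
    }
    return terms;
}
```

Called with the accented query word, the single alternative "garcon" and an alt_weight of 0.5, this would return the accented form at weight 1.0 followed by "garcon" at 0.5, so documents containing the accented spelling would still rank first.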
> (BTW, the 3.2 code isn't completely case independent. It stores a
> flag when the word is capitalized. My feeling is that user queries
> with capitals should return capitals preferentially.)
> All that said, it would be possible to patch the code in WordList.cc
> and remove accents before storing the word.
I'll take a look at the 3.1.5 code, but don't hold your breath.
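For reference, the accent-removal step itself could look something like the sketch below. It assumes the words are single-byte ISO-8859-1 (Latin-1) text; fold_latin1 and strip_accents are hypothetical names, not functions from WordList.cc, and where such a call would actually hook into the indexing code is left open.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not an ht://Dig function): fold one ISO-8859-1 byte
// to a lowercase, unaccented ASCII letter where a single base letter exists.
char fold_latin1(unsigned char c)
{
    // One entry per byte 0xC0..0xFF; '.' means "leave the byte unchanged"
    // (ligatures such as 0xC6 and symbols such as 0xD7 have no single
    // base letter, so they pass through as they are).
    static const char table[] =
        "aaaaaa.ceeeeiiii"   // 0xC0-0xCF
        ".nooooo.ouuuuy.."   // 0xD0-0xDF
        "aaaaaa.ceeeeiiii"   // 0xE0-0xEF
        ".nooooo.ouuuuy.y";  // 0xF0-0xFF
    if (c >= 0xC0) {
        char base = table[c - 0xC0];
        return base == '.' ? static_cast<char>(c) : base;
    }
    // Below 0xC0 only plain case folding is needed.
    return static_cast<char>(std::tolower(c));
}

// Returns a lowercased copy of word with Latin-1 accents removed.
std::string strip_accents(const std::string &word)
{
    std::string out;
    out.reserve(word.size());
    for (std::string::size_type i = 0; i < word.size(); ++i)
        out.push_back(fold_latin1(static_cast<unsigned char>(word[i])));
    return out;
}
```

The same function would have to be applied to the query words at search time, otherwise an accented query would never match the unaccented entries in the database.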
> -Geoff Hutchison
> Williams Students Online
--
David J Adams <D.J.Adams@soton.ac.uk>
Computing Services
University of Southampton