Re: [htdig] A Suggestion on Accents


Subject: Re: [htdig] A Suggestion on Accents
From: D.J.Adams@soton.ac.uk
Date: Tue May 16 2000 - 01:09:14 PDT


> >Rather than a fuzzy accents search method, why not make the htdig database
> >accent independent? After all, it is case independent already!
> >For example:
> >
> >Garçon -> Garçon -> garçon -> garcon
>
> I would make the analogy to word suffixes rather than to case. There
> is an "endings" fuzzy algorithm rather than a general stemming step
> during indexing. IMHO, this makes searches a bit more precise because
> the alternatives get less weight than what the user actually
> entered. (Remember the old maxim "the customer is always right"?)
>
> Besides, there are some situations where the unaccented word and the
> accented word do *not* mean the same thing.

Yes, and when I search for 'garçon', am I looking for a waiter or a schoolboy?

>
> (BTW, the 3.2 code isn't completely case independent. It stores a
> flag when the word is capitalized. My feeling is that user queries
> with capitals should return capitals preferentially.)
>

Neat idea.

> All that said, it would be possible to patch the code in WordList.cc
> and remove accents before storing the word.
>

I'll take a look at the 3.1.5 code, but don't hold your breath.

> --
> -Geoff Hutchison
> Williams Students Online
> http://wso.williams.edu/

-- 
 
David J Adams
<D.J.Adams@soton.ac.uk>
Computing Services
University of Southampton

------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig-unsubscribe@htdig.org. You will receive a message to confirm this.



This archive was generated by hypermail 2b28 : Mon May 15 2000 - 22:57:16 PDT