Re: [htdig] Precise Fuzziness


Subject: Re: [htdig] Precise Fuzziness
From: Dave Melton (dmelton@blarg.net)
Date: Sat Nov 27 1999 - 13:36:38 PST


Geoff,

As I mentioned, there are two main systems for transcribing
Japanese words into Roman characters. One system is much more
common than the other, and produces spellings that make (I think)
a little more sense to the native English speaker. I've just
received 600+ HTML files that use the other system. I don't want
to take the time to change all of these files, and the person who
sent them to me doesn't want them changed. On the other hand,
most of the site's users are far more used to the more common
spellings of person and place names.

It would be ideal if I could provide a short list of acceptable
alternate spellings...my "precise fuzziness". The number of
required substitutions is actually pretty small...the following
should cover it:

For "o", accept "o", "oo", or "o'o"
For "u", accept "u", "uu", or "u'u"
For "n", accept "n" or "n'"
For "zu", accept "zu", "tsu", or "dzu"

All of this could, of course, be accomplished by rewriting the
search string as a regular expression before htsearch sees it.

One common example is the spelling of Japan's largest city. A
user would want to search for "Tokyo", but would need it to
match "Tokyoo" in the alternate spelling HTML files.
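
A minimal sketch of that rewriting step, in Python purely for
illustration. The function name fuzzy_pattern and the RULES table are
hypothetical, not part of ht://Dig, and the sketch assumes that a
multi-letter rule such as "zu" is tried before the single-letter rules
and that its letters are not expanded a second time. Whether htsearch
would accept this regex syntax is also an assumption.

```python
import re

# Substitution rules from the list above: each key, as typed in the
# common spelling, may appear in the files as any listed alternative.
RULES = {
    "zu": ["zu", "tsu", "dzu"],
    "o": ["o", "oo", "o'o"],
    "u": ["u", "uu", "u'u"],
    "n": ["n", "n'"],
}

# Assumption: longer keys are tried first, so "zu" wins over a bare "u".
_KEYS = sorted(RULES, key=len, reverse=True)


def fuzzy_pattern(word):
    """Return a regex string matching `word` and its alternate spellings."""
    parts = []
    lower = word.lower()
    i = 0
    while i < len(word):
        for key in _KEYS:
            if lower.startswith(key, i):
                # Put longer alternatives first inside the group.
                alts = sorted(RULES[key], key=len, reverse=True)
                group = "|".join(re.escape(a) for a in alts)
                parts.append("(?:" + group + ")")
                i += len(key)
                break
        else:
            # No rule applies here: match the character literally.
            parts.append(re.escape(word[i]))
            i += 1
    return "".join(parts)


if __name__ == "__main__":
    pattern = re.compile(fuzzy_pattern("Tokyo"), re.IGNORECASE)
    for candidate in ("Tokyo", "Tokyoo", "Tokyu"):
        print(candidate, bool(pattern.fullmatch(candidate)))
```

The pattern is compiled case-insensitively because the rule keys are
lowercase while names such as "Osaka" start with a capital letter.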

I'd love to find a simple way to do this. I haven't looked
into the sources at all...I'd rather not go that way if I don't
have to. On the other hand, if it's possible to build a "custom
fuzzy", that might be an option.

Thanks in advance for any ideas or recommendations,

  Dave Melton

-----Original Message-----
From: Geoff Hutchison <ghutchis@wso.williams.edu>
To: Dave Melton <dmelton@blarg.net>
Cc: htdig@htdig.org <htdig@htdig.org>
Date: Saturday, November 27, 1999 12:51 PM
Subject: Re: [htdig] Precise Fuzziness

>At 3:46 AM -0800 11/22/99, Dave Melton wrote:
>>vowels to indicate certain sounds. I've been experimenting with
>>soundex and metaphone, but if I turn the weighting up enough that it
>>has any effect, I get far too many bogus matches to be useful.
>
>I would guess you might need a custom fuzzy to do what you want.
>
>>Is there any way to manually define a specific set of matching rules?
>>If search strings could contain regular expressions, I could do what
>>I want by modifying the search string before htsearch sees it. Are
>>there any other ways to accomplish this kind of thing?
>
>No, there isn't a way to manually define a set of matching rules, but
>that's a good idea.
>
>It would help if you could elaborate on your use of regex. The
>current development code does support regex, but I'm not sure what
>your plan is.
>
>-Geoff Hutchison
>Williams Students Online
>http://wso.williams.edu/
>
>------------------------------------
>To unsubscribe from the htdig mailing list, send a message to
>htdig-unsubscribe@htdig.org
>You will receive a message to confirm this.
>




This archive was generated by hypermail 2b25 : Sat Nov 27 1999 - 13:48:51 PST