Subject: Re: [htdig] wildcard matching, 8-bit characters, and 2-letter words
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Fri Mar 31 2000 - 12:28:57 PST


On Fri, 31 Mar 2000, atta dubson wrote:

> 1. i have a website with the tipitaka (in the pali language) and would
> like to be able to search for word endings, but the wildcard * only seems
> to work at the end of a word or partial word and not at the beginning. i
> want to search for "*buddhassa" and get matches for "sammaasambudhassa."

By wildcard searching, you must mean prefix searching--it's the only fuzzy
method that recognizes the "*" character (which is the default
prefix_match_character). You can switch to substring matching, but then
you won't want to use "*" in the query. Full regex searching is available
in 3.2.
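
As a rough sketch, assuming a 3.1.x config file, substring matching is
turned on through the search_algorithm attribute. The weights below are
only illustrative, not recommendations:

    search_algorithm: exact:1 substring:0.5

With that in place, a plain query for "buddhassa" (no "*") should match
any indexed word that contains that string. Expect substring searches to
be slower than exact or prefix ones.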

> 2. when i search for words with an 8-bit character, i never get any
> matches. do i need to change something in the configuration?

Yes. You need to set the locale in your config file. See the FAQ or
<http://www.htdig.org/attrs.html#locale>
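
A minimal sketch of the config line. The locale name here is only a
placeholder: use one that is actually installed on your system and that
covers the 8-bit characters in your documents:

    locale: en_US

Since the locale affects which characters count as part of a word, you
will most likely need to re-run the dig after changing it.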

> 3. i never get any matches on 2-letter words. can this be fixed?

This was mentioned previously. Set the minimum_word_length attribute.
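
For instance, assuming you want 2-letter words indexed (the default
minimum is 3), something like this in the config file:

    minimum_word_length: 2

Shorter words are skipped when the index is built, so re-index after
changing it. This also pulls in a lot of very common short words, so you
may want to add some of them to your bad_words list.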

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/