Subject: Re: [htdig3-dev] Problem with developer sighup
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Fri Jul 28 2000 - 10:10:11 PDT
On Sat, 29 Jul 2000, Chen-hsiu Huang wrote:
> OK. I've also checked libunicode. I'll try to start from here.
> Besides, extracting words in a multibyte locale is pretty hard,
> especially for terms that mix English (or other languages)
> with Unicode text.
My first concern is that the code (including the String class) is not
multibyte-clean. In several places it assumes that characters are one
byte long and uses char * arithmetic to advance one character at a time.
Given that, I think word parsing is further down the list. My guess is
that we'll want to work on the HtWordType code and make it into a
generalized word parser with appropriate subclasses. I will probably need
to start this work anyway for the new query parser.
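The multibyte concern can be made concrete with a small C++ sketch that contrasts byte-at-a-time stepping (the `p++` pattern) with stepping over whole characters. It assumes the multibyte text is UTF-8; encodings such as Big5 or EUC need different length rules (the C library's `mbrlen` handles the current locale's encoding in general). The function names are hypothetical and are not part of htdig's String class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Naive stepping: treats every byte as one character, like code that
// advances with p++ over a char *.
std::size_t countCharsNaive(const std::string &s) {
    std::size_t n = 0;
    for (const char *p = s.c_str(); *p != '\0'; ++p)
        ++n;
    return n;
}

// Length in bytes of the UTF-8 sequence that starts with lead byte c.
// Invalid or stray continuation bytes count as length 1 so the caller
// always makes progress.
std::size_t utf8SeqLen(unsigned char c) {
    if (c < 0x80) return 1;           // 0xxxxxxx: ASCII
    if ((c & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((c & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((c & 0xF8) == 0xF0) return 4; // 11110xxx
    return 1;                         // invalid lead byte
}

// Multibyte-aware stepping (assuming UTF-8): advance by whole sequences.
std::size_t countCharsUtf8(const std::string &s) {
    std::size_t n = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        // A truncated final sequence may push i past the end;
        // the loop condition stops there.
        i += utf8SeqLen(static_cast<unsigned char>(s[i]));
        ++n;
    }
    return n;
}
```

For the 12-byte string `"htdig \xe4\xb8\xad\xe6\x96\x87"` ("htdig " followed by two Chinese characters), the naive count is 12 while the UTF-8-aware count is 8. Any code that indexes, truncates, or advances by bytes will likewise split those characters in half.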
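One possible shape for a generalized word parser with subclasses is sketched below. The names `WordParser`, `SingleByteWordParser`, and `Utf8WordParser` are hypothetical and do not reflect the actual HtWordType interface. The base class owns the tokenizing loop; subclasses decide how long a character is and which characters belong to words. As a deliberately crude assumption for mixed English and CJK text, the UTF-8 subclass emits every non-ASCII character as a word of its own; real CJK segmentation would need a dictionary or n-gram indexing.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical base class: the splitting loop is shared, while the
// encoding and word-character rules are supplied by subclasses.
class WordParser {
public:
    virtual ~WordParser() {}

    std::vector<std::string> parse(const std::string &text) const {
        std::vector<std::string> words;
        std::string current;
        for (std::size_t i = 0; i < text.size();) {
            std::size_t len = charLength(text, i);
            std::string ch = text.substr(i, len);
            i += len;
            if (isStandalone(ch)) {       // character forms a word by itself
                flush(current, words);
                words.push_back(ch);
            } else if (isWordChar(ch)) {  // extend the current word
                current += ch;
            } else {                      // separator ends the current word
                flush(current, words);
            }
        }
        flush(current, words);
        return words;
    }

protected:
    virtual std::size_t charLength(const std::string &text, std::size_t pos) const = 0;
    virtual bool isWordChar(const std::string &ch) const = 0;
    virtual bool isStandalone(const std::string &) const { return false; }

private:
    static void flush(std::string &current, std::vector<std::string> &words) {
        if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
};

// One byte per character; ASCII alphanumerics form words.
class SingleByteWordParser : public WordParser {
protected:
    std::size_t charLength(const std::string &, std::size_t) const override { return 1; }
    bool isWordChar(const std::string &ch) const override {
        return std::isalnum(static_cast<unsigned char>(ch[0])) != 0;
    }
};

// UTF-8: steps over whole sequences; ASCII alphanumeric runs form words,
// and each multibyte character becomes its own word (crude assumption).
class Utf8WordParser : public WordParser {
protected:
    std::size_t charLength(const std::string &text, std::size_t pos) const override {
        unsigned char c = static_cast<unsigned char>(text[pos]);
        std::size_t len = 1;
        if ((c & 0xE0) == 0xC0) len = 2;
        else if ((c & 0xF0) == 0xE0) len = 3;
        else if ((c & 0xF8) == 0xF0) len = 4;
        // Clamp so a truncated trailing sequence never runs past the end.
        return std::min<std::size_t>(len, text.size() - pos);
    }
    bool isWordChar(const std::string &ch) const override {
        return ch.size() == 1 && std::isalnum(static_cast<unsigned char>(ch[0])) != 0;
    }
    bool isStandalone(const std::string &ch) const override {
        return ch.size() > 1;
    }
};
```

Parsing `"Hello, htdig\xe4\xb8\xad\xe6\x96\x87!"` with `Utf8WordParser` yields four words: `Hello`, `htdig`, and each of the two Chinese characters. `SingleByteWordParser` treats the high bytes as separators and returns only `Hello` and `htdig`. Keeping the loop in the base class means a query parser could reuse the same subclasses.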
> I guess so. But how about rewriting htdig in Perl? Has anyone
> thought about this?
There are *many* Perl search engines. Some good, some not-so-good. I
believe Avi keeps a list at her website: <http://www.searchtools.com/>.
-- 
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/