Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Mon, 19 Apr 1999 16:06:06 -0500 (CDT)
Hi, folks. I stumbled onto a bug in WordList::valid_word() on Friday,
and after looking into it, one thing led to another, so I've made some
fairly significant changes to this function.
As a result, could all of you who are testing out the 3.1.2 pre-release
that Geoff announced last week please try this patch, or the latest
htdig3-1-x CVS source tree, to make sure I didn't break something else?
My concern is the switch to iscntrl(), which I think is a better test for
control characters than the previous *word < ' ' test, which misses some
(DEL, 0x7F, for one). However, on systems with broken locales this could
potentially lead to further problems with indexing other languages,
because the whole upper half of the character set may be treated as
control characters. To guard against that, I've also added else clauses,
so that if HtIsStrictWordChar() accepts the character, the code won't go
on to test whether iscntrl() would reject it. I also realised that the
earlier switch from isalpha() to HtIsStrictWordChar() would allow digits
even if allow_numbers was false, so I added an extra test to prevent that.
I'd appreciate extra eyeballs looking this over.
Thanks!
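
To illustrate the difference between the two control-character tests
described above, here is a minimal standalone sketch. It is not part of
the patch: the names old_is_control() and new_is_control() are made up
for illustration, and the cast to unsigned char is an added precaution
(the patch itself passes *word directly). The behaviour noted in the
comments assumes the default "C" locale.

```cpp
#include <cctype>

// The old test: catches only 0x00..0x1F, so DEL (0x7F) slips through.
// Where plain char is signed, bytes 0x80..0xFF are negative and are
// skipped by the ">= 0" half of the test.
static bool old_is_control(char c)
{
    return c >= 0 && c < ' ';
}

// The new test: defer to the C library's iscntrl().
// The cast is an assumption added here; passing a negative value
// other than EOF to the <cctype> functions is undefined behaviour.
static bool new_is_control(char c)
{
    return std::iscntrl(static_cast<unsigned char>(c)) != 0;
}
```

With these two helpers, a tab is flagged by both tests, while DEL
('\x7f') is flagged only by new_is_control().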
--- htcommon/WordList.cc.old Tue Mar 23 17:17:31 1999
+++ htcommon/WordList.cc Mon Apr 19 15:47:34 1999
@@ -107,17 +107,18 @@ int WordList::valid_word(char *word)
     while (word && *word)
     {
-        if (HtIsStrictWordChar((unsigned char)*word))
+        if (HtIsStrictWordChar((unsigned char)*word) && !isdigit(*word))
         {
             alpha = 1;
-            break;
+            // break; /* Can't stop here, there may still be control chars! */
         }
-        if (allow_numbers && isdigit(*word))
+        else if (allow_numbers && isdigit(*word))
         {
             alpha = 1;
-            break;
+            // break; /* Can't stop here, there may still be control chars! */
         }
-        if (*word >= 0 && *word < ' ')
+// if (*word >= 0 && *word < ' ')
+        else if (iscntrl(*word))
         {
             control = 1;
             break;
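
For anyone who wants to try the new control flow outside of ht://Dig,
the patched loop can be sketched as a standalone function. This is only
an approximation under stated assumptions: is_strict_word_char() is a
hypothetical stand-in for HtIsStrictWordChar() (here simply isalnum()),
scan_word() and WordFlags are invented names, the loop increment is
assumed because the hunk above is cut off before it, and the characters
are cast to unsigned char before the <cctype> calls, which the patch
does not do.

```cpp
#include <cctype>

struct WordFlags
{
    bool alpha;    // saw at least one acceptable word character
    bool control;  // saw a control character
};

// Hypothetical stand-in for HtIsStrictWordChar(); the real function
// depends on ht://Dig's configuration.
static bool is_strict_word_char(unsigned char c)
{
    return std::isalnum(c) != 0;
}

static WordFlags scan_word(const char *word, bool allow_numbers)
{
    WordFlags f = { false, false };
    for (; word && *word; ++word)
    {
        unsigned char c = static_cast<unsigned char>(*word);
        if (is_strict_word_char(c) && !std::isdigit(c))
        {
            f.alpha = true;   // keep scanning: control chars may follow
        }
        else if (allow_numbers && std::isdigit(c))
        {
            f.alpha = true;   // keep scanning here as well
        }
        else if (std::iscntrl(c))
        {
            f.control = true; // one control character is enough
            break;
        }
    }
    return f;
}
```

Because the tests are chained with else, a character accepted by the
word-character test is never handed to iscntrl(), which is the point of
the safeguard against broken locales.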
-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
htdig3-dev@htdig.org containing the single word "unsubscribe" in
the SUBJECT of the message.