Re: [htdig3-dev] htdig 3.1.4 is not 8-bit-clean on solaris


Subject: Re: [htdig3-dev] htdig 3.1.4 is not 8-bit-clean on solaris
From: Marc Pohl (marc.pohl@wdr.de)
Date: Thu Jan 13 2000 - 11:41:04 PST


At 17:47 12.01.00 -0600, you wrote:
>At 6:49 PM +0100 1/12/00, Marc Pohl wrote:
>>I reviewed the source code for htdig-3.2.0b1-dev-010900 this weekend
>>and discovered that there could be similar errors in
>>htword/WordType.cc because of signed char to int casts. Exactly the
>>same error cannot happen, because the iscntrl() call is in the else
>>branch of IsStrictChar() in 3.2.
>
>Could you also post your original patch to 3.1.4 with diff -c as
>well? I'd like to have it on the htdig@htdig.org lists because I
>think it will help some of these recent questions about indexing and
>searching foreign characters.
>
>>My proposed patch is the following snippet, introducing two new
>>member functions to WordType, instead of calling isdigit() and
>>iscntrl() directly.
>
>This looks fine to me. Since it's a bug-fix, unless I hear screams of
>protest, it's going in sometime tomorrow.
>
>-Geoff
>

Hello Geoff,

Yesterday I found a small potential problem in the patched code.
At the beginning of the initialisation of WordType there is the line
    chrtypes[0] = 0;
Because we never call iscntrl(0), this line should be
    chrtypes[0] = WORD_TYPE_CONTROL;

During my tests this makes no difference, but I think that is only because I don't have any unwanted NUL (#0) characters in my HTML docs.
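
For illustration, a minimal sketch of the kind of lookup table this is about. It is not the actual htdig code: the flag values, the table size and the helper names are assumptions, and the loop is assumed to start at 1, which is why entry 0 has to be set by hand.

```cpp
#include <cctype>
#include <climits>

// Flag values are assumptions for this sketch; the real constants may differ.
enum : unsigned {
    WORD_TYPE_DIGIT   = 0x1,
    WORD_TYPE_CONTROL = 0x2
};

// One entry per possible unsigned char value.
static unsigned chrtypes[UCHAR_MAX + 1];

// Fill the table. Because the loop starts at 1, iscntrl(0) is never
// called, so entry 0 is marked as a control character explicitly.
void init_chrtypes()
{
    chrtypes[0] = WORD_TYPE_CONTROL;
    for (int ch = 1; ch <= UCHAR_MAX; ch++) {
        unsigned flags = 0;
        if (std::isdigit(ch)) flags |= WORD_TYPE_DIGIT;
        if (std::iscntrl(ch)) flags |= WORD_TYPE_CONTROL;
        chrtypes[ch] = flags;
    }
}

// Hypothetical lookup helpers. The cast keeps bytes >= 0x80 from turning
// into negative array indices on platforms where char is signed.
bool is_control(char c)
{
    return (chrtypes[(unsigned char)c] & WORD_TYPE_CONTROL) != 0;
}

bool is_digit(char c)
{
    return (chrtypes[(unsigned char)c] & WORD_TYPE_DIGIT) != 0;
}
```

With chrtypes[0] = 0 instead, is_control('\0') in this sketch would return false, so an embedded NUL would not be treated as a control character.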

Marc

And here is my patch against version 3.1.4:

*** WordList.cc.orig Fri Dec 10 01:28:44 1999
--- WordList.cc Thu Jan 13 20:23:29 2000
***************
*** 108,125 ****
  
      while (word && *word)
      {
!         if (HtIsStrictWordChar((unsigned char)*word) && !isdigit(*word))
          {
              alpha = 1;
              // break; /* Can't stop here, there may still be control chars! */
          }
!         else if (allow_numbers && isdigit(*word))
          {
            alpha = 1;
            // break; /* Can't stop here, there may still be control chars! */
          }
  // if (*word >= 0 && *word < ' ')
!         else if (iscntrl(*word))
          {
              control = 1;
              break;
--- 108,125 ----
  
      while (word && *word)
      {
!         if (HtIsStrictWordChar((unsigned char)*word) && !isdigit((unsigned char)*word))
          {
              alpha = 1;
              // break; /* Can't stop here, there may still be control chars! */
          }
!         else if (allow_numbers && isdigit((unsigned char)*word))
          {
            alpha = 1;
            // break; /* Can't stop here, there may still be control chars! */
          }
  // if (*word >= 0 && *word < ' ')
!         else if (iscntrl((unsigned char)*word))
          {
              control = 1;
              break;

I hope that my email program will not mangle that ;-)
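
As background to the casts in the patch: the <cctype> functions take an int that must be either EOF or a value representable as unsigned char. Where plain char is signed, a byte such as 0xE4 is passed as a negative int without the cast, which is undefined behaviour. A small sketch of the same idea in isolation follows; the helper names are hypothetical and not part of htdig.

```cpp
#include <cctype>

// Convert a plain char to the non-negative int the <cctype> functions expect.
int ctype_arg(char c)
{
    return static_cast<unsigned char>(c);
}

// Wrappers that are safe for 8-bit input regardless of char signedness.
bool safe_isdigit(char c) { return std::isdigit(ctype_arg(c)) != 0; }
bool safe_iscntrl(char c) { return std::iscntrl(ctype_arg(c)) != 0; }

// Scan a word for control characters, in the spirit of the patched loop.
bool word_has_control(const char *word)
{
    for (; word && *word; ++word) {
        if (safe_iscntrl(*word))
            return true;
    }
    return false;
}
```

Wrapping the cast in one place like this is one way to avoid repeating it at every call site, which appears to be the idea behind the member functions proposed for WordType in 3.2.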

 -----------------------------------------------------

Marc Pohl
                                 Westdeutscher Rundfunk
Tel.: +49 221 220 8618 OSC/Videotextredaktion
FAX: +49 221 220 3882 D-50600 Koeln
Email: marc.pohl@wdr.de




This archive was generated by hypermail 2b28 : Thu Jan 13 2000 - 11:57:37 PST