htdig: [Patch] non english text parser broken


Vadim Chekan (vadim@gc.lviv.ua)
Wed, 4 Nov 1998 16:08:39 +0200


Hello everybody!

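I found a bug in the current (3.1.0.b2) release: I can't index Cyrillic
text files. The cause is that the pointers in Plaintext.cc are declared
as "char" instead of "unsigned char". Where plain "char" is signed, bytes
above 127 become negative, and "isalpha" doesn't work with such values.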
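
To illustrate the problem outside of ht://Dig, here is a minimal sketch.
The helper names as_signed_char_value and count_alpha are hypothetical
and do not exist in the htdig sources; the sketch only assumes the
standard isalpha from <cctype>.

```cpp
#include <cctype>
#include <cstddef>

// Hypothetical helper: shows the value a byte takes when it is read
// through a signed char. On common two's-complement platforms any byte
// above 127 comes out negative.
int as_signed_char_value(unsigned char byte)
{
    return static_cast<signed char>(byte);
}

// Hypothetical helper: counts the bytes of a NUL-terminated string that
// isalpha() accepts in the current locale.
std::size_t count_alpha(const char *text)
{
    std::size_t n = 0;
    // Walk the buffer through an unsigned char pointer so that every
    // byte reaches isalpha() as a value in the range 0..255.
    const unsigned char *p = reinterpret_cast<const unsigned char *>(text);
    for (; *p != 0; ++p)
    {
        if (std::isalpha(*p))
            ++n;
    }
    return n;
}
```

Reading the bytes as unsigned char only makes the call to isalpha well
defined. Whether a byte above 127 is then reported as a letter still
depends on the locale selected with setlocale; in the default "C" locale
only the ASCII letters count.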

Vadim Chekan.

Index: Plaintext.cc
===================================================================
RCS file: /opt/htdig/cvs/htdig3/htdig/Plaintext.cc,v
retrieving revision 1.4
diff -u -p -r1.4 Plaintext.cc
--- Plaintext.cc 1997/04/20 15:23:40 1.4
+++ Plaintext.cc 1998/11/04 13:56:31
@@ -53,8 +53,8 @@ Plaintext::parse(Retriever &retriever, U
     if (contents == 0 || contents->length() == 0)
  return;

- char *position = contents->get();
- char *start = position;
+ unsigned char *position = contents->get();
+ unsigned char *start = position;
     int offset = 0;
     int in_space = 0;
     String word;
