[htdig] Patch: use minimum_word_length for plain text documents


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Mon, 22 Mar 1999 16:47:11 -0600 (CST)


Hi again. If any of you have been wondering why the minimum_word_length
attribute seems to have no effect on words indexed in plain text files,
here's the fix:

--- htdig/Plaintext.cc.nominword Tue Feb 16 23:03:53 1999
+++ htdig/Plaintext.cc Fri Mar 19 10:39:54 1999
@@ -67,6 +67,7 @@ Plaintext::parse(Retriever &retriever, U
 
     unsigned char *position = (unsigned char *) contents->get();
     unsigned char *start = position;
+ static int minimumWordLength = config.Value("minimum_word_length", 3);
     int offset = 0;
     int in_space = 0;
     String word;
@@ -94,11 +95,11 @@ Plaintext::parse(Retriever &retriever, U
                 head << word;
             }
 
- if (word.length() > 2)
+ if (word.length() >= minimumWordLength)
             {
                 word.lowercase();
                 word.remove(valid_punctuation);
- if (word.length() > 2)
+ if (word.length() >= minimumWordLength)
                 {
                     retriever.got_word(word,
                                        int(offset * 1000 / contents->length()),

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig@htdig.org containing the single word "unsubscribe" in
the SUBJECT of the message.



This archive was generated by hypermail 2.0b3 on Thu Mar 25 1999 - 07:36:02 PST