Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Thu, 28 Jan 1999 09:29:35 -0600 (CST)
According to Geoff Hutchison:
> OK, I poked through the code and worked out the problem with finding words
> with punctuation in the excerpt.
>
> Basically, htsearch takes the user input and puts it into $WORDS. It then
> does some parsing (applying fuzz and checking for boolean syntax) and puts
> the result in $LOGICAL_WORDS. When it does this, it generates a StringMatch
> with the parsed $LOGICAL_WORDS in it. This makes sure fuzzy matches are
> included in the StringMatch, but by that point valid_punctuation has
> already been stripped out. :-(
>
> So here's my proposed fix. In addition to the logicalWords currently placed
> in searchWordsPattern in htsearch.cc, we should ALSO add the user's
> original input. This should include the punctuation and ensure that these
> words are considered when looking up the excerpt and doing highlighting.
>
> Does this make sense?
I think so. The problem is you'd have to do some reparsing of the
original input words before adding them to the StringMatch pattern,
i.e. breaking up the string into a string list and stripping out
boolean operators if necessary. I've peeked at the parsing code a bit,
but I'm afraid I don't understand its workings well enough to suggest
exactly how the input string would need to be reparsed to do this
correctly. If you want to give it a shot, I'd be grateful.
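
To make the reparsing step concrete, here is a rough, standalone sketch of
what I have in mind. It is not taken from htsearch.cc: the function names
(reparseOriginalWords, joinPattern) are made up for illustration, it uses
the standard library rather than ht://Dig's own String and List classes,
and it assumes that the boolean operators are the words "and", "or" and
"not" and that the StringMatch pattern is a list of words separated by '|'.
Those assumptions would need to be checked against the real parser.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: lower-case a copy of a word for comparison.
static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Hypothetical helper: true if the token is a boolean operator.
// The operator set here is an assumption, not taken from htsearch.
static bool isBooleanOperator(const std::string &word) {
    const std::string w = toLower(word);
    return w == "and" || w == "or" || w == "not";
}

// Break the user's original input into a list of words. Punctuation inside
// a word (e.g. "e-mail") is left untouched. For boolean queries, the
// parentheses and operators are dropped.
std::vector<std::string> reparseOriginalWords(const std::string &input,
                                              bool booleanQuery) {
    std::string cleaned = input;
    if (booleanQuery) {
        // Treat grouping parentheses as word separators.
        std::replace_if(cleaned.begin(), cleaned.end(),
                        [](char c) { return c == '(' || c == ')'; }, ' ');
    }

    std::istringstream in(cleaned);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {  // split on whitespace
        if (booleanQuery && isBooleanOperator(word))
            continue;
        words.push_back(word);
    }
    return words;
}

// Join the words into one pattern string. The '|' separator is an
// assumption about the pattern format expected by StringMatch.
std::string joinPattern(const std::vector<std::string> &words, char sep = '|') {
    std::string pattern;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (i > 0)
            pattern += sep;
        pattern += words[i];
    }
    return pattern;
}
```

With a boolean query such as "e-mail and (post.script or C++)", this yields
the words "e-mail", "post.script" and "C++", which join to
"e-mail|post.script|C++". That string would be added to the pattern
alongside the existing logicalWords, not in place of them, so the fuzzy
matches are still covered.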
--
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930