Re: [htdig3-dev] Debugged excerpt/valid_punctuation


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Thu, 28 Jan 1999 09:29:35 -0600 (CST)


According to Geoff Hutchison:
> OK, I poked through the code and worked out the problem with finding words
> with punctuation in the excerpt.
>
> Basically, htsearch takes the user input and puts it into $WORDS. It then
> does some parsing (applying fuzz and checking for boolean syntax) and puts
> the result in $LOGICAL_WORDS. When it does this, it generates a StringMatch
> with the parsed $LOGICAL_WORDS in it. This makes sure fuzzy matches are
> included in the StringMatch, but it's already stripped out
> valid_punctuation. :-(
>
> So here's my proposed fix. In addition to the logicalWords currently placed
> in searchWordsPattern in htsearch.cc, we should ALSO add the user's
> original input. This should include the punctuation and ensure that these
> words are considered when looking up the excerpt and doing highlighting.
>
> Does this make sense?

I think so. The problem is you'd have to do some reparsing of the
original input words before adding them to the StringMatch pattern,
i.e. breaking up the string into a string list, stripping out boolean
operators if necessary. I've peeked at the parsing code a bit, but
I'm afraid I don't understand its workings enough to suggest exactly
how the input string would need to be reparsed to do this correctly.
If you want to give it a shot, I'd be grateful.
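
As a rough illustration of the reparsing step described above, here is a
standalone sketch. It is not htdig code and makes several assumptions:
the helper names (reparseOriginalWords, joinPattern, isBooleanOperator)
are hypothetical, the operator set is assumed to be and/or/not plus
parentheses, and the '|' separator is assumed to be what the StringMatch
pattern expects. The real htsearch parser may well differ on all three.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: return a lower-cased copy of a word for comparison.
static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Assumption: the boolean operators are "and", "or" and "not".
static bool isBooleanOperator(const std::string &word) {
    const std::string w = toLower(word);
    return w == "and" || w == "or" || w == "not";
}

// Split the user's original input on whitespace. For boolean searches,
// trim grouping parentheses from each token and drop the operators.
// Punctuation inside a word (e.g. "post-doc") is left untouched.
std::vector<std::string> reparseOriginalWords(const std::string &input,
                                              bool booleanSearch) {
    std::vector<std::string> words;
    std::istringstream in(input);
    std::string token;
    while (in >> token) {
        if (booleanSearch) {
            const std::size_t first = token.find_first_not_of("()");
            if (first == std::string::npos)
                continue;  // token was only parentheses
            const std::size_t last = token.find_last_not_of("()");
            token = token.substr(first, last - first + 1);
            if (isBooleanOperator(token))
                continue;
        }
        words.push_back(token);
    }
    return words;
}

// Join the words with a separator so they can be appended to the
// existing pattern string. A word containing the separator itself
// would need escaping or skipping; this sketch does not handle that.
std::string joinPattern(const std::vector<std::string> &words,
                        char sep = '|') {
    std::string pattern;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (i > 0)
            pattern += sep;
        pattern += words[i];
    }
    return pattern;
}
```

The idea would be to append the joined result to the pattern already
built from the logical words, so that both the fuzzy matches and the
punctuated originals are candidates for the excerpt.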

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930