htdig: Character entities...


Mathieu Bezkorowajny (mathieu@tmdesigncom.com)
Wed, 25 Nov 1998 02:17:40 +0000


My problem:
        I must search pages written in French, and these pages
        contain character entities:
        
        example: 'cr&eacute;er'

        When I look in the 'db.wordlist' file, the word is there.
        No problem yet.

        When I search for this word with htsearch, there is a major problem:
         it treats '&' and ';' as word separators.
        
        So the search is for 'cr', 'eacute' and 'er'.
        NOT GOOD
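A minimal Python sketch of the splitting described above, assuming a tokenizer that treats every non-alphanumeric character as a word separator. This is a simplification for illustration, not htsearch's actual code; the function name `split_words` is hypothetical.

```python
import re

def split_words(query: str) -> list[str]:
    # Hypothetical simplified tokenizer: any run of characters that is not
    # an ASCII letter or digit acts as a separator, so '&' and ';' split words.
    return [w for w in re.split(r"[^0-9A-Za-z]+", query) if w]

print(split_words("cr&eacute;er"))  # ['cr', 'eacute', 'er']
```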

        If I set the 'valid_punctuation' parameter to '&' and ';',
        it drops them and gives 'creacuteer'.
        NOT GOOD YET
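The 'creacuteer' result can be reproduced the same way, assuming characters listed in `valid_punctuation` are simply deleted from the word rather than treated as separators (again a sketch, not htdig's code; `VALID_PUNCTUATION` and `strip_punctuation` are hypothetical names). Decoding the entity first, here with Python's standard `html.unescape`, would keep the accented letter instead:

```python
import html

# Assumed value of the valid_punctuation setting from the message.
VALID_PUNCTUATION = "&;"

def strip_punctuation(word: str) -> str:
    # Characters listed in VALID_PUNCTUATION are deleted, not used as separators.
    return "".join(c for c in word if c not in VALID_PUNCTUATION)

print(strip_punctuation("cr&eacute;er"))                 # creacuteer
print(strip_punctuation(html.unescape("cr&eacute;er")))  # créer
```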

How should I do it?
Should I look into 'external_parsers'?
Help me!
My brain is going to explode!

mathieu@tmdesigncom.com