Re: htdig: Character entities...


Palle Girgensohn (girgen@partitur.se)
Wed, 25 Nov 1998 10:38:55 +0100


Hmm. I search Swedish content, which also has "strange" characters ;-)

I set

    locale: sv_SE.ISO_8859-1

in htdig.conf. You might use fr_FR.ISO_8859-1 instead.

That's it. Then I can search for créer and all is well.

I think the locale only needs to be set if neither the server nor the file
indicates the character set.
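The failure described in the quoted message can be reproduced with a toy model. This is a minimal Python sketch, not htsearch's actual tokenizer: `split_words`, `drop_punctuation` and `decode_then_split` are hypothetical helpers, and Python's Unicode-aware word matching stands in, roughly, for what an ISO-8859-1 locale gives a C/C++ tokenizer. It shows that splitting the raw entity yields three fragments, dropping '&' and ';' yields 'creacuteer', and only the decoded form matches the word stored in db.wordlist.

```python
import html
import re

# Assumption: the word as stored in db.wordlist, with the entity already decoded.
INDEXED_WORD = "créer"


def split_words(text: str) -> list[str]:
    """Toy tokenizer: every run of word characters is one word.

    Python's re word class is Unicode-aware, so 'é' counts as a letter,
    roughly like isalpha() under an ISO-8859-1 locale (an analogy only).
    """
    return re.findall(r"\w+", text)


def drop_punctuation(text: str, punct: str = "&;") -> str:
    """Mimic treating '&' and ';' as punctuation that is simply removed."""
    return "".join(ch for ch in text if ch not in punct)


def decode_then_split(text: str) -> list[str]:
    """Decode HTML character entities first, then split into words."""
    return split_words(html.unescape(text))


query = "cr&eacute;er"
print(split_words(query))                        # ['cr', 'eacute', 'er']
print(split_words(drop_punctuation(query)))      # ['creacuteer']
print(decode_then_split(query))                  # ['créer']
print(INDEXED_WORD in decode_then_split(query))  # True
```

Typing créer directly, as described above, skips the decoding step; the locale is presumably what keeps 'é' from being treated as a separator.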

Hope this helps.

/Palle

Mathieu Bezkorowajny wrote:
>
> My problem:
> I must search in pages written in French, these pages
> have character entities in them:
>
> example: 'cr&eacute;er'
>
> When I look in the 'db.wordlist' file, it's there.
> No problem yet.
>
> When I try this word with htsearch -> major problem:
> it treats '&' and ';' as word separators,
>
> so it searches for 'cr', 'eacute' and 'er'.
> NOT GOOD
>
> If I set the 'valid_punctuation' parameter to '&' and ';',
> it drops them and gives 'creacuteer'.
> NOT GOOD YET
>
> How should I do it?
> Should I look into 'external_parsers'?
> Help me!
> My brain is going to explode!
>
> mathieu@tmdesigncom.com
> ----------------------------------------------------------------------
> To unsubscribe from the htdig mailing list, send a message to
> htdig-request@sdsu.edu containing the single word "unsubscribe" in
> the body of the message.



This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:28:52 PST