Antti Rauramo (antti.rauramo@edita.fi)
Thu, 12 Aug 1999 14:50:50 +0300
peter karlsson wrote:
> > Anyone out there indexing pdf-files with ISO-Latin characters in them
> > (äöåÄÖÅ mainly)? Seems that htdig doesn't understand the meaning of the
> > special characters, and shows them w/o conversion;
>
> I don't believe that htdig *cares* what character set the external parsers
> feed it with, but assumes that it is the one specified in the current
> locale. You would probably want to hack the program you're using to parse
> PDF files to correctly convert its output to the local character set.
That's true; these odd characters show up in the excerpts and can be searched
for. But I'm not using anything special to parse the PDFs; just Adobe Acrobat
and then indexing with htdig. That's why I'm asking here. And the PS created
by Acrobat works fine.
Furthermore, I don't think simply converting characters would help, since the
converter would have to distinguish between the PC and Mac character sets,
which overlap.
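To make the overlap concrete, here is a rough Python sketch. It assumes that
"PC" means ISO-8859-1 (Latin-1) and "Mac" means Mac Roman; the function names
are made up for illustration and have nothing to do with htdig or Acrobat.
It shows that each of the letters gets a different byte in the two sets, and
that a Latin-1 byte is still a valid (but different) Mac Roman character, so
the converter cannot tell from the bytes alone which set it was given.

```python
# Illustrative sketch only: these names are hypothetical, not part of htdig.
FINNISH_LETTERS = "äöåÄÖÅ"


def recode(data: bytes, source: str, target: str = "latin-1") -> bytes:
    """Re-encode parser output from `source` into the locale's `target` set.

    Characters that do not exist in the target set are replaced with '?'.
    """
    return data.decode(source).encode(target, errors="replace")


def byte_pairs(text: str = FINNISH_LETTERS) -> dict:
    """Map each letter to its (Latin-1 byte, Mac Roman byte) pair."""
    return {ch: (ch.encode("latin-1"), ch.encode("mac_roman")) for ch in text}


if __name__ == "__main__":
    for ch, (pc, mac) in byte_pairs().items():
        # The same PC byte is also a legal Mac Roman byte, just another glyph.
        misread = pc.decode("mac_roman")
        print(f"{ch}: latin-1=0x{pc.hex()} mac_roman=0x{mac.hex()} "
              f"PC byte read as Mac: {misread!r}")
```

If the source set is known, recode(data, "mac_roman") is all that is needed;
the hard part is knowing it, which is exactly the problem described above.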
--
Antti Rauramo, WWW and database specialist, Edita Verkkoviestintä
antti.rauramo@edita.fi, +358-9-8501 4004 (mobile)