Re: [htdig] PDF & ISO-Latin chars

Antti Rauramo (
Thu, 12 Aug 1999 14:50:50 +0300

peter karlsson wrote:

> > Anyone out there indexing pdf-files with ISO-Latin characters in them
> > (äöåÄÖÅ mainly)? Seems that htdig doesn't understand the meaning of the
> > special characters, and shows them w/o conversion;
> I don't believe that htdig *cares* what character set the external parsers
> feed it with, but assumes that it is the one specified in the current
> locale. You would probably want to hack the program you're using to parse
> PDF files to correctly convert its output to the local character set.
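Here is a minimal sketch of the conversion step suggested above. It assumes the parser's output arrives in Mac Roman and the local character set is ISO-8859-1. Both are assumptions for illustration, not anything htdig prescribes. The function name `to_local_charset` is hypothetical.

```python
def to_local_charset(raw: bytes, source: str = "mac_roman", target: str = "latin_1") -> bytes:
    """Re-encode parser output from the `source` charset to the `target` charset.

    Assumption: the caller knows which charset the parser emits.
    Characters with no equivalent in `target` are replaced with '?'.
    """
    text = raw.decode(source)                        # bytes -> Unicode, using the assumed source charset
    return text.encode(target, errors="replace")     # Unicode -> bytes in the local charset
```

Run as a filter between the PDF parser and htdig (reading `sys.stdin.buffer` and writing `sys.stdout.buffer`), this would pass plain ASCII through unchanged. Characters with no Latin-1 equivalent, such as the Mac Roman bullet, would come out as `?`.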

That's true; these weird chars show up in the excerpt and can be searched for.

But I'm not using anything special to parse PDFs, just Adobe Acrobat, and
then indexing with htdig. That's why I'm asking here. And the PS created by
Acrobat is fine.

Furthermore, I don't think simply converting the characters would help, since
the converter would have to distinguish between the PC and Mac character sets,
which overlap.
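The overlap can be shown with a small Python sketch. The helper `decode_both` is hypothetical, and CP850 is assumed as the PC code page. The sketch decodes the same bytes under CP850 and under Mac Roman. Both decodings succeed without error, so nothing in the bytes themselves tells a converter which one is right.

```python
from typing import Dict


def decode_both(raw: bytes) -> Dict[str, str]:
    """Decode the same bytes as a PC code page (CP850, assumed) and as Mac Roman.

    Neither decoding raises an error, which is exactly why a converter
    cannot tell from the bytes alone which character set was meant.
    """
    return {enc: raw.decode(enc) for enc in ("cp850", "mac_roman")}
```

For the bytes 0x8A 0x9A 0x8C, Mac Roman yields `äöå` while CP850 yields `èÜî`. A converter would need outside knowledge, or a guess based on the expected language, to pick one.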

- Antti Rauramo, WWW and database specialist, Edita Verkkoviestintä
-, +358-9-8501 4004 (mobile)

