Re: [htdig] PDF & ISO-Latin chars


Antti Rauramo (antti.rauramo@edita.fi)
Thu, 12 Aug 1999 14:50:50 +0300


peter karlsson wrote:

> > Anyone out there indexing pdf-files with ISO-Latin characters in them
> > (äöåÄÖÅ mainly)? Seems that htdig doesn't understand the meaning of the
> > special characters, and shows them w/o conversion;
>
> I don't believe that htdig *cares* what character set the external parsers
> feed it with, but assumes that it is the one specified in the current
> locale. You would probably want to hack the program you're using to parse
> PDF files to correctly convert its output to the local character set.

That's true; these weird chars show up in the excerpts and can be searched for.

But I'm not using anything special to parse PDFs; just Adobe Acrobat and
then indexing with htdig. That's why I'm asking here. And the PS created by
Acrobat works fine.

Further, I don't think simply converting chars would help, since the converter
would have to distinguish between PC and Mac char sets, which overlap.
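
To illustrate the overlap, here is a minimal Python sketch (not part of the
original exchange). It assumes the "PC" set is ISO-8859-1 and the "Mac" set is
Mac Roman; the function names are made up for illustration. The same letter is
stored as a different byte in each set, so a converter can only produce correct
output if it is told which set the bytes came from.

```python
# Sketch only: the helper names below are hypothetical, chosen for this example.
# "latin-1" and "mac_roman" are standard Python codec names.

SAMPLE = "äöåÄÖÅ"


def encode_in(text, encoding):
    """Return the raw bytes for `text` in the given single-byte encoding."""
    return text.encode(encoding)


def to_latin1(raw, source_encoding):
    """Convert bytes from a KNOWN source encoding to ISO-8859-1 bytes."""
    return raw.decode(source_encoding).encode("latin-1")


def misread(raw, assumed_encoding):
    """Decode bytes with a possibly wrong encoding, as a naive converter would."""
    return raw.decode(assumed_encoding, errors="replace")


if __name__ == "__main__":
    pc_bytes = encode_in(SAMPLE, "latin-1")
    mac_bytes = encode_in(SAMPLE, "mac_roman")
    # Same letters, different byte values in each set.
    print(pc_bytes.hex(" "))
    print(mac_bytes.hex(" "))
    # Correct conversion needs the real source encoding.
    print(to_latin1(mac_bytes, "mac_roman") == pc_bytes)
    # Guessing wrong yields other characters instead of an error.
    print(misread(pc_bytes, "mac_roman"))
```

Because both sets use the full 0x80-0xFF range, a wrong guess does not fail; it
just produces different characters, which is why the source encoding has to be
known rather than detected from the bytes alone.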

--
- Antti Rauramo, WWW and database specialist, Edita Verkkoviestintä
- antti.rauramo@edita.fi, +358-9-8501 4004 (mobile)




This archive was generated by hypermail 2.0b3 on Thu Aug 12 1999 - 04:58:39 PDT