Re: [htdig] Best way to parse PDF?

Geoff Hutchison
Tue, 15 Jun 1999 09:23:36 -0400 (EDT)

On Tue, 15 Jun 1999, Marian Steinbach wrote:

> Is there a universal way to index PDF?

I'll give a fairly short answer; I'm sure others will correct me
if I'm wrong.

Yes and no.

Some programs write PDF files as graphics. This, of course, defeats the
whole purpose of the format, and it makes it essentially impossible to
extract any text to index.

For the vast majority of PDF files, you'll do very well setting up an
external parser and using xpdf. There has been quite a bit
of discussion on this point, and I expect a search for xpdf should turn up
plenty of it.
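
For what it's worth, here is a minimal sketch of the text-extraction step,
assuming the pdftotext program that ships with xpdf is on your PATH. The
function names (build_command, extract_text, looks_image_only) are made up
for illustration and are not part of ht://Dig or xpdf. How the text gets
handed back to ht://Dig depends on your external parser setup and isn't
shown here.

```python
import shutil
import subprocess


def build_command(pdf_path):
    """Build the pdftotext command line for one PDF file."""
    # Passing "-" as the output file asks pdftotext to write to stdout.
    return ["pdftotext", pdf_path, "-"]


def extract_text(pdf_path):
    """Return whatever text pdftotext can recover from pdf_path."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (part of xpdf) was not found on PATH")
    result = subprocess.run(
        build_command(pdf_path),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    # Decode leniently; the output encoding depends on the local xpdf setup.
    return result.stdout.decode("utf-8", errors="replace")


def looks_image_only(text):
    """Heuristic (an assumption): no extractable characters at all
    suggests the PDF was written as graphics rather than text."""
    # strip() also removes the form feeds pdftotext puts between pages.
    return not text.strip()
```

Calling extract_text("manual.pdf") and then looks_image_only() on the result
is a cheap way to spot the graphics-only files mentioned above before you
bother trying to index them.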

-Geoff Hutchison
Williams Students Online

