Re: [htdig] Best way to parse PDF?


Geoff Hutchison (ghutchis@wso.williams.edu)
Tue, 15 Jun 1999 09:23:36 -0400 (EDT)


On Tue, 15 Jun 1999, Marian Steinbach wrote:

> Is there a universal way to index PDF files?

I'll give a fairly short answer; I'm sure others will correct me if I'm
wrong.

Yes and no.

Some programs write PDF files as graphics, i.e. each page is an image
rather than text. This, of course, defeats the whole purpose of the
format, and it makes such files essentially impossible to index.

For the vast majority of PDF files, you'll do very well configuring
parse_doc.pl as the external parser for PDF and using xpdf to extract the
text. There has been quite a bit of discussion on this point, and I expect
a search of the list archives for xpdf should turn up a bunch of messages.
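
If you want a quick way to spot the files that were written as graphics,
something like the following Python sketch might help. It is not part of
ht://Dig or parse_doc.pl; it just runs xpdf's pdftotext on a file and
guesses whether any indexable text came out. The function names and the
20-word threshold are my own assumptions, and it assumes pdftotext is on
your PATH.

```python
import shutil
import subprocess


def extract_text(pdf_path, pdftotext="pdftotext"):
    """Run pdftotext on pdf_path and return whatever text it extracts.

    The trailing "-" tells pdftotext to write to stdout instead of a file.
    """
    if shutil.which(pdftotext) is None:
        raise FileNotFoundError(f"{pdftotext} not found on PATH")
    result = subprocess.run(
        [pdftotext, pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    # The output encoding depends on the pdftotext build; decode leniently.
    return result.stdout.decode("utf-8", errors="replace")


def looks_indexable(text, min_words=20):
    """Heuristic: a PDF is worth indexing if it yields at least min_words words.

    Image-only PDFs typically produce nothing but whitespace and form feeds.
    """
    words = [w for w in text.split() if any(c.isalnum() for c in w)]
    return len(words) >= min_words
```

For example, looks_indexable(extract_text("some.pdf")) should come back
False for a scanned, image-only document (the file name here is made up).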

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
