Re: htdig: ht3.1.0b1 and PDF

Brian Kariger
Mon, 14 Sep 1998 22:14:52 -0700

On 9/14/98 20:40, Geoff Hutchison wrote:

>> - convert the pdf to ps and use the Postscript module to
>> parse it (looking at the way the modules work, I don't
>> know if this is possible, I haven't looked at it that much
>> though)
>> - convert the pdf to text and parse the text
>> - improve the parsing capability by stealing code from
>> the Postscript module
>Number two is clearly the easiest. :-)
>I'd look at the third route first, and if it seems too hard, then I'd back
>up to the second approach.
>In addition to the xpdf "pdftops" program, I'd also look at output from
>ghostscript's "pdf2ps" program. I expect they would be similar, but both
>have been mentioned as alternatives to acroread.
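For the second route (convert the PDF to text, then parse the text), here is a minimal Python sketch. It assumes xpdf's companion tool pdftotext is installed on PATH. Passing "-" as the output file sends the text to stdout, and "-enc UTF-8" selects the output encoding. The helper names pdf_to_text and extract_words are hypothetical. The word splitting is a simple stand-in for illustration, not htdig's own parser.

```python
import re
import shutil
import subprocess
from typing import List


def pdf_to_text(pdf_path: str, converter: str = "pdftotext") -> str:
    """Hypothetical helper: run pdftotext on pdf_path and return its text.

    Assumes the converter binary is on PATH. "-" as the output file makes
    pdftotext write to stdout; "-enc UTF-8" selects the output encoding.
    """
    if shutil.which(converter) is None:
        raise FileNotFoundError(f"{converter} not found on PATH")
    result = subprocess.run(
        [converter, "-enc", "UTF-8", pdf_path, "-"],
        check=True,            # raise CalledProcessError on a nonzero exit
        capture_output=True,   # collect stdout/stderr as bytes
    )
    # Decode leniently so a stray byte does not abort the whole document.
    return result.stdout.decode("utf-8", errors="replace")


def extract_words(text: str, min_length: int = 3) -> List[str]:
    """Hypothetical helper: split plain text into lowercase words.

    Words shorter than min_length are dropped, a crude stand-in for the
    minimum word length an indexer might apply.
    """
    words = re.findall(r"[A-Za-z0-9]+", text)
    return [w.lower() for w in words if len(w) >= min_length]
```

Usage would be something like extract_words(pdf_to_text("doc.pdf")). The same subprocess pattern would cover the first route: run "pdftops in.pdf out.ps" or ghostscript's "pdf2ps in.pdf out.ps" and hand the PostScript to the existing module instead.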

Another possibility is pj, a free Java class library for parsing,
manipulating, and creating Adobe PDF files, available from

Brian Kariger

This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:27:48 PST