Re: htdig: ht3.1.0b1 and PDF


Brian Kariger (bpk@nothing.com)
Mon, 14 Sep 1998 22:14:52 -0700


On 9/14/98 20:40, Geoff Hutchison wrote:

>> - convert the pdf to ps and use the Postscript module to
>> parse it (looking at the way the modules work, I don't
>> know if this is possible, I haven't looked at it that much
>> though)
>> - convert the pdf to text and parse the text
>> - improve the parsing capability by stealing code from
>> the Postscript module
>
>Number two is clearly the easiest. :-)
>I'd look at the third route first, and if it seems too hard, then I'd back
>up to the second approach.
>
>In addition to the xpdf "pdftops" program, I'd also look at output from
>ghostscript's "pdf2ps" program. I expect they would be similar, but both
>have been mentioned as alternatives to acroread.

Another possibility is pj, a free Java class library for parsing,
manipulating, and creating Adobe PDF files, available from
<http://www.etymon.com/pj/>.
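
For comparison, the second route in the list above (convert the PDF to
text and parse the text) could look roughly like the following. This is
only a sketch in Python, not anything htdig does: it assumes xpdf's
"pdftotext" program is installed and on the PATH, that its output can be
decoded as UTF-8, and the function names pdftotext_command, pdf_to_text
and split_words are made up for illustration.

```python
import re
import subprocess


def pdftotext_command(pdf_path):
    """Build the command line for xpdf's pdftotext (assumed to be on PATH)."""
    # A trailing "-" asks pdftotext to write the text to standard output.
    return ["pdftotext", pdf_path, "-"]


def pdf_to_text(pdf_path):
    """Run pdftotext and return the extracted text as a string."""
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        check=True,  # raise CalledProcessError if the conversion fails
    )
    # Assumption: the output is UTF-8; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")


def split_words(text):
    """Split extracted text into lowercase alphanumeric words for indexing."""
    return [word.lower() for word in re.findall(r"[A-Za-z0-9]+", text)]
```

Calling split_words(pdf_to_text("some.pdf")) would then give a flat word
list that an indexer could consume; "some.pdf" is a placeholder file name.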

--
Brian Kariger
bpk@nothing.com


