Re: [htdig] PDF parsing in htdig/PDF.cc


Geoff Hutchison (ghutchis@wso.williams.edu)
Wed, 24 Feb 1999 19:16:21 -0500


>However, what I'd really like to see is the output of the latest
>Ghostscript's PDF to text converter, as well as its PDF to PS converter.
>Could you send me these, please? (No need to post to the list, though.)
>I know I should probably just upgrade to the latest version myself, but
>I've just got too many other things in the queue right now.

I can't seem to find a pdf2ascii converter as part of Ghostscript. I
already sent you the results of gs5.50 on one PDF, but I'd be glad to send
you as many as you want. I haven't hacked the PDF.cc source, but I don't
see anything particularly useful in the files I've run through pdf2ps.

>Can't hurt! However, if they do come up with something, you can be
>pretty sure it will be a binary-only release, and not open source.
>That leaves some htdig users out in the cold, so if we can also work
>out some open source alternatives, then everyone's happy.

It would certainly help if there were a reasonable, GPL-compatible PDF
library. If anyone has heard of one, that would be great. Certainly we'd
like to keep much of the current PDF parser, but it would be nice to
directly parse the PDF files as well. Last fall someone asked about reading
meta info from PDF files, but the only library they could point me to was
written in Perl.
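This is only a rough illustration of what reading one piece of meta info directly might involve. It is not taken from htdig's PDF.cc or from any existing library. The function name extractPdfTitle is hypothetical. The sketch assumes an unencrypted PDF whose document information dictionary is stored uncompressed, with /Title given as a literal string in parentheses. Hex strings, octal escapes, and line continuations are not handled.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper (not part of htdig): return the /Title value from raw
// PDF bytes by scanning for the first "/Title" key.
// Assumptions: file is unencrypted, the Info dictionary is not compressed,
// and the title is a literal string "(...)". Hex strings "<...>" make this
// return std::nullopt; octal escapes like \053 are NOT decoded.
std::optional<std::string> extractPdfTitle(const std::string& pdf) {
    const std::string key = "/Title";
    std::string::size_type pos = pdf.find(key);
    if (pos == std::string::npos) return std::nullopt;
    pos += key.size();

    // Skip whitespace between the key and its value.
    while (pos < pdf.size() && std::isspace(static_cast<unsigned char>(pdf[pos]))) ++pos;
    if (pos >= pdf.size() || pdf[pos] != '(') return std::nullopt;
    ++pos;

    std::string title;
    int depth = 1;  // literal strings may contain balanced, unescaped parentheses
    while (pos < pdf.size()) {
        char c = pdf[pos++];
        if (c == '\\' && pos < pdf.size()) {
            // Backslash escapes: map the common ones, copy the rest literally
            // (this covers \( \) and \\).
            char e = pdf[pos++];
            switch (e) {
                case 'n': title += '\n'; break;
                case 'r': title += '\r'; break;
                case 't': title += '\t'; break;
                default:  title += e;    break;
            }
        } else if (c == '(') {
            ++depth;
            title += c;
        } else if (c == ')') {
            if (--depth == 0) return title;  // closing paren of the string itself
            title += c;
        } else {
            title += c;
        }
    }
    return std::nullopt;  // unterminated string
}
```

For example, given the bytes `<< /Title (Annual Report \(draft\)) >>`, the function yields `Annual Report (draft)`. A real parser would follow the trailer's /Info reference instead of taking the first /Title it finds. That key also appears elsewhere in a file, for example in outline entries.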

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

This archive was generated by hypermail 2.0b3 on Fri Feb 26 1999 - 14:34:13 PST