[htdig] PDF parsing in htdig/PDF.cc


Patrick Dugal (patrick.dugal@nrc.ca)
Wed, 24 Feb 1999 12:17:46 -0500


Has anyone had any problems with xpdf's pdftotext (with decryption patch)? Maybe
PDF.cc could rely solely on pdftotext instead of acroread and its own internal
parsing? I have tested pdftotext with many PDFs, and so far it has worked on
all the ones PDF.cc failed on.

According to the xpdf README, many documents from Adobe were consulted when
pdftotext was written. I think that making PDF.cc use pdftotext would be a
significant improvement.

Has anyone tried to tweak and test PDF.cc so that it relies solely on pdftotext?
If not, I will try it and let the list know whether there is any significant
improvement.
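
A minimal sketch of the pdftotext-only idea, not the actual PDF.cc code: run
pdftotext as a child process and read the extracted text from its standard
output. It assumes a POSIX system (popen) and that pdftotext is on the PATH;
the "-" output argument asks xpdf's pdftotext to write to stdout. The helper
names shell_quote, pdftotext_command and run_and_capture are made up for this
illustration and do not exist in ht://Dig.

```cpp
#include <cstdio>      // popen, pclose, fread (popen/pclose are POSIX)
#include <stdexcept>
#include <string>

// Wrap a string in single quotes for /bin/sh so that spaces and shell
// metacharacters in file names are passed through literally.
// An embedded ' is written as '\'' (close quote, escaped quote, reopen).
std::string shell_quote(const std::string &s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'')
            out += "'\\''";
        else
            out += c;
    }
    out += "'";
    return out;
}

// Build the command line: "pdftotext <pdf> -" sends the text to stdout.
std::string pdftotext_command(const std::string &pdf_path) {
    return "pdftotext " + shell_quote(pdf_path) + " -";
}

// Run a shell command and return everything it wrote to stdout.
// Throws if the command cannot be started or reports a non-zero status.
std::string run_and_capture(const std::string &command) {
    FILE *pipe = popen(command.c_str(), "r");
    if (!pipe)
        throw std::runtime_error("popen failed for: " + command);

    std::string text;
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, pipe)) > 0)
        text.append(buf, n);

    int status = pclose(pipe);
    if (status != 0)
        throw std::runtime_error("command failed (wait status " +
                                 std::to_string(status) + "): " + command);
    return text;
}
```

A parser built this way would call run_and_capture(pdftotext_command(path))
and hand the resulting string to the word indexer. Any failure in pdftotext
shows up as an exception, so the caller can skip that document.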

Does anyone know what the best PDF-to-text parser out there is? How about the
best PS-to-text parser?

Pat :)



