[htdig] What's the best parser?


Subject: [htdig] What's the best parser?
From: Martin Mielke (martinm@people-com.com)
Date: Tue Oct 17 2000 - 10:13:02 PDT


Hello all,

I am currently using conv_doc.pl as a general parser for PDF,
PostScript and M$ Word documents.
From time to time I get error messages like:

--8<--8<--8<--

Error (0): PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table
Error (0): PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table
Error (139803): Bad colorspace

--8<--8<--8<--

This happens even though 'max_doc_size' is set high enough for all PDFs to be
parsed correctly, and the files themselves are intact (users can open and read
them without problems).
Therefore I wonder whether this is a parser-dependent issue rather than a
configuration one. Has anyone had better results with other parsers? I'd
like to hear about your experiences before downloading, installing and
reconfiguring things here.
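
One way to separate a configuration problem from a parser problem is to check
whether any PDF would lose its trailer when cut off at the size limit. A PDF
keeps its "startxref" pointer and "%%EOF" marker at the very end of the file,
so a truncated copy produces exactly the "Couldn't find trailer dictionary"
kind of complaint. The following is only a minimal sketch, assuming the indexer
cuts documents off at 'max_doc_size' bytes before handing them to the parser;
the function names and the 2048-byte tail window are illustrative choices, not
part of ht://Dig or conv_doc.pl.

```python
import os


def pdf_looks_truncated(data: bytes) -> bool:
    """Return True if the end-of-file markers are missing from the tail."""
    # Hypothetical window size: the markers normally sit in the last ~1 KB.
    tail = data[-2048:]
    return b"startxref" not in tail or b"%%EOF" not in tail


def check_pdf(path: str, max_doc_size: int) -> dict:
    """Report whether a PDF is intact on disk and after a size-limit cut."""
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        data = fh.read()
    return {
        "path": path,
        "size": size,
        # True if the file is larger than the configured limit.
        "exceeds_limit": size > max_doc_size,
        # True if the first max_doc_size bytes no longer contain the trailer.
        "truncated_at_limit": pdf_looks_truncated(data[:max_doc_size]),
        # True if the file is already damaged before any cut is applied.
        "truncated_on_disk": pdf_looks_truncated(data),
    }
```

Running check_pdf() over the failing files, with the 'max_doc_size' value taken
from the configuration, would show whether they are larger than the limit. If
none exceeds it and none is truncated on disk, the parser itself becomes the
more likely suspect.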

Thanks and best regards,

Martin




This archive was generated by hypermail 2b28 : Tue Oct 17 2000 - 10:17:25 PDT