Re: xpdf 0.90 announcement (was Re: [htdig] slow)

Gilles Detillieux
Thu, 12 Aug 1999 14:22:10 -0500 (CDT)

According to Frank Guangxin Liu:
> Here is how I tested it:
> pdftotext.old -rawdump test.pdf
> grep F_Table test.txt
> can't find any match. (F_Table is a word in the landscape table
> on Page 54 of 72).
> -raw test.pdf
> grep F_Table test.txt
> found the match!!
> I understand the "test.txt" generated from the new pdftotext
> still looks funny (unformatted) for those landscape tables
> (Page 48 and beyond), but at least it has all the words in
> there which is all htdig cares.

But not all the words are intact. Here's an example of pdftotext output
from the PDF you gave me:

mpliance wit
h QS
P 1-
02, Pro
tection of Pro
prietary Interests,
 is re
quired. Info
rmation contained with
in this d
ocument or generated as a result thereof is no
t to be disclosed to third partie

Most of the words are intact, but a lot of them wrap onto another line,
so htdig treats the two parts as separate words. Yes, it's a lot better
than what you'd get with pdftotext 0.80, with my rawdump patch, but is it
as good as what you'd get from htdig's parsing of acroread's PostScript

Gilles R. Detillieux              E-mail: <>
Spinal Cord Research Centre       WWW:
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


This archive was generated by hypermail 2.0b3 on Thu Aug 12 1999 - 12:23:00 PDT