Re: xpdf 0.90 announcement (was Re: [htdig] parse_doc.pl slow)


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Thu, 12 Aug 1999 12:00:19 -0500 (CDT)


According to Frank Guangxin Liu:
> I just installed and tested the new xpdf 0.90.
> The new pdftotext has an option "-raw", which I guess should be the
> same as the old patched -rawdump.
> It also has the deltax fixes included.

Yes, I just installed and tested it here myself. I built it without
t1lib, because I haven't yet figured out how to compile and install t1lib.
I just found a t1lib source RPM, so I'll give that a try next. The -raw
option is an improvement over my -rawdump option, in that the text is
formatted better. (That doesn't really matter for indexing, though.)

> "xpdf" seems to be able to display landscape tables without
> a problem on XFree86 server, but not on my MetroLink X server.
> "pdftotext" still generates a huge text file (huge lines for
> the landscape tables), but "pdftotext -raw" can generate a
> reasonably sized file (as we found before). The good news
> is the text file DOES have those keywords from the landscape
> tables!!

Hmmm. I tried the new pdftotext on the test.pdf you had given me back
when you ran into the problem with landscape tables, and it's still not
putting out very meaningful text. It's better than before, but it's
still breaking up a whole lot of the words. I highly doubt the absence
of t1lib would make a difference to pdftotext, but I could be wrong.
I'll let you know if I spot a difference. However, you should try your
new pdftotext on the test.pdf file you gave me, and look at what it
produces for page 48 onward. You may still find that for your files,
acroread works better.

> I would highly recommend that people upgrade to xpdf-0.90.
> It also supports PDF 1.3, as stated in the announcement.

Ditto. With the improvements, plus avoiding the need for patches,
it's the way to go. Even if it doesn't completely solve your problem,
for most other situations it does an excellent job. I'll update the
FAQ and parse_doc.pl comments to include the new version number and
-raw option.

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930




This archive was generated by hypermail 2.0b3 on Thu Aug 12 1999 - 10:01:23 PDT