[htdig3-dev] Re: PDF version 1.3 -- xpdf supports version 1.2


Joe R. Jah (jjah@cloud.ccsf.cc.ca.us)
Fri, 16 Apr 1999 11:12:49 -0700 (PDT)


On Fri, 16 Apr 1999, Gilles Detillieux wrote:

> Date: Fri, 16 Apr 1999 10:26:36 -0500 (CDT)
> From: Gilles Detillieux <grdetil@scrc.umanitoba.ca>
> To: "Derek B. Noonburg" <derekn@foolabs.com>
> Cc: jjah@cloud.ccsf.cc.ca.us, grdetil@scrc.umanitoba.ca,
> htdig3-dev@htdig.org
> Subject: Re: PDF version 1.3 -- xpdf supports version 1.2
>
> D'oh! The "file is damaged" error should have tweaked my memory. It's
> come up before, but I got thrown off track by the version number issue.
>
> The max_doc_size attribute tells htdig what it should use as an upper
> limit on documents it fetches. Anything above that gets truncated!
> This works OK for HTML documents, but it makes PDFs unusable.
> The default max_doc_size is 100000 bytes. When indexing PDFs, this
> should be increased by a lot, so that it's big enough to handle the
> largest PDF you will index. If you can't afford to make it large enough,
> because of memory constraints, you need to explicitly exclude larger
> PDFs from indexing, e.g. by listing them with Disallow records in your
> robots.txt file.
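
A minimal sketch of the advice quoted above, in Python: it scans a document tree for PDFs, suggests a lower bound for max_doc_size, and prints Disallow records for any PDF that exceeds a memory budget. The function names, the budget parameter, and the "htdig" user-agent string are assumptions for illustration, as is the idea that the directory scanned maps directly onto the site's URL root. Adjust all of these to the local setup.

```python
import os


def pdf_sizes(doc_root):
    """Yield (url_path, size_in_bytes) for every .pdf file under doc_root.

    Assumption: doc_root corresponds to the web server's URL root.
    """
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            if name.lower().endswith(".pdf"):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, doc_root)
                yield "/" + rel.replace(os.sep, "/"), os.path.getsize(full)


def plan(doc_root, memory_budget):
    """Return (suggested_limit, paths_to_exclude).

    suggested_limit is the size of the largest PDF that fits within
    memory_budget, i.e. a lower bound for max_doc_size (add headroom).
    paths_to_exclude lists the PDFs larger than memory_budget.
    """
    sizes = sorted(pdf_sizes(doc_root))
    fits = [size for _path, size in sizes if size <= memory_budget]
    too_big = [path for path, size in sizes if size > memory_budget]
    return max(fits, default=0), too_big


def robots_records(paths, agent="htdig"):
    """Format robots.txt records excluding the given URL paths.

    Assumption: the indexer identifies itself as `agent` in robots.txt.
    """
    lines = ["User-agent: " + agent]
    lines += ["Disallow: " + path for path in paths]
    return "\n".join(lines)
```

Calling plan("/path/to/docroot", 1600000) would return the suggested limit and the list of oversized PDFs; passing that list to robots_records() gives text that can be pasted into robots.txt. The suggested limit only covers PDFs, so it should not be set lower than whatever the largest HTML document needs.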

Thanks a bunch, Gilles and Derek. I increased max_doc_size from 600 K
to 1.6 M, and the rest of the error messages disappeared; only one line per
file is still reported, which I believe is innocuous:
______________________________________________________________________________
Error (1024): PDF version 1.3 -- xpdf supports version 1.2 (continuing anyway)
______________________________________________________________________________

Best regards, and looking forward to xpdf with PDF 1.3 support ;)

Joe

     _/ _/_/_/ _/ ____________ __o
     _/ _/ _/ _/ ______________ _-\<,_
 _/ _/ _/_/_/ _/ _/ ......(_)/ (_)
  _/_/ oe _/ _/. _/_/ ah jjah@cloud.ccsf.cc.ca.us



