Re: [htdig] retrieved pdf file only 100000 chars?


Torsten Neuer (tneuer@inwise.de)
Sat, 17 Jul 1999 17:39:59 +0200


According to Darrell Berry:
>i'm running 3.1.1. on RH5.2...
>
>i have a strange problem with one PDF file, which won't index...ps2ascii
>generates a bunch of file position errors...the file
>is about 1.7 MB big...a bit of debugging code shows that the temp file
>being pulled back by htdig is *exactly* 100000 bytes, which seems a bit
>suspicious...and explains the file position error, presumably...as far as
>i can see, the temp file IS the first 100k of the actual file, but i
>don't understand why it hasn't pulled the whole thing back...
>
>100k is curiously the limit i've set in max_head_length, but i can't see
>how the one should affect the other, and besides, changing
>max_head_length doesn't change the size of the retrieved file...
>
>the server it's connecting to is running apache 1.3.4, and a manual GET
>of the file via HTTP returns a correctly sized file...
>
>what's going on here? feature or bug? if the former, is there any way i
>can change the temp file size, and/or make gs happy?

This is just a configuration issue. By default ht://Dig truncates every
retrieved document at max_doc_size, which is 100000 bytes, so the match
with your max_head_length setting is a coincidence. Check
     http://www.htdig.org/attrs.html#max_doc_size
and raise the value in your ht://Dig configuration file to suit your needs.
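
As a rough sketch, the change could look like the fragment below. The
value 2000000 is only an assumption, picked to leave some headroom above
the 1.7 MB PDF mentioned in the question; use whatever covers your
largest document.

```
# ht://Dig configuration file (often htdig.conf)
# Assumed value: large enough for a ~1.7 MB PDF; the default is 100000.
max_doc_size: 2000000
```

The document has to be retrieved again with a fresh htdig run before
the new limit takes effect.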

hth,
  Torsten

--
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606
E-Mail: info@inwise.de            Internet: http://www.inwise.de



