Re: [htdig] parsing PDF with NT


Subject: Re: [htdig] parsing PDF with NT
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Wed Mar 01 2000 - 09:00:44 PST


According to Stéphane Baudet:
> Well thanks for your reply. I upgraded to 3.1.5, but I still have problems
> parsing PDF files. I found that the temporary files retrieved by HtDig are a
> little bigger than the original PDF files. I managed to keep it and tried to
> open it with Acrobat reader. And actually, pages remain blank, so the file
> should be corrupted.
> For example, I have a PDF which size is 90076 bytes and HtDig retrieves a
> temporary file in /tmp which size is 90386 bytes !!
> Any idea ?

Well, I'm going out on a limb here, because I'm really not familiar with
the Cygwin package, but if it makes a distinction between writing to
binary files vs. text files, adding CRs before LFs on text files, then this
could be the problem here. htdig/ExternalParser.cc creates its temporary
file using:

    FILE *fl = fopen(path, "w");

If this causes the Cygwin library to do CR/LF expansion, you'd need to
change this to avoid that problem, e.g. by using "wb" as the second
argument, if that's what it takes, or somehow setting O_BINARY mode on
the file. Have a look at the Cygwin docs, and please let us know if you
find a fix - we'll try to incorporate a portable form of it in future
releases.
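
For what it's worth, here is a minimal sketch of the "wb" idea, not the actual ExternalParser.cc code. The helper names write_bytes and file_size are hypothetical, made up for illustration. It assumes the growth you saw (90386 - 90076 = 310 bytes) comes from a CR being added before each LF, which I haven't verified on Cygwin.

```c
#include <stdio.h>

/* Hypothetical helper: write len bytes to path using the given fopen mode.
 * With "wb" the C library must write the bytes unchanged; with "w" a
 * text-mode runtime may expand each LF to CR/LF.
 * Returns 0 on success, -1 on failure. */
static int write_bytes(const char *path, const char *mode,
                       const void *data, size_t len)
{
    FILE *fl = fopen(path, mode);
    if (fl == NULL)
        return -1;

    size_t written = fwrite(data, 1, len, fl);

    /* fclose flushes buffered data, so its result matters too. */
    if (fclose(fl) != 0 || written != len)
        return -1;
    return 0;
}

/* Hypothetical helper: count the bytes in a file, reading it in binary
 * mode so that no CR/LF translation hides the real size.
 * Returns the size in bytes, or -1 if the file cannot be opened. */
static long file_size(const char *path)
{
    FILE *fl = fopen(path, "rb");
    if (fl == NULL)
        return -1;

    long n = 0;
    while (fgetc(fl) != EOF)
        n++;

    fclose(fl);
    return n;
}
```

Writing a buffer with write_bytes(path, "wb", buf, len) and then checking that file_size(path) equals len is a quick way to confirm nothing was expanded. The "b" flag is part of standard C and has no effect on POSIX systems, so "wb" should be safe to use everywhere.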

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



