Subject: Re: [htdig] parsing PDF with NT
From: Gilles Detillieux (email@example.com)
Date: Wed Mar 01 2000 - 09:00:44 PST
According to Stéphane Baudet:
> Well thanks for your reply. I upgraded to 3.1.5, but I still have problems
> parsing PDF files. I found that the temporary files retrieved by HtDig are a
> little bigger than the original PDF files. I managed to keep it and tried to
> open it with Acrobat reader. And actually, pages remain blank, so the file
> should be corrupted.
> For example, I have a PDF which size is 90076 bytes and HtDig retrieves a
> temporary file in /tmp which size is 90386 bytes !!
> Any idea ?
Well, I'm going out on a limb here, because I'm really not familiar with
the Cygwin package, but if it makes a distinction between writing to
binary files vs. text files, adding CRs before LFs on text files, then this
could be the problem here. htdig/ExternalParser.cc creates its temporary
file with:

    FILE *fl = fopen(path, "w");

If this causes the Cygwin library to do CR/LF expansion, you'd need to
change this to avoid that problem, e.g. by using "wb" as the second
argument, if that's what it takes, or somehow setting O_BINARY mode on
the file. Have a look at the Cygwin docs, and please let us know if you
find a fix - we'll try to incorporate a portable form of it in future
releases.
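
For what it's worth, the sizes reported above differ by 310 bytes (90386 - 90076), which is what you'd expect if 310 LF bytes in the PDF each picked up a CR on the way out. Here is a minimal sketch of the "wb" idea, not the actual ExternalParser.cc code: write_bytes() and file_size() are hypothetical helpers made up for illustration. On Unix "w" and "wb" behave identically, so the difference would only show up on a platform whose C library translates text-mode output.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of ht://Dig): write len bytes to path
 * using the given fopen() mode. Returns 0 on success, -1 on failure.
 * With mode "wb" the C library must not translate LF to CR/LF. */
static int write_bytes(const char *path, const char *mode,
                       const void *data, size_t len)
{
    assert(path != NULL && mode != NULL && data != NULL);

    FILE *fl = fopen(path, mode);
    if (fl == NULL)
        return -1;

    size_t written = fwrite(data, 1, len, fl);

    /* fclose() is always called, even after a short write. */
    if (fclose(fl) != 0 || written != len)
        return -1;
    return 0;
}

/* Hypothetical helper: count the bytes in a file, reading it in binary
 * mode so that no translation hides extra CRs. Returns -1 on failure. */
static long file_size(const char *path)
{
    FILE *fl = fopen(path, "rb");
    if (fl == NULL)
        return -1;

    long n = 0;
    while (fgetc(fl) != EOF)
        n++;

    fclose(fl);
    return n;
}
```

Writing the same buffer once with "w" and once with "wb" and comparing file_size() for the two files should show whether the library is expanding line endings: if the "w" copy comes out larger, text-mode translation is the likely culprit.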
-- 
Gilles R. Detillieux              E-mail: <firstname.lastname@example.org>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930