Subject: Re: [htdig] Problem with PDF files....
From: Elijah Kagan (elijah@netvision.net.il)
Date: Tue Jan 16 2001 - 01:16:10 PST


Gilles,

I greatly appreciate your help! Thanks!

There are two directives in the Apache config file that make it add a
charset parameter by default: AddDefaultCharset and
AddDefaultCharsetName. Setting the first one to Off stops Apache from
appending "; charset=..." after the content type in the Content-Type
header.

After I disabled AddDefaultCharset, htdig worked as expected.
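
To illustrate why the extra charset parameter matters, here is a minimal
Python sketch. It is not htdig's actual code (htdig is written in C++);
the table and the function names are hypothetical, and only the header
value and the converter path are taken from this thread. It assumes the
converter is looked up by the literal Content-Type string, so a header
with "; charset=..." appended no longer matches the "application/pdf"
entry unless the parameters are stripped first.

```python
# Hypothetical converter table keyed by bare media type (illustration only).
CONVERTERS = {"application/pdf": "/usr/share/htdig/conv_doc.pl"}


def base_media_type(content_type):
    """Drop any parameters (e.g. '; charset=...') and normalise case."""
    return content_type.split(";", 1)[0].strip().lower()


def find_converter(content_type, table=CONVERTERS):
    """Look up a converter by media type, ignoring header parameters."""
    return table.get(base_media_type(content_type))


header = "application/pdf; charset=iso-8859-1"

# A literal lookup on the full header value finds nothing...
print(CONVERTERS.get(header))
# ...while the normalised lookup finds the PDF converter.
print(find_converter(header))
```

Turning AddDefaultCharset off fixes this on the server side; stripping
the parameters as above would be the equivalent fix on the client side.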

Thanks again,

Elijah

On Mon, 15 Jan 2001, Gilles Detillieux wrote:

> According to Elijah Kagan:
> > I run htdig 3.1.5.
> > I tried both the Debian package and a compiled one with the same result.
> > I am absolutely sure there is something stupid I forgot to put into the
> > configuration.
> >
> > Attached is the config file.
> >
> > Thanks for your help.
> >
> > Elijah
> >
> >
> > On Fri, 12 Jan 2001, Gilles Detillieux wrote:
> >
> > > According to Elijah Kagan:
> > > > 1. I run htdig with an explicit -c option, so it uses the correct conf
> > > > file.
> > > > 2. I rewrote the external_parsers so it includes only one line...
> > > > 3. ..and it is the first line in the file
> > > >
> > > > Results are the same! It is still looking for an acroread!
> > > >
> > > > Please, help. I am getting desperate...
> > >
> > > Hmm. You're sure you're running version 3.1.5 of htdig, and you
> > > don't have a pre-3.1.4 binary of htdig kicking around that you might be
> > > unknowingly running instead? External converter support was added to the
> > > external_parsers attribute only in version 3.1.4 and above. If you're
> > > sure this isn't the problem either, please send me a copy of your conf
> > > file as it stands now (preferably uuencoded right on your htdig box to
> > > prevent e-mail mangling of it), and I'll have a look and try a test or two.
> > >
> > > Oh, another thing. You mentioned this was on a Debian system. Did you
> > > compile htdig yourself, or did you use a pre-compiled binary? If the
> > > latter, which one?
>
> OK, it took a while, but the light finally came on! If you look up the
> following thread on the mailing list archives:
>
> http://www.htdig.org/mail/2000/09/index.html#75
>
> you'll see that the bug has come up before. I think there's something
> about the Debian configuration for Apache that causes it to add the
> "; charset=..." string to the Content-Type header, which is the source
> of the problem here. At least I strongly suspect it must be the same
> problem, as I can't see anything else that would explain the behaviour
> you're reporting. If you run htdig -vvv -i -c ..., you can then look
> at the header lines returned by your server for the PDF files, and see
> if the Content-Type header does indeed have something on the line after
> the application/pdf string.
>
> Geoff and I made some hacks to ExternalParser.cc in the 3.2.0b3
> development code to address this, but none of this has been backported
> to 3.1.5 yet. I'll see if I can backport some or all of the external
> parser patches to 3.1.5 in the next day or two. In the meantime,
> you can try working around this either by using local_urls, if you're
> running htdig on the same machine as your Apache server, or by using
> the same hack that Klaus used, i.e. add a line like the following to
> your external_parsers definition.
>
> "application/pdf; charset=iso-8859-1->text/html" /usr/share/htdig/conv_doc.pl
>
> --
> Gilles R. Detillieux E-mail: <grdetil@scrc.umanitoba.ca>
> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
> Dept. Physiology, U. of Manitoba Phone: (204)789-3766
> Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
>
