Re: [htdig] PDF indexing problem


Subject: Re: [htdig] PDF indexing problem
From: Justin Hopkins (hop@omnihotels.com)
Date: Tue Aug 08 2000 - 05:49:13 PDT


----------
> From: David Robley <huntsman@www.nisu.flinders.edu.au>
> To: hop@omnihotels.com
> Cc: htdig@htdig.org
> Subject: Re: [htdig] PDF indexing problem
> Date: Monday, August 07, 2000 8:21 PM
>
> On 7 Aug, Justin Hopkins wrote:
> > Hello,
> >
> > I'm trying to index a dozen or so PDF files on my intranet,
> > and both parse_doc.pl (with xpdf) and Acrobat (versions 3
> > and 4) choke on them.
> >
> > When acroread chokes, I get several errors like this one:
> >
> > PDF::parse: cannot open acroread output from
> > http://omniweb/resmis/docs/PMSs/libica/userguide/LTCONFIG.pdf
> >
> > When parse_doc.pl chokes, I get several of these:
> > sh: /usr/local/bin/parse_doc.pl: No such file or directory
> >
> > The URLs are valid and the files do exist. The PDFs open
> > fine both in IE and in acroread 3 on its own. I've
> > checked and rechecked the variables inside parse_doc.pl
> > to make sure they point to the correct translators.
> > All the files have appropriate execute and read permissions.
> >
> > When I tell htdig to use Acrobat as the parser, the
> > relevant htdig.conf line looks like this:
> >
> > pdf_parser: /usr/local/Acrobat3/bin/acroread -toPostScript -pairs
> >
> > When I tell htdig to use parse_doc.pl as the parser, the
> > relevant htdig.conf line looks like this:
> >
> > external_parsers: "application/pdf" "/usr/local/bin/parse_doc.pl"
> >
> > (Naturally, I comment out one or the other depending on
> > which parser I'm testing.)
> >
> > Any thoughts on where I should look, or what the problem could be?
> > Thanks,
> > Justin Hopkins
>
> Check that the size of your PDF files is less than the value you have
> set for max_doc_size in your configuration file.
 
It worked! Thanks!
Justin Hopkins
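A minimal Python sketch of the max_doc_size check suggested above, assuming you have a local copy of htdig.conf and of the directory holding the PDFs. It reads max_doc_size from the configuration, falling back to 100000 bytes when the attribute is unset (assumed here to be the ht://Dig default; check the documentation for your version), and lists every PDF larger than that limit. The script name check_pdf_sizes.py and its command-line arguments are hypothetical, and the config parsing is deliberately simplified to single-line "max_doc_size: N" entries.

```python
import os
import re
import sys

# Assumption: ht://Dig's documented default for max_doc_size is 100000 bytes;
# verify against the documentation for your version.
DEFAULT_MAX_DOC_SIZE = 100000


def read_max_doc_size(conf_text, default=DEFAULT_MAX_DOC_SIZE):
    """Return max_doc_size from htdig.conf text, or `default` if it is unset.

    Simplified parsing: only single-line "max_doc_size: N" entries are
    recognized, '#' comment lines are skipped, and if the attribute appears
    more than once the last occurrence wins.
    """
    value = default
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue
        match = re.match(r"max_doc_size\s*:\s*(\d+)$", stripped)
        if match:
            value = int(match.group(1))
    return value


def oversized_pdfs(root, limit):
    """Yield (path, size) for each .pdf file under `root` larger than `limit` bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(".pdf"):
                path = os.path.join(dirpath, name)
                size = os.path.getsize(path)
                if size > limit:
                    yield path, size


def main(argv):
    # Hypothetical command line: check_pdf_sizes.py HTDIG_CONF PDF_DIR
    if len(argv) != 2:
        print("usage: check_pdf_sizes.py HTDIG_CONF PDF_DIR")
        return
    conf_path, pdf_root = argv
    with open(conf_path) as conf_file:
        limit = read_max_doc_size(conf_file.read())
    for path, size in oversized_pdfs(pdf_root, limit):
        print(f"{path}: {size} bytes exceeds max_doc_size ({limit})")


if __name__ == "__main__":
    main(sys.argv[1:])
```

htdig reads at most max_doc_size bytes of each document, so any PDF this lists reaches the parser incomplete. Raising max_doc_size in htdig.conf above the largest size reported is the fix this thread points to.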




This archive was generated by hypermail 2b28 : Mon Aug 07 2000 - 19:49:11 PDT