Re: [htdig] PDF indexing problem


Subject: Re: [htdig] PDF indexing problem
From: David Robley (huntsman@www.nisu.flinders.edu.au)
Date: Mon Aug 07 2000 - 18:21:10 PDT


On 7 Aug, Justin Hopkins wrote:
> Hello,
>
> I'm trying to index a dozen or so pdf files on my intranet,
> and both parse_doc.pl w/xpdf and acrobat (3 and 4) choke
> on the .pdf files.
>
> When acroread chokes, it gives me several of these
> sorts of errors:
>
> PDF::parse: cannot open acroread output from
> http://omniweb/resmis/docs/PMSs/lib
> ica/userguide/LTCONFIG.pdf
>
> When parse_doc.pl chokes, it gives several:
> sh: /usr/local/bin/parse_doc.pl: No such file or directory
>
> The URLs are valid and the files do exist. The PDFs open
> fine in both IE and separately under acroread 3. I've
> checked and rechecked the variables inside parse_doc.pl
> to make sure they point to the correct translators.
> All the files have appropriate execute and read permissions.
>
> When I tell htdig to use acrobat as the parser, this
> is what the relevant htdig.conf line looks like:
>
> pdf_parser: /usr/local/Acrobat3/bin/acroread -toPostScript -pairs
>
> When I tell htdig to use parse_doc.pl as the parser, this
> is what the relevant htdig.conf line looks like:
>
> external_parsers: "application/pdf" "/usr/local/bin/parse_doc.pl"
>
> (Naturally I comment out one or the other depending on
> what is running)
>
> Any thoughts as to where I should look/what could be the problem?
> Thanks,
> Justin Hopkins

Check that each of your PDF files is smaller than the value you have
set for max_doc_size in your configuration file. htdig truncates any
document larger than that limit, and a truncated PDF cannot be parsed.
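
As a rough sketch, the setting might look like this in htdig.conf. The
figure of 5000000 bytes is only an illustrative assumption, so pick
something larger than your biggest PDF. If I remember correctly the
default is 100000 bytes, which many PDFs exceed.

```
# Illustrative value only (an assumption, not taken from your setup):
# the limit is in bytes and should exceed the largest PDF to be indexed.
max_doc_size: 5000000
```

After changing it, rerun htdig so the documents are fetched again in full.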

Cheers

-- 
David Robley                        | WEBMASTER & Mail List Admin
RESEARCH CENTRE FOR INJURY STUDIES  | http://www.nisu.flinders.edu.au/
AusEinet                            | http://auseinet.flinders.edu.au/
            Flinders University, ADELAIDE, SOUTH AUSTRALIA
