Subject: [htdig] PDF indexing problem
From: Justin Hopkins (hop@omnihotels.com)
Date: Mon Aug 07 2000 - 14:23:14 PDT


I'm trying to index a dozen or so PDF files on my intranet,
and both parse_doc.pl (with xpdf) and Acrobat Reader (versions
3 and 4) choke on the .pdf files.

When acroread chokes, I get several errors like this
one:

PDF::parse: cannot open acroread output from

When parse_doc.pl chokes, I get several of these:
sh: /usr/local/bin/parse_doc.pl: No such file or directory

The URLs are valid and the files do exist. The PDFs open
fine both in IE and on their own in acroread 3. I've
checked and rechecked the variables inside parse_doc.pl
to make sure they point to the correct translators.
All the files have appropriate execute and read permissions.
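For anyone who wants to repeat these checks mechanically, here is a minimal Python sketch. It assumes the parser is a script started through a "#!" line, and it reports a missing file, missing read or execute permission, a "#!" interpreter that doesn't exist, or a "#!" line ending in a DOS-style CRLF. Either of the last two can make the shell say "No such file or directory" even when the script itself is present. The helper name check_parser and the default path are illustrative only; nothing here is part of htdig.

```python
import os
import sys


def check_parser(path):
    """Return a list of problems found with an external parser script (hypothetical helper)."""
    if not os.path.isfile(path):
        return [f"{path}: file not found"]
    problems = []
    if not os.access(path, os.R_OK):
        problems.append(f"{path}: not readable")
    if not os.access(path, os.X_OK):
        problems.append(f"{path}: not executable")
    with open(path, "rb") as f:
        first_line = f.readline()
    if first_line.startswith(b"#!"):
        # A CRLF ending leaves "\r" glued to the interpreter name the kernel sees.
        if first_line.endswith(b"\r\n"):
            problems.append(f"{path}: '#!' line ends with CRLF")
        parts = first_line[2:].strip().split()
        if not parts:
            problems.append(f"{path}: empty '#!' line")
        else:
            interpreter = parts[0].decode(errors="replace")
            if not os.path.isfile(interpreter):
                problems.append(f"{path}: interpreter {interpreter} not found")
    return problems


if __name__ == "__main__":
    # Check the paths given on the command line, or an assumed default location.
    for p in sys.argv[1:] or ["/usr/local/bin/parse_doc.pl"]:
        issues = check_parser(p)
        print("\n".join(issues) if issues else f"{p}: OK")
```

Run it as, for example, `python check_parser.py /usr/local/bin/parse_doc.pl`; it only inspects the script, so it says nothing about the translators that parse_doc.pl calls.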

When I tell htdig to use Acrobat as the parser, this
is what the relevant htdig.conf line looks like:

pdf_parser: /usr/local/Acrobat3/bin/acroread -toPostScript -pairs

When I tell htdig to use parse_doc.pl as the parser, this
is what the relevant htdig.conf line looks like:

external_parsers: "application/pdf" "/usr/local/bin/parse_doc.pl"

(Naturally, I comment out one or the other, depending on
which parser I'm testing.)

Any thoughts on where I should look, or what the problem could be?
Justin Hopkins

