Subject: Re: [htdig] External Parser/Converter Ignored?
From: Gilles Detillieux (email@example.com)
Date: Thu Dec 16 1999 - 10:30:05 PST
According to Jochen.Munz@sued-data.de:
> after reading FAQ etc., I still can't get PDF-indexing to work.
> I use the "parse_doc.pl" parser, located in /opt/htdig/bin. The perl script is
> correctly configured and "pdftotext/pdfinfo" are in place.
> My config file looks like this:
> external_parsers: application/pdf /opt/htdig/bin/parse_doc.pl
> max_doc_size: 2000000 #just to be sure
> When I run "rundig -vvv" I get the following:
> So the PDF is served, and read in completely. But the external parser is not
> triggered. I even added a simple "touch /var/tmp/dummyfile" to the beginning of
> the perl-script. Started from the shell, the file is touched - but not when
> htdig runs.
> This leaves me with a not-indexed PDF:
> (htmerge) Deleted, no excerpt: 2/http://myserver/pdf/online.pdf
> If I remove the "external_parsers" line the internal PDF-parser is triggered, so
> the content-type "application/pdf" seems to be recognized.
> Any help would be greatly appreciated.
Just a hunch, but what is your TMPDIR environment variable set to when
you run htdig? If you don't have write access to that directory, htdig
won't be able to create the temporary file it uses to pass the document
to the parser, and, believe it or not, if that happens it silently skips
the document without parsing it.
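A quick way to test this hunch is to probe the directory as the same user that runs rundig. The sketch below is only illustrative: the function name check_tmpdir and the probe file name are made up for this example, and the fallback to /tmp when TMPDIR is unset is an assumption about htdig's behaviour rather than something confirmed here.

```shell
#!/bin/sh
# Report whether a directory can hold temporary files.
# With no argument, checks $TMPDIR, or /tmp if TMPDIR is unset
# (the /tmp fallback is an assumption about what htdig does).
check_tmpdir() {
    dir="${1:-${TMPDIR:-/tmp}}"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # Hypothetical probe file name; $$ keeps it unique per shell process.
    probe="$dir/htdig-probe.$$"
    # Try to create the file in a subshell so a failed redirection
    # cannot abort the calling shell.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir"
        return 0
    fi
    echo "not writable: $dir"
    return 1
}

check_tmpdir "$@"
```

If this reports "missing" or "not writable", pointing TMPDIR at a directory you can write to before running rundig (for example, TMPDIR=/var/tmp rundig -vvv) should show whether the parser then gets called.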
--
Gilles R. Detillieux              E-mail: <firstname.lastname@example.org>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930