Re: [htdig] Still no luck with indexing PDF's


Subject: Re: [htdig] Still no luck with indexing PDF's
From: Anthony Peacock (a.peacock@chime.ucl.ac.uk)
Date: Mon Feb 07 2000 - 08:39:39 PST


> On Mon Feb 7 10:07:32 2000 Anthony Peacock wrote...
> >
> >
> >Can you try converting a single PDF file by running either Acroread or
> >pdf2text from the command line. And send the results to the list.
> >
> >We first of all need to work out if the failure is due to running through
> >htdig/parse_doc.pl.
>
> Sure. I can use pdf2text. I must admit, that I don't know how to use
> acroread to do the conversion. Could you tell me, so I can try that?

Sorry, I stopped using Acroread a long time ago.

OK! So you _are_ able to run pdf2text from the command prompt.

Next we need to try to work out which bit is now failing.

When you get the error message, are you running htdig/htmerge or rundig
with -vvv on the command line? This will give you lots more information and
may reveal some more clues.
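
The -vvv output can be long, so it may help to save it to a file and pull
out only the lines that mention PDFs or the parser script. Here is a rough
sketch of that. The function name find_pdf_lines and the log file name
rundig.log are my own inventions, not part of ht://Dig, and I am not
assuming anything about the log format beyond it being plain text.

```shell
#!/bin/sh
# Hypothetical helper (not part of ht://Dig): print, with line numbers,
# the lines of a saved verbose log that mention PDFs or parse_doc.
find_pdf_lines() {
    log=$1
    # -i: ignore case, -n: prefix each match with its line number
    grep -i -n -e 'pdf' -e 'parse_doc' "$log"
}

# Typical use, assuming rundig is on your PATH:
#   rundig -vvv > rundig.log 2>&1
#   find_pdf_lines rundig.log
```

If nothing at all matches, that in itself is a clue: the PDF may never be
reaching the external parser.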

My config file is configured like this:

external_parsers: application/msword /usr/local/bin/parse_doc.pl \
                  application/postscript /usr/local/bin/parse_doc.pl \
                  application/pdf /usr/local/bin/parse_doc.pl

Next, can you run parse_doc.pl from the command line, i.e.:

/usr/local/bin/parse_doc.pl document.pdf
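
When you send the results to the list, it would be useful to know the exit
status and whether the script printed anything. Something along these lines
might do it. This is only a sketch: check_parser is a name I made up, and it
passes just the document name as in the command above.

```shell
#!/bin/sh
# Hypothetical helper: run a parser script on one document and report
# the exit status, the number of non-empty output lines, and the first
# 20 lines of output (stdout and stderr combined).
check_parser() {
    parser=$1
    doc=$2
    if [ ! -x "$parser" ]; then
        echo "parser not executable: $parser"
        return 1
    fi
    if [ ! -r "$doc" ]; then
        echo "document not readable: $doc"
        return 1
    fi
    out=$("$parser" "$doc" 2>&1)
    status=$?
    echo "exit status: $status"
    echo "output lines: $(printf '%s\n' "$out" | grep -c .)"
    printf '%s\n' "$out" | head -n 20
    return $status
}
```

You would call it as:

check_parser /usr/local/bin/parse_doc.pl document.pdf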

---
Fare Thee Well
Anthony Peacock       
CHIME, Royal Free & University College Medical School
WWW:    http://www.chime.ucl.ac.uk/~rmhiajp/
"The social dynamics of the net are a direct consequence of the fact that
nobody has yet developed a Remote Strangulation Protocol." --Larry Wall



