Subject: RE: [htdig] Parsing PDF files.
From: Wayne Fool (wfool@ProgressLighting.com)
Date: Fri Jun 16 2000 - 05:15:15 PDT
> At 1:31 PM -0400 6/15/00, Wayne Fool wrote:
>>I have tried to use parse_doc.pl, conv_doc.pl, and doc2html.pl, all of these
>>give me 14 consecutive ":=command not found" error messages
>>a "syntax error near unexpected token '( )' " error messages then
>>message stating "line 83: 'parts = ( );" This is an example of
>>messages I get with all of the above scripts when I run them
>>have checked the location of ps2ascii and pdftotext files in the
>>they are correct. The script just shuts down when run with rundig
>What version of Perl are you using? What shell do you use?
My Perl version is 5.00503-10. Right now I am using bash. I also failed to
mention that I am using RH 6.2.
> >It looks like it is reading the title, is there a way to index those
> >along with 5095 lines of text. I don't get a file returned from the
> >when I search on any of the words in the file.
>You said htmerge discards these files. What does it say? (Try
>-vv or more verbosity.)
This is one message that I get: htmerge: Discarding 98008fpdf in doc #795
This is another:
Deleted, no excerpt: 37/http://labweb1/pdf/2000001.pdf
(I get around 400 of these messages, with only the number before /http:...
changing. I didn't include them all because the output is very long.) I have
text files of the entire rundig -vvv run and my htdig.conf if they would help.
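Rather than pasting the whole log, the repeated messages could be tallied
first. The following is only a rough sketch in Python, not part of ht://Dig:
the function name summarize and both patterns are assumptions modeled on the
two sample messages quoted above, and other messages in a real rundig -vvv
log may be shaped differently.

```python
import re
from collections import Counter

# Patterns modeled on the two sample messages above (assumption: the other
# messages of each kind in the log follow the same shape).
DISCARD_RE = re.compile(r"htmerge: Discarding (\S+) in doc #(\d+)")
DELETED_RE = re.compile(r"Deleted, no excerpt: (\d+)/(\S+)")


def summarize(lines):
    """Count discard/delete messages and collect the URLs that were deleted."""
    counts = Counter()
    urls = []
    for line in lines:
        if DISCARD_RE.search(line):
            counts["discarded"] += 1
        match = DELETED_RE.search(line)
        if match:
            counts["deleted_no_excerpt"] += 1
            urls.append(match.group(2))  # the part after "<number>/"
    return counts, urls


if __name__ == "__main__":
    # The two lines quoted in this message; a saved log could be passed
    # instead by opening the file and handing the file object to summarize().
    sample = [
        "htmerge: Discarding 98008fpdf in doc #795",
        "Deleted, no excerpt: 37/http://labweb1/pdf/2000001.pdf",
    ]
    counts, urls = summarize(sample)
    print(counts["discarded"], counts["deleted_no_excerpt"])
    for url in urls:
        print(url)
```

Run against the saved log, this would give the number of documents dropped
and the list of PDF URLs involved, which is shorter to post than the log.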
Thanks again for your help.
This archive was generated by hypermail 2b28 : Fri Jun 16 2000 - 03:07:09 PDT