RE: [htdig] Parsing PDF files.


Subject: RE: [htdig] Parsing PDF files.
From: Wayne Fool (wfool@ProgressLighting.com)
Date: Fri Jun 16 2000 - 05:15:15 PDT


> At 1:31 PM -0400 6/15/00, Wayne Fool wrote:
>>I have tried to use parse_doc.pl, conv_doc.pl, and doc2html.pl, all of these
>>give me 14 consecutive ":=command not found" error messages
>>a "syntax error near unexpected token '( )' " error messages then finally a
>>message stating "line 83: 'parts = ( );" This is an example of the error
>>messages I get with all of the above scripts when I run them manually. I
>>have checked the location of ps2ascii and pdftotext files in the script and
>>they are correct. The script just shuts down when run with rundig -vvv

>What version of Perl are you using? What shell do you use?

My Perl version is 5.00503-10, and right now I am using bash. I also failed to
mention that I am running Red Hat 6.2.

> >It looks like it is reading the title, is there a way to index those words
> >along with 5095 lines of text. I don't get a file returned from the search
> >when I search on any of the words in the file.
>
>You said htmerge discards these files. What does it say? (Try htmerge
>-vv or more verbosity.)

This is one message that I get:
htmerge: Discarding 98008fpdf in doc #795

This is another:
Deleted, no excerpt: 37/http://labweb1/pdf/2000001.pdf

I get around 400 of these messages, with only the number before /http:...
changing. I didn't include them all because the output is very long. I have
text files of the entire rundig -vvv run and my htdig.conf if they would help.
Thanks again for your help.

Wayne
