[htdig] No Excerpt Error


Subject: [htdig] No Excerpt Error
From: Paul Watters (paul@uac.edu.au)
Date: Thu Jul 13 2000 - 21:15:27 PDT


Hi All,

I'm trying to index a set of PDF files using htdig. I've successfully
indexed other PDF files using the same installation, but we now have a new
person producing our PDFs, and their files don't seem to be indexed. We are
using acroread for parsing.

If I execute:
rundig -vvv

I see a message like the following for each of the PDF files:

Header line: HTTP/1.1 200 OK
Header line: Date: Fri, 14 Jul 2000 04:00:36 GMT
Header line: Server: Apache/1.3.12 (Unix)
Header line: Last-Modified: Fri, 02 Jun 2000 04:38:16 GMT
Translated Fri, 02 Jun 2000 04:38:16 GMT to 02 Jun 2000 04:38:16 (100)
And converted to Fri, 02 Jun 2000 04:38:16
Header line: ETag: "129d1-4f24-39373a38"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 20260
Header line: Connection: close
Header line: Content-Type: application/pdf
Header line:
returnStatus = 0
Read 8192 from document
Read 8192 from document
Read 3876 from document
Read a total of 20260 bytes
PDF::setContents(20260 bytes)
PDF::parse(http://tango.uac.edu.au/htdig/course/mq/300114.pdf)

But, later on, I see the following:

Deleted, no excerpt: 109/http://tango.uac.edu.au/htdig/course/mq/i/300114.pdf
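
To list every document that ends up with this message, the verbose output
can be scanned with a short script. This is only a rough sketch: it assumes
the output of rundig -vvv has been saved to a file (the name rundig.log is
made up), and that the number before the slash (109 above) is some kind of
document identifier, which I have not confirmed. The function name
find_no_excerpt is likewise just illustrative.

```python
import re
import sys

# Matches the message shown above. \s+ between the words tolerates the
# line wrap that mail clients may insert between "no" and "excerpt".
NO_EXCERPT = re.compile(r"Deleted,\s+no\s+excerpt:\s+(\d+)/(\S+)")


def find_no_excerpt(log_text):
    """Return (number, url) pairs for each 'Deleted, no excerpt' message.

    'number' is the integer printed before the slash; treating it as a
    document identifier is an assumption, not something htdig documents here.
    """
    return [(int(number), url) for number, url in NO_EXCERPT.findall(log_text)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: rundig -vvv > rundig.log 2>&1
    #                     python no_excerpt.py rundig.log
    with open(sys.argv[1], errors="replace") as fh:
        for number, url in find_no_excerpt(fh.read()):
            print(number, url)
```

Comparing the URLs it prints against the URLs in the PDF::parse lines should
show whether every parsed PDF is later deleted, or only some of them.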

None of the new PDF files are actually being indexed. Does anyone have any
suggestions?

- The PDFs are not excluded in robots.txt
- The server_max_docs parameter is not in use

Cheers,
Paul




This archive was generated by hypermail 2b28 : Thu Jul 13 2000 - 18:25:39 PDT