Re: AW: [htdig] irrelevant pages in search


Subject: Re: AW: [htdig] irrelevant pages in search
From: David J. Adams (D.J.Adams@soton.ac.uk)
Date: Fri Nov 19 1999 - 06:23:29 PST


On Fri, 19 Nov 1999 13:50:10 +0100 Hartmut Steffin
<h.steffin@abi-behoerden.de> wrote:

> I have the same problem on our intranet site. it reaches a level of
> unreliability that the whole search is useless. there must be a principle
> error. the only errors in the log i have are about not being able to index
> pdf-files:
>
> /tmp/htdig12988.pdf: Could not repair file.
> /tmp/htdig12988.pdf: Could not repair file.
> /tmp/htdig12988.pdf: Could not repair file.
> /tmp/htdig12988.pdf: Expected a dict object.
> /tmp/htdig12988.pdf: This document requires a password.
> /tmp/htdig12988.pdf: Could not repair file.
> /tmp/htdig12988.pdf: Could not repair file.
> /tmp/htdig12988.pdf: Could not repair file.
> /tmp/htdig12988.pdf: Could not repair file.
> /tmp/htdig12988.pdf: This document requires a password.
> PDF::parse: cannot open acroread output
> PDF::parse: cannot open acroread output
> PDF::parse: cannot open acroread output
> PDF::parse: cannot open acroread output
> PDF::parse: cannot open acroread output
> PDF::parse: cannot open acroread output
> PDF::parse: cannot open acroread output
> PDF::parse: cannot open acroread output
> PDF::parse: cannot open acroread output
> PDF::parse: cannot open acroread output
>
> I don't understand what the problem with these files is. They work perfectly
> from the browser.
> Is there a connection between error in pdf-files and messing up the
> database?
>
> regards
> Hardy
>

PDF files are generally large, probably larger than the
max_doc_size you have set in the htdig configuration file.
This results in only part of the file being downloaded from
the server.
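
For illustration, raising the limit in the configuration file might
look like the fragment below. As far as I know the attribute is
spelled max_doc_size in the ht://Dig configuration documentation; the
value shown is an arbitrary example, not a recommendation, and should
be chosen to exceed the largest PDF you want indexed.

```
# htdig.conf (fragment)
# Upper limit, in bytes, on how much of each document htdig retrieves.
# 5000000 is an assumed example value; pick one larger than your biggest PDF.
max_doc_size: 5000000
```

The index has to be rebuilt after the change for the PDFs to be
fetched again in full.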

Only if the _entire_ file is presented to the Acrobat Reader
can it extract the text.
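
The reason is that a PDF keeps its cross-reference table and trailer
at the end of the file, finishing with a %%EOF marker, so a download
cut off at the size limit loses exactly the part a reader needs first.
The sketch below is only an illustration of that effect, not anything
htdig itself does: the function names and the stand-in "PDF" bytes are
made up for the example, and a real check would need a real file.

```python
def has_pdf_trailer(data: bytes, tail: int = 1024) -> bool:
    """Return True if the %%EOF marker appears in the last `tail` bytes."""
    return b"%%EOF" in data[-tail:]


def truncate(data: bytes, max_doc_size: int) -> bytes:
    """Mimic a retrieval that stops after max_doc_size bytes (hypothetical helper)."""
    return data[:max_doc_size]


# A tiny stand-in for a PDF: header, filler body, and the closing marker.
# It is not a valid PDF, only enough structure to show the truncation effect.
fake_pdf = b"%PDF-1.3\n" + b"x" * 5000 + b"\nstartxref\n5009\n%%EOF\n"

complete = has_pdf_trailer(fake_pdf)                   # whole file: marker present
cut_off = has_pdf_trailer(truncate(fake_pdf, 1000))    # truncated: marker lost
```

Running the same check on the temporary files htdig leaves behind (if
any survive) would be one way to confirm that truncation is the cause,
as opposed to the separate "requires a password" errors.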

----------------------
David J. Adams
D.J.Adams@soton.ac.uk



This archive was generated by hypermail 2b25 : Fri Nov 19 1999 - 06:45:24 PST