Re: [htdig] PDF indexing problem: Deleted, no excerpt


Subject: Re: [htdig] PDF indexing problem: Deleted, no excerpt
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Wed Aug 09 2000 - 13:30:57 PDT


On Wed, 9 Aug 2000, Mike Gardner wrote:

> HTDIG -v lists the PDF files & their size OK (i.e. it looks as though
> it's indexing); however, I don't see the '+--+--**' that you get for HTML
> files - is this a problem?

No. The +/-/* marks are indications of links in HTML files.

> So I assume that there's no indexable text as the PDF parsing failed
> (even though there were no error messages).

Some PDF files look like text, but were created by some program that just
made graphics. I'd certainly check the PS output that you mentioned for
text.

> Or should I just install xpdf and try that?

This is the recommended way to index PDF files, though certainly if the
PDF is only graphics and doesn't store text, there's not much you can do
with it.
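
One quick way to tell whether a given PDF stores any text at all is to run
it through xpdf's pdftotext by hand and look at what comes out. The sketch
below is only an illustration, not part of ht://Dig: it assumes pdftotext
is on your PATH, and the has_indexable_text helper and its min_words
threshold are hypothetical names chosen for this example.

```python
import subprocess
import sys


def extract_text(pdf_path: str) -> str:
    """Run xpdf's pdftotext on pdf_path and return whatever text it extracts."""
    # Passing "-" as the output file asks pdftotext to write to stdout.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def has_indexable_text(text: str, min_words: int = 5) -> bool:
    """Hypothetical heuristic: require min_words tokens containing a letter or digit."""
    words = [w for w in text.split() if any(c.isalnum() for c in w)]
    return len(words) >= min_words


if __name__ == "__main__":
    # Usage (assumed script name): python check_pdf_text.py file1.pdf file2.pdf
    for path in sys.argv[1:]:
        try:
            ok = has_indexable_text(extract_text(path))
        except (OSError, subprocess.CalledProcessError) as exc:
            print(f"{path}: pdftotext failed ({exc})")
            continue
        print(f"{path}: {'text found' if ok else 'no extractable text'}")
```

If a file comes back with no extractable text, it is most likely one of
the graphics-only PDFs, and switching parsers will not help with it.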

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



