Re: [htdig] pdf parser: No error;) Search: No results;(


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Wed, 24 Feb 1999 12:32:15 -0600 (CST)


According to Geoff Hutchison:
> On Tue, 23 Feb 1999, Gilles Detillieux wrote:
>
> > Back in the summer, there was also some discussion about using ghostscript
> > 5.10 as a PDF to PS or PDF to text converter. Has anyone had any success
> > using this as a means of indexing PDF files with htdig? I wouldn't mind
> > seeing a sample of PostScript generated by gs 5.10's pdf2ps utility, which
> > I don't have handy right here.
>
> Here's the converted test.pdf file, using gs 5.50. I tried a test run with
> it, but it didn't seem to work.

Not surprising! I couldn't find any intelligible text strings in
the file. The very short prologue in the file defines procedures
for drawing commands and image handling, but no text string handling.
The pages themselves are made up of calls to the drawing procedures,
and ASCII85-encoded images. Is there an option on the pdf2ps command
to get text as text, rather than graphics, in the PostScript output?
If not, then I think we can quickly rule out ghostscript's pdf2ps as a
potential pdf_parser.
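
For anyone who wants to repeat this check on their own pdf2ps output,
here is a rough sketch in Python of what I did by eye: scan the PostScript
for literal (...) strings after discarding ASCII85 data and comments. The
function name extract_ps_strings is made up for illustration, it is not
part of htdig or ghostscript, and it is only a heuristic, not a real
PostScript interpreter. An empty result suggests the file has no text
stored as text.

```python
import re


def extract_ps_strings(ps_text):
    """Return the literal (...) strings found in PostScript source.

    Hypothetical helper for illustration only; a heuristic scan,
    not a PostScript interpreter.
    """
    # Drop ASCII85 data (<~ ... ~>), which may itself contain parentheses.
    ps_text = re.sub(r"<~.*?~>", " ", ps_text, flags=re.DOTALL)

    strings = []
    buf = []
    depth = 0  # nesting depth of parentheses; 0 means outside a string
    i = 0
    while i < len(ps_text):
        ch = ps_text[i]
        if depth == 0:
            if ch == "%":
                # Skip a comment up to the end of the line.
                nl = ps_text.find("\n", i)
                i = len(ps_text) if nl == -1 else nl
                continue
            if ch == "(":
                depth = 1
                buf = []
        else:
            if ch == "\\" and i + 1 < len(ps_text):
                # Simplification: keep the escaped character as-is
                # instead of decoding sequences such as \n or \053.
                buf.append(ps_text[i + 1])
                i += 2
                continue
            if ch == "(":
                depth += 1
                buf.append(ch)
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    strings.append("".join(buf))
                else:
                    buf.append(ch)
            else:
                buf.append(ch)
        i += 1
    return strings


if __name__ == "__main__":
    import sys

    # Usage: python check_ps_text.py converted.ps
    with open(sys.argv[1], encoding="latin-1") as f:
        found = extract_ps_strings(f.read())
    print("%d literal string(s) found" % len(found))
    for s in found[:20]:
        print(repr(s))
```

Even when strings do turn up, they may be font names or other
non-content data, so the output still needs a look before concluding
that the converter preserved the document text.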

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930