[htdig] parsing PDF with NT


Subject: [htdig] parsing PDF with NT
From: Stéphane Baudet (sbaudet@araxe.fr)
Date: Mon Feb 28 2000 - 05:20:23 PST


Hi all,
I successfully compiled ht://Dig 3.1.4 with Cygwin B20.1 under Windows NT 4,
and it works great for simple HTML files. But I need to index PDF files, and
Adobe Acrobat Reader (acroread) doesn't provide any parsing function under NT.
I also tried the xpdf package, but maybe there is something I didn't
understand about the ht://Dig configuration file.
I put the following line in htdig.conf:

external_parsers: application/pdf->plain/text /opt/www/htdig/bin/pdftotext.exe
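
For reference, here is a minimal sketch of a wrapper that could sit between ht://Dig and pdftotext. It assumes that ht://Dig calls an external converter with the temporary file name as its first argument, followed by the MIME type, the URL and the configuration file, and that it reads the converted text from standard output. Called that way directly, pdftotext would take the second argument as its output file name. The names `build_command` and `main` are hypothetical, and Python is used only for illustration.

```python
#!/usr/bin/env python
"""Hypothetical wrapper: lets ht://Dig use pdftotext as an external converter."""
import subprocess
import sys


def build_command(argv, pdftotext="pdftotext"):
    """Build the pdftotext command line from the arguments ht://Dig passes.

    Assumption: argv[1] is the temporary file holding the document; the
    remaining arguments (MIME type, URL, config file) are ignored here.
    """
    if len(argv) < 2:
        raise ValueError("expected the input file as first argument")
    # "-" as the output name tells pdftotext to write the text to stdout.
    return [pdftotext, argv[1], "-"]


def main(argv):
    # The child inherits stdout, so the extracted text goes to the caller.
    return subprocess.call(build_command(argv))


# Only act when an input file was actually supplied on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

With such a wrapper, the external_parsers entry would name the wrapper instead of pdftotext.exe, and the output type would be spelled text/plain (the standard MIME name) rather than plain/text.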

I also tried Aladdin Ghostscript 6.0 with:

pdf_parser: /opt/www/htdig/bin/pdf2ps.bat

where pdf2ps.bat is the script provided with Ghostscript.
But nothing works! I'd really like to use xpdf, but I always get a
syntax error about the PDF input file, which is in /tmp, as if htdig had
not written it correctly and had corrupted it!
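
The suspicion that the temporary file is damaged can be checked by comparing its bytes with the original PDF. The sketch below is only one way to do that. It assumes the damage, if any, comes from a text-mode write that turns each LF into CR LF, which is a plausible cause under Cygwin but is not confirmed here. The function names `diagnose_copy` and `diagnose_files` are hypothetical.

```python
def diagnose_copy(original, copy):
    """Classify how the bytes of a copy differ from the original bytes."""
    if copy == original:
        return "identical"
    # A text-mode write on Windows turns every LF into CR LF.
    if copy == original.replace(b"\n", b"\r\n"):
        return "newline-translated"
    # The copy stops early but matches the original up to that point.
    if original.startswith(copy):
        return "truncated"
    return "different"


def diagnose_files(original_path, copy_path):
    """Read both files in binary mode and compare them."""
    with open(original_path, "rb") as f:
        original = f.read()
    with open(copy_path, "rb") as f:
        copy = f.read()
    return diagnose_copy(original, copy)
```

A result of "newline-translated" or "truncated" for the file in /tmp would point at how the file is written rather than at xpdf.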
So, if anybody has already succeeded in indexing PDF files under NT, please
tell me how!
Thank you!

Stephane Baudet.


This archive was generated by hypermail 2b28 : Tue Feb 29 2000 - 08:08:49 PST