Subject: Re: [htdig] Indexing PDF Files
From: Gilles Detillieux (email@example.com)
Date: Wed Nov 01 2000 - 14:40:38 PST
If that still doesn't solve the problem, try running conv_doc.pl (or even
pdftotext) directly on some of your problem PDF files. I suspect that
these files contain no indexable text, but only images, which is a common
problem with some PDFs.
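A minimal sketch of that check follows. It assumes Python 3.9+ and that xpdf's pdftotext is on the PATH and accepts "-" as the output file name, meaning standard output. The helper names are hypothetical and not part of ht://Dig. A PDF that produces no non-whitespace characters (image-only pages typically give nothing but form-feed page breaks) would likely leave htmerge with nothing to build an excerpt from.

```python
import shutil
import subprocess
import sys


def has_indexable_text(text: str, min_chars: int = 1) -> bool:
    """Return True if `text` has at least `min_chars` non-whitespace characters."""
    # Form feeds (page breaks from pdftotext) count as whitespace here.
    return sum(1 for ch in text if not ch.isspace()) >= min_chars


def extract_pdf_text(pdf_path: str) -> str:
    """Run pdftotext on `pdf_path` and return the text it writes to stdout.

    Assumption: pdftotext (xpdf) is on PATH and treats "-" as stdout.
    """
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found on PATH")
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # tolerate non-UTF-8 output bytes
        check=True,
    )
    return result.stdout


def main(paths: list[str]) -> int:
    """Report each PDF that yields no text; return 1 if any look unindexable."""
    problems = 0
    for path in paths:
        try:
            text = extract_pdf_text(path)
        except (OSError, subprocess.CalledProcessError) as err:
            print(f"{path}: pdftotext failed ({err})")
            problems += 1
            continue
        if has_indexable_text(text):
            print(f"{path}: OK ({len(text.split())} words)")
        else:
            print(f"{path}: no indexable text (image-only PDF?)")
            problems += 1
    return 1 if problems else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Usage, e.g.: python3 check_pdf_text.py docs/*.pdf (the file name is whatever you save the sketch as). Files reported as having no indexable text would need OCR before any PDF converter could help.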
You also didn't mention how you installed htdig on your Red Hat 6.2 system.
If you installed from RPM, make sure you used the correct one, i.e. the
According to firstname.lastname@example.org:
> Use conv_doc.pl instead of parse_doc.pl
> get it from http://www.htdig.org/files/contrib/parsers/conv_doc.pl.gz
> gunzip it and move it to /usr/local/bin
> get xpdf from ftp://ftp.foolabs.com/pub/xpdf/xpdf-0.91.tgz
> get ps2ascii from your Ghostscript installation
> put this in your conf/htdig.conf
> application/msword->text/html /usr/local/bin/conv_doc.pl \
> application/postscript->text/html /usr/local/bin/conv_doc.pl \
> application/pdf->text/html /usr/local/bin/conv_doc.pl
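Each entry above pairs an input MIME type, optionally followed by "->" and the type the converter outputs, with the program to run; the trailing backslash continues the value onto the next line. In ht://Dig these pairs are the value of the external_parsers attribute. The Python helper below is a hypothetical sketch of how such a value breaks down, not ht://Dig's own parser, and it assumes each command is a single whitespace-free token.

```python
from typing import Dict, NamedTuple, Optional


class Converter(NamedTuple):
    output_type: Optional[str]  # None when the spec has no "->type" part
    command: str


def parse_converter_lines(value: str) -> Dict[str, Converter]:
    """Map each input MIME type to its (output type, command) pair.

    Hypothetical helper: assumes whitespace-separated pairs of
    "input/type[->output/type]" and a command path, with
    backslash-newline continuations as in htdig.conf.
    """
    # Join continued lines, then split into tokens.
    tokens = value.replace("\\\n", " ").split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected pairs of <type spec> <command>")
    mapping: Dict[str, Converter] = {}
    for spec, command in zip(tokens[0::2], tokens[1::2]):
        if "->" in spec:
            input_type, output_type = spec.split("->", 1)
        else:
            input_type, output_type = spec, None
        mapping[input_type] = Converter(output_type, command)
    return mapping
```

Under these assumptions, the three lines above map application/msword, application/postscript and application/pdf to text/html, all via /usr/local/bin/conv_doc.pl.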
> On Wed, 1 Nov 2000, Roy Stephane wrote:
> > I have problems indexing PDF files. I have already checked FAQ entries 4.9
> > and 5.2, so all my paths are OK and the MAX_DOC_SIZE parameter is greater
> > than my largest PDF file. I am working with the external parser
> > "parse_doc.pl".
> > When I run rundig in verbose mode, I see that htdig recognizes all my
> > PDF files and shows their sizes. After that, when htmerge finds a PDF, it
> > says that there is no excerpt, so the file (temporary file) is deleted.
> > I tried to find the parameters that rundig uses to call htdig. Since
> > printing each variable shows nothing on screen, I assume that all the
> > parameters have null values.
> > I am using Red Hat 6.2 and Apache 1.3.
> > Thanks for your help!
> > Stéphane Roy
> > email@example.com <mailto:firstname.lastname@example.org>
> > (450) 542-5906
--
Gilles R. Detillieux              E-mail: <email@example.com>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930