Re: [htdig] Indexing PDF Files


Subject: Re: [htdig] Indexing PDF Files
From: creep@datacreep.net
Date: Wed Nov 01 2000 - 13:39:03 PST


Use conv_doc.pl instead of parse_doc.pl.

Get it from http://www.htdig.org/files/contrib/parsers/conv_doc.pl.gz,
gunzip it, move it to /usr/local/bin and make sure it is executable.

get xpdf from ftp://ftp.foolabs.com/pub/xpdf/xpdf-0.91.tgz

ps2ascii comes with your Ghostscript installation (not freetype)
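
Before touching htdig.conf it may be worth confirming that the helper
programs are actually installed. The following is only a rough sketch: the
list of names (pdftotext from xpdf, ps2ascii from Ghostscript, catdoc for
Word files) is my assumption about what conv_doc.pl calls, so check the
variables at the top of your copy of the script. The script usually refers
to its helpers by full path, so a PATH lookup is only a first pass.

```python
import shutil

# Assumed helper programs; check the top of your conv_doc.pl for the real list.
HELPERS = ["pdftotext", "ps2ascii", "catdoc"]


def missing_helpers(names, path=None):
    """Return the names that cannot be found as executables on the search path."""
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    # Report every helper that is not reachable through PATH.
    for name in missing_helpers(HELPERS):
        print("not found on PATH:", name)
```

If anything is reported missing, install it or correct the corresponding
path inside conv_doc.pl before running rundig again.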

put this in your conf/htdig.conf
external_parsers:
            application/msword->text/html /usr/local/bin/conv_doc.pl \
            application/postscript->text/html /usr/local/bin/conv_doc.pl \
            application/pdf->text/html /usr/local/bin/conv_doc.pl
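
If htmerge still reports no excerpt, one way to narrow it down is to run the
converter by hand on a single PDF and see whether any text comes out. This
is a hedged sketch, not part of ht://Dig: it assumes the converter is called
with the file name, the content type, the URL and the config file, in that
order, and the URL and config path below are made-up placeholders.

```python
import subprocess
import sys


def converter_argv(converter, infile, content_type, url, config):
    """Build the command line; the argument order is an assumption."""
    return [converter, infile, content_type, url, config]


def run_converter(argv):
    """Run a converter command and return (exit status, captured stdout)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode, result.stdout


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical URL and config path; substitute your own.
    argv = converter_argv("/usr/local/bin/conv_doc.pl", sys.argv[1],
                          "application/pdf", "http://localhost/test.pdf",
                          "/usr/local/htdig/conf/htdig.conf")
    status, text = run_converter(argv)
    print("exit status:", status, "characters of output:", len(text))
```

Zero characters of output would suggest the problem lies in the converter
or its helper programs rather than in htdig itself.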

On Wed, 1 Nov 2000, Roy Stephane wrote:

> I have problems indexing PDF files. I have already gone through FAQ 4.9
> and 5.2, so all my paths are OK and the max_doc_size parameter is greater
> than my biggest PDF file. I am working with the external parser
> "parse_doc.pl".
>
> When I run rundig in verbose mode, I see that htdig recognises all my
> PDF files; it shows their sizes. After that, when htmerge finds a PDF, it
> says that there is no excerpt, so the file (temporary file) is deleted.
>
> I tried to find the parameters that are used to call htdig from rundig.
> Since printing each variable shows nothing on screen, I assume
> that all the parameters used have null values.
>
> I am using Red Hat 6.2 and Apache 1.3.
>
> Thanks for your help!
>
> Stéphane Roy
> sroy@oerlikon.ca
> (450) 542-5906
>
> ------------------------------------
> To unsubscribe from the htdig mailing list, send a message to
> htdig-unsubscribe@htdig.org
> You will receive a message to confirm this.
> List archives: <http://www.htdig.org/mail/menu.html>
> FAQ: <http://www.htdig.org/FAQ.html>
>
>




This archive was generated by hypermail 2b28 : Wed Nov 01 2000 - 13:44:16 PST