Re: Re[2]: Re[2]: [htdig] Accents erratic conduct


From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Mon Jun 05 2000 - 08:20:48 PDT


According to Andoni Ayala:
> El Thu, 1 Jun 2000 10:35:28 -0500 (CDT)
> Gilles Detillieux <grdetil@scrc.umanitoba.ca> escribiste:
> > No. I want to know how you've set the external_parsers attribute in your
> > htdig.conf. The conv_doc.pl output looks fine, and I took your word for
> > that earlier. What I want to know is, if htdig is having problems when
> > conv_doc.pl's output looks fine, how is htdig calling conv_doc.pl (or even
> > if it's calling it at all). If your external_parsers setting is incorrect,
> > that could be part of the problem - it may even cause htdig to fall back on
> > the pdf_parser (acroread) rather than using an external converter.
> >
>
> Ok,
>
>
> external_parsers: application/msword /opt/htdig/bin/parse_doc.pl \
> application/postscript /opt/htdig/bin/parse_doc.pl \
> application/pdf /opt/htdig/bin/parse_doc.pl \
> application/rtf->text/html /opt/htdig/bin/doc2html.pl \
> text/rtf->text/html /opt/htdig/bin/doc2html.pl \
> application/Wordperfect5.1->text/html /opt/htdig/bin/doc2html

That's the problem right there! You're not using conv_doc.pl or
doc2html.pl to deal with PDFs. You're using the parse_doc.pl script.
It does seem to have problems with accents, and I'm not sure exactly why.
I'd guess that the most recent changes to it are locale-sensitive,
so you might need to set the locale for it.

However, if you have doc2html.pl installed and configured correctly,
you're far better off using it for all the document types you're currently
passing to parse_doc.pl, and not bothering with parse_doc.pl at all.
Use something like the following, and make sure you have all the paths
configured correctly in doc2html.pl.

external_parsers: application/msword->text/html /opt/htdig/bin/doc2html.pl \
                  application/postscript->text/html /opt/htdig/bin/doc2html.pl \
                  application/pdf->text/html /opt/htdig/bin/doc2html.pl \
                  application/rtf->text/html /opt/htdig/bin/doc2html.pl \
                  text/rtf->text/html /opt/htdig/bin/doc2html.pl \
                  application/Wordperfect5.1->text/html /opt/htdig/bin/doc2html.pl
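
If you want to double-check a setting like this before re-running htdig,
here is a rough Python sketch (not part of ht://Dig; the function names
are made up for illustration). It assumes the attribute is a list of
"type program" pairs, that a type written as "from->to" means the program
is used as a converter, and that a bare type means it is called as a
parser. It only lists the bare entries, which are the ones going to
parse_doc.pl in the setting quoted above.

```python
def parse_external_parsers(value):
    """Split an external_parsers value into (type, target, program) tuples.

    Hypothetical helper: assumes whitespace-separated type/program pairs,
    with backslashes used only as line continuations.
    """
    # Treat continuation backslashes as plain whitespace, then tokenize.
    tokens = value.replace("\\", " ").split()
    if len(tokens) % 2 != 0:
        raise ValueError("external_parsers needs type/program pairs")
    entries = []
    for spec, program in zip(tokens[0::2], tokens[1::2]):
        # "application/pdf->text/html" splits into source and target type;
        # a spec without "->" has no target (None).
        source, arrow, target = spec.partition("->")
        entries.append((source, target if arrow else None, program))
    return entries


def plain_parsers(value):
    """Return (type, program) for entries that have no '->' target."""
    return [(src, prog)
            for src, target, prog in parse_external_parsers(value)
            if target is None]


if __name__ == "__main__":
    # Two entries taken from the setting quoted earlier in this message.
    sample = r"""application/pdf /opt/htdig/bin/parse_doc.pl \
                 application/rtf->text/html /opt/htdig/bin/doc2html.pl"""
    for ctype, program in plain_parsers(sample):
        print(ctype, "is handled as a parser by", program)
```

Paste the value of your own external_parsers attribute in place of the
sample; any type it prints is not going through a converter.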

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



