

Subject: [htdig] parse_doc.pl split word with accents
From: Andoni Ayala (aayala@virtualcom.es)
Date: Mon May 29 2000 - 04:04:15 PDT


Hi.

When I parse documents (PDF, WordPerfect, etc.) with parse_doc.pl, the
script splits accented words in two. However, if I run the specific
converter directly on the document (e.g. wp2html or pdftohtml), the
accents display correctly.
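To make the symptom concrete, here is a minimal Python sketch. It assumes, without having confirmed it in parse_doc.pl itself, that the script treats only ASCII letters, digits and underscore as word characters. Under that assumption every accented letter acts as a separator, which splits words exactly as described. The sample words are made up for illustration.

```python
import re

# Made-up sample text containing accented Spanish words.
text = "canción también año"

# ASSUMPTION: an ASCII-only word pattern, similar to what a tokenizer
# might use. Accented letters are not matched, so they split the word.
ascii_words = re.findall(r"[A-Za-z0-9_]+", text)

# For comparison: in Python 3, \w on a str is Unicode-aware,
# so accented letters stay inside the word.
unicode_words = re.findall(r"\w+", text)

print(ascii_words)    # ['canci', 'n', 'tambi', 'n', 'a', 'o']
print(unicode_words)  # ['canción', 'también', 'año']
```

If the converters' output is fine but the words come out split like the first list, the word-character definition used during parsing is a plausible place to look.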

Thanks

P.S.: Please excuse my poor English.




This archive was generated by hypermail 2b28 : Mon May 29 2000 - 01:55:58 PDT