Re: [htdig] split word with accents

Subject: Re: [htdig] split word with accents
From: Gilles Detillieux
Date: Mon May 29 2000 - 08:38:29 PDT

According to Andoni Ayala:
> When I try to parse a document (PDF, WordPerfect, etc.), I parse it with
> , and the script splits each accented word in two. But if I parse
> the document directly with the particular parser (e.g. wp2html or
> pdftohtml), the accents appear correctly.

Are you sure it's the script, and not htdig, that's splitting
the words? Do you have your locale set correctly? See
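
To illustrate the symptom in question, here is a minimal sketch in Python. It is not htdig's code, and the function names are hypothetical. It assumes that under an ASCII-only locale accented letters are not classed as word characters, so they behave like separators and a word breaks in two at each accent.

```python
import re

def split_ascii_only(text):
    # Only ASCII letters and digits count as word characters here,
    # which is an assumption about how an ASCII-only locale behaves.
    return re.findall(r"[A-Za-z0-9]+", text)

def split_accent_aware(text):
    # Any Unicode letter or digit counts as a word character
    # (\w minus the underscore), so accented letters stay in the word.
    return re.findall(r"[^\W_]+", text)

# Precomposed accented characters, written as escapes to avoid
# depending on the encoding of this file.
sample = "informaci\u00f3n t\u00e9cnica"

print(split_ascii_only(sample))    # ['informaci', 'n', 't', 'cnica']
print(split_accent_aware(sample))  # ['información', 'técnica']
```

If the indexed words look like the first line of output, the locale setting is a more likely culprit than the conversion script.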

You should probably also use an external converter, such as or
better yet, doc2html, as you'll get better results than with
The doc2html converter also makes it easier to add other conversion

Gilles R. Detillieux
Spinal Cord Research Centre
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

