Re: [htdig] WordPerfect parser?


Subject: Re: [htdig] WordPerfect parser?
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Thu Nov 25 1999 - 11:20:08 PST


According to David Adams:
> I have downloaded the parse_doc.pl script, and the xpdf and catdoc
> utilities, and I am now using them to extend our search index to include
> Word and PDF files. It all works well and with a bit of alteration to
> the Perl script does exactly what I want. My thanks to the developers!
>
> We also have a need to index WordPerfect documents, including those
> produced by WP 6.1 and later. Can anyone recommend a utility that will
> run under IRIX 6.5 ?

I haven't come across any open source/freeware WP to text converters.
The reason I put the WP hooks in there originally was because some sites
had .doc files that were WP rather than Word documents, and the WP documents
caused catdoc to blow chunks. Same story for .doc files in RTF format.
I then realised there are all sorts of .doc files that aren't MS-Word,
so I put in explicit checks for MS-Word magic numbers rather than using
catdoc by default, but still kept the WP and RTF hooks in by way of
example.
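
As a rough illustration of the magic-number check described above, here is a minimal shell sketch. It is not the code from parse_doc.pl; the function name sniff_doc_type is hypothetical, and the signatures it tests are the commonly documented ones: the OLE2 compound-file header used by Word 97-style files (other OLE2 documents share it, so a match is only a strong hint), the "\377WPC" prefix of WordPerfect files, and the "{\rtf" prefix of RTF.

```shell
#!/bin/sh
# Hypothetical helper, not taken from parse_doc.pl: classify a .doc-style
# file by its leading bytes instead of trusting the file extension.
sniff_doc_type() {
    # Hex-encode the first 8 bytes, e.g. "d0cf11e0a1b11ae1".
    magic=$(head -c 8 -- "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        d0cf11e0a1b11ae1) echo msword ;;       # OLE2 compound file header
        ff575043*)        echo wordperfect ;;  # "\377WPC"
        7b5c727466*)      echo rtf ;;          # "{\rtf"
        *)                echo unknown ;;
    esac
}
```

A caller would then hand the file to catdoc only when the function prints msword, and route wordperfect or rtf files to whatever converter is available for them.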

If WordPerfect for UNIX is available for IRIX, and it contains the cvt
utility as WP for Linux does, you could write a script that uses that,
or adapt the parse_doc.pl script to use it directly. Its usage is:

/usr/local/wplinux/shbin10/cvt -l file.wpd file.txt asci > /dev/null
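
A wrapper script around that command might look like the sketch below. It assumes only the argument order shown above (-l, input file, output file, format keyword); the function name wp_to_text and the CVT override variable are hypothetical, and whether cvt exists or behaves the same way on IRIX is untested.

```shell
#!/bin/sh
# Sketch of a WordPerfect-to-text wrapper around the cvt call shown above.
# CVT defaults to the WP for Linux path; set it to point at another install.
CVT=${CVT:-/usr/local/wplinux/shbin10/cvt}

wp_to_text() {
    # cvt writes to a named output file, so convert into a scratch
    # directory and then copy the result to standard output.
    wp_dir=$(mktemp -d) || return 1
    wp_out="$wp_dir/out.txt"
    # Same argument order as the command above; cvt's own chatter is discarded.
    if "$CVT" -l "$1" "$wp_out" asci > /dev/null && [ -f "$wp_out" ]; then
        cat "$wp_out"
        wp_status=0
    else
        wp_status=1
    fi
    rm -rf "$wp_dir"
    return "$wp_status"
}
```

Calling wp_to_text file.wpd prints the converted text on standard output and returns non-zero if the conversion fails, which is the shape a parser script would typically want from a converter.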

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930




This archive was generated by hypermail 2b25 : Thu Nov 25 1999 - 11:32:06 PST