Readme for parse_doc.pl

External parser for ht://Dig that parses Word files so they can be indexed. It uses the "catdoc" program to extract text from Word documents.

Written by Jesse op den Brouw, and enhanced by a cast of characters, including Gilles Detillieux.

Generally better at parsing Word files than htparsedoc, and better at not crashing. :-) It has been extended to handle PostScript and PDF files as well, given the appropriate document-to-text converters.

Note that this script has been largely made obsolete since htdig 3.1.4 was released. It is now much preferable to use doc2html.pl as an external converter.
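
For orientation, a minimal sketch of how an external parser of this kind is typically wired into htdig.conf through the external_parsers attribute. The install path /usr/local/bin is an assumption, not something this readme specifies, and the MIME types listed are simply the three document kinds mentioned above. Check the comments at the top of the script and the ht://Dig attribute documentation for your version before relying on it.

```
# htdig.conf fragment (illustrative; paths are assumed)
# Each entry is a content-type followed by the program that handles it.
external_parsers: application/msword /usr/local/bin/parse_doc.pl \
                  application/postscript /usr/local/bin/parse_doc.pl \
                  application/pdf /usr/local/bin/parse_doc.pl

# With htdig 3.1.4 or later, an external converter is declared with the
# "type->type" form instead, for example (path again assumed):
# external_parsers: application/msword->text/html /usr/local/bin/doc2html.pl
```

The script itself also has to be told where catdoc and the PostScript and PDF converters live; those locations are set inside the script and depend on the local installation.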