sxw2plain
---------

This simple script converts OpenOffice 1.0 Writer documents (.sxw) to
plain text so that htdig can parse them.

What do I need?
---------------

Unzip

Note: gunzip does not handle sxw files correctly!

Install
-------

Do the following:

Add this line to your mime.types (/opt/www/conf):

    application/vnd.sun.xml.writer sxw

I had to do this so that htdig points to the right parser.

In your htdig.conf (/opt/www/conf), add the following:

    mime_types: /opt/www/conf/mime.types

Of course, you can also use the mime.types file of your web server.

In your external_parsers section, add:

    application/vnd.sun.xml.writer->text/plain /opt/www/bin/sxwtoplain

Now place the sxwtoplain file (a simple bash script), which comes with
this package, in the right path (/opt/www/bin). It has to be executable,
of course.

The last step is to create the directory ooffice in your /tmp/htdig
directory (/tmp/htdig/ooffice). You can change this location in the
sxwtoplain file.

How it works
------------

Sxwtoplain just unzips the sxw file temporarily to /tmp/htdig/ooffice,
removes all XML tags, and outputs the rest so that it can be parsed as
plain text.

Written by David Berger (berger@netmon.ch), NetMon GmbH
Zurich, Switzerland

Feel free to extend it, but please mail me if you find any improvements
for this script.

Thanks to all the people out there working on open-source software.
Great job!
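
Sketch of the conversion
------------------------

The steps described under "How it works" can be sketched roughly as
follows. This is not the sxwtoplain script shipped with the package, only
an illustration under some assumptions: the function names strip_tags and
sxw_to_plain and the WORKDIR variable are made up for this example, the
text of an .sxw archive is assumed to live in its content.xml member, and
the document path is assumed to arrive as the first argument.

```shell
#!/bin/bash
# Illustrative sketch only; not the sxwtoplain script from this package.

# Working directory named in the README; WORKDIR is a hypothetical override.
WORKDIR="${WORKDIR:-/tmp/htdig/ooffice}"

# Replace every XML tag on stdin with a single space.
strip_tags() {
    sed -e 's/<[^>]*>/ /g'
}

# Unpack content.xml from the given .sxw file and print it without tags.
sxw_to_plain() {
    local sxw_file="$1"
    mkdir -p "$WORKDIR" || return 1
    # -o overwrites an older copy, -q keeps unzip quiet, -d sets the target dir
    unzip -o -q "$sxw_file" content.xml -d "$WORKDIR" || return 1
    strip_tags < "$WORKDIR/content.xml"
    rm -f "$WORKDIR/content.xml"
}

# Run the conversion only when a file argument was given.
if [ "$#" -ge 1 ]; then
    sxw_to_plain "$1"
fi
```

Replacing each tag with a space instead of deleting it keeps words from
adjacent paragraphs from running together. XML entities such as &amp; are
left untouched in this sketch.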