Re: [htdig] File formats supported


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Wed, 24 Feb 1999 14:25:00 -0600 (CST)


According to Ismael Olea:
> I've looked for the file formats supported in htdig but I can't find
> anything on the web.
>
> Can somebody give me a good url please?

http://www.htdig.org/

:-)

Seriously, the main page at that URL does mention it. If you scroll down
to the Features section, it says:

    - Searching of HTML and text files
        Both HTML documents and plain text files can be searched.
        Searching of other file types will be supported in future versions.

That's not quite the whole story, though. There is some support for
PDF documents right now, if you have acroread (Adobe Acrobat Reader) on
your system. Also, with external parsers, you can index a whole lot more
file types.

The parse_doc.pl script in ht://Dig 3.1.1's contrib directory can handle
MS Word documents (if you have catdoc installed) and PostScript (if you
have Ghostscript 3.33 or later installed). It can be extended to handle
other document-to-text converters, including pdftotext, for people who
don't have acroread on their system. (I just posted a patch for that.)
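The general idea behind a script like parse_doc.pl is to pick a converter
based on the document's content type and read back plain text. Below is a
minimal Python sketch of that dispatch step only. It is not parse_doc.pl's
code (which is Perl). The CONVERTERS table and the converter_command() and
document_to_text() helpers are hypothetical names, and the exact command
lines are assumptions: catdoc and Ghostscript's ps2ascii print text to
stdout, and pdftotext writes to stdout when given "-" as the output file.
Handing the text back to htdig in the form it expects is left out.

```python
import subprocess
from typing import Dict, List, Optional

# Hypothetical table: MIME content type -> converter argv that prints plain
# text to stdout. The "{file}" placeholder is replaced by the input path.
CONVERTERS: Dict[str, List[str]] = {
    "application/msword": ["catdoc", "{file}"],          # needs catdoc
    "application/postscript": ["ps2ascii", "{file}"],    # Ghostscript script
    "application/pdf": ["pdftotext", "{file}", "-"],     # "-" = write to stdout
}


def converter_command(content_type: str, path: str) -> Optional[List[str]]:
    """Return the argv used to convert `path`, or None for unknown types."""
    # Drop parameters such as "; charset=..." and normalise case.
    base_type = content_type.split(";")[0].strip().lower()
    template = CONVERTERS.get(base_type)
    if template is None:
        return None
    return [path if arg == "{file}" else arg for arg in template]


def document_to_text(content_type: str, path: str) -> str:
    """Run the matching converter and return its stdout as text."""
    argv = converter_command(content_type, path)
    if argv is None:
        raise ValueError(f"no converter configured for {content_type!r}")
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(converter_command("application/pdf", "report.pdf"))
```

With a table-driven layout like this, adding another converter (as the
pdftotext patch does for parse_doc.pl) is one more table entry rather than
new branching logic.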

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



This archive was generated by hypermail 2.0b3 on Fri Feb 26 1999 - 14:34:12 PST