Re: [htdig] Using pdftotext to index PDF documents


Geoff Hutchison (ghutchis@wso.williams.edu)
Thu, 25 Feb 1999 15:43:30 -0500 (EST)


On Thu, 25 Feb 1999, Gilles Detillieux wrote:

> htdig/Plaintext.cc. (Which raises the question: "why can't an external
> parser just pass plain text or HTML to htdig for further parsing?")

This is the idea behind the TODO item called "External Decoders." The
decoder would perform some sort of translation and pass the result back to
ht://Dig. This could involve decompression, translation to text or HTML, or
even something fancy like translation to a foreign language (or charset)!
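
To make the decoder idea concrete, here is a rough sketch in Python (not
ht://Dig code, and not an existing ht://Dig interface). The DECODERS
table, the decode() function, and the "application/x-gzip" key are all
hypothetical names chosen for illustration. The point is only that a
decoder takes raw bytes and hands back translated bytes for further
parsing.

```python
import gzip

def gunzip_decoder(data: bytes) -> bytes:
    """Decoder that undoes gzip compression and returns the raw document."""
    return gzip.decompress(data)

# Hypothetical registry mapping a content type to a decoder function.
DECODERS = {
    "application/x-gzip": gunzip_decoder,
}

def decode(content_type: str, data: bytes) -> bytes:
    """Run the matching decoder, or return the data unchanged if none is registered."""
    decoder = DECODERS.get(content_type)
    if decoder is None:
        return data
    return decoder(data)
```

Whatever comes back from decode() would then go to the normal parser,
which is why the indexer still needs a way to tell what kind of document
the result is.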

I think to make this idea as elegant as possible, we'd want to add some
sort of MIME detection. That way someone could write a generic
decompression decoder (say, one that pipes the document through gzip) and
ht://Dig would figure out whether the result is an HTML file or whatnot. Of
course, the MIME detection could simply be a function to look up the
extension in a mime.types file. :-)
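
For what it's worth, the extension lookup could be as small as the
following Python sketch. It assumes the usual mime.types layout (a MIME
type followed by its extensions, with "#" starting a comment). The
function names and the "application/octet-stream" fallback are my own
assumptions, not anything in ht://Dig.

```python
import posixpath

def parse_mime_types(text):
    """Build an extension -> MIME type table from mime.types-style text."""
    table = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        mime_type, *extensions = line.split()
        for ext in extensions:
            # The first type listed for an extension wins.
            table.setdefault(ext.lower(), mime_type)
    return table

def guess_type(path, table, default="application/octet-stream"):
    """Look up the MIME type for a path or URL by its final extension."""
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    return table.get(ext, default)
```

So after a decoder strips the ".gz" from something like "page.html.gz",
calling guess_type("page.html", table) would report the HTML type, given
a table that lists it.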

Cheers,
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



