[htdig] Re: external decoders

Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Thu, 25 Feb 1999 16:18:44 -0600 (CST)

According to Geoff Hutchison:
> On Thu, 25 Feb 1999, Gilles Detillieux wrote:
> > htdig/Plaintext.cc. (Which raises the question: "why can't an external
> > parser just pass plain text or HTML to htdig for further parsing?")
> This is the idea behind the TODO item called "External Decoders." The
> decoder would perform some sort of translation and pass it back to
> ht://Dig. This could involve compression, translation to text or HTML, or
> even something fancy like translation to a foreign language (or charset)!

Yes, I realise that's the feature I want. I just wish it had been designed
into the whole external parser support from the start. I've given it some
thought, but I still don't have an easy, straightforward way of putting it

> I think to make this idea as elegant as possible, we'd want to add some
> sort of MIME detection. That way someone could write a generic
> decompression decoder (like passing it through gzip) and ht://Dig would
> figure out the result is an HTML file or whatnot. Of course, the MIME
> detection could simply be a function to look up the extension in a
> mime.types file. :-)
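
To make the extension-lookup idea concrete, here is a rough sketch in
Python (not ht://Dig code). It assumes the usual mime.types layout of a
MIME type followed by zero or more extensions per line, with "#" starting
a comment. The names parse_mime_types and guess_type, the sample table,
and the fallback type are all made up for illustration.

```python
import os

def parse_mime_types(text):
    """Build an extension -> MIME type map from mime.types-style text."""
    table = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        mime_type, extensions = fields[0], fields[1:]
        for ext in extensions:
            # First mapping wins if an extension is listed twice.
            table.setdefault(ext.lower(), mime_type)
    return table

def guess_type(filename, table, default="application/octet-stream"):
    """Look up the file's extension; the default is an assumption."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return table.get(ext, default)

# Hypothetical sample data, not a real mime.types file.
SAMPLE = """\
# type          extensions
text/html       html htm
text/plain      txt
application/pdf pdf
"""

table = parse_mime_types(SAMPLE)
print(guess_type("index.HTM", table))
print(guess_type("README", table))
```

A name with no suffix at all falls through to the default type, so a
lookup like this only helps when the decoder's output still has a usable
file name.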

Hmm. What if the external decoder doesn't give you a file suffix, e.g.
if it runs through a pipe? The alternative would be to have the external
decoder put out a Content-Type header, but that would mean setting up
wrapper scripts around the decoder utilities, and offloading the file type
determination to these scripts. Not ideal either. A third alternative
would be to determine file type by the first part of the file, the way
the "file" utility does it. If you only need to recognise plain text
and HTML, it's not that hard, but if you also need to recognise stuff
you'll pass off to any external parser, that may be tricky.
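
For the simple case of telling plain text from HTML by the first part of
the file, something along these lines might do. It is only a sketch in
Python under my own assumptions, not how "file" or ht://Dig does it: the
function name sniff_type, the list of HTML markers, and the 5% threshold
for control characters are all arbitrary choices for illustration.

```python
def sniff_type(head):
    """Guess a type from the first bytes of a document.

    Returns "text/html", "text/plain", or None when the data looks
    binary and would need some other handler.
    """
    # A NUL byte is treated as a sure sign of binary data.
    if b"\x00" in head:
        return None

    # Count control bytes other than tab, LF, FF and CR.
    allowed = {9, 10, 12, 13}
    control = sum(1 for b in head if b < 32 and b not in allowed)
    if head and control / len(head) > 0.05:  # threshold is an assumption
        return None

    # Look for a few common HTML markers, ignoring case.
    lowered = head.lower()
    markers = (b"<!doctype html", b"<html", b"<head", b"<title", b"<body")
    if any(marker in lowered for marker in markers):
        return "text/html"
    return "text/plain"

print(sniff_type(b"<HTML><HEAD><TITLE>Test</TITLE></HEAD>"))
print(sniff_type(b"Just some plain notes.\n"))
print(sniff_type(b"\x1f\x8b\x08\x00compressed data"))
```

Anything that comes back as None would still need a table of signatures
to pick the right external parser, which is the tricky part.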

