Re: htdig: (Not) translating entities

Geoff Hutchison (
Mon, 11 Jan 1999 20:56:06 -0500 (EST)

On Mon, 11 Jan 1999, Marjolein Katsma wrote:

> Some digging revealed that both &lt; and &gt; are translated, and then '<'
> is converted to a space... not what I needed.
> For pages with code samples of such languages (HTML and other tag-based
> languages) the automatic translation of such entities actually gets in the
> way - so I made it configurable. Also useful for pages/sites with
> mathematical formulae which should be recognizable in the excerpts.

This was a kludge to side-step a nasty bug in the HTML parser. If we
didn't remove the '<' it would call it the beginning of a tag and try to
parse the tag. Not nice either.

While your patch is nice, it also side-steps the issue in the HTML parser.
One of these days someone needs to go back and figure out an optimal
approach to its tasks--translate SGML entities, operate on tags, and form
the excerpts. Right now we're doing them in that order, but this clearly
causes problems.
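
To make the ordering problem concrete, here is a minimal sketch in Python.
It is not htdig's code (htdig is written in C++), and the regex-based tag
stripper is only a stand-in for a real HTML parser. The function names and
the sample string are made up for illustration. It compares decoding
entities before handling tags with doing it the other way round.

```python
import re
from html import unescape

# Crude stand-in for a tag parser: anything between '<' and '>' is a tag.
# This is an assumption for illustration, not how a real parser works.
TAG_RE = re.compile(r"<[^>]*>")


def decode_then_strip(source):
    """Entities first, then tags: a decoded '&lt;b&gt;' is mistaken for a tag."""
    return TAG_RE.sub(" ", unescape(source))


def strip_then_decode(source):
    """Tags first, then entities: a decoded '<b>' survives as plain text."""
    return unescape(TAG_RE.sub(" ", source))


def squeeze(text):
    """Collapse runs of whitespace so the two results are easy to compare."""
    return " ".join(text.split())


sample = "<p>Use &lt;b&gt; for bold text.</p>"

print(squeeze(decode_then_strip(sample)))  # the literal <b> is lost
print(squeeze(strip_then_decode(sample)))  # the literal <b> is kept
```

With the first ordering the escaped tag disappears from the text that
would feed the excerpt; with the second it is kept. A real fix would still
have to decide what ends up in the word index as opposed to the excerpt.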

You also noted that there were some SGML equivalents not present in the
current file. I'll gladly accept a patch for that (or if you're too busy,
a URL to an appropriate reference). ;-)

-Geoff Hutchison
Williams Students Online

