Re: [htdig] Re: htdig: Comments

Gilles Detillieux
Tue, 2 Mar 1999 16:50:48 -0600 (CST)

According to Matt Edwards:
> HtDig 3.1.1 isn't parsing (slightly non-standard) comments correctly.
> Extra dashes in the comment can confuse the current parser into
> ignoring a lot of content. For example <!--comment----> is seen as
> an uncompleted comment beginning.
> It seems a lot of web content doesn't strictly adhere to the
> "standard" for comments, so we should be a little careful here.
> For example both IE and Netscape require "<!--" comments to end
> with a "-->" without whitespace between the "--" and the ">".
> Perhaps htDig would be better off doing the same.
> According to Marjolein Katsma:
> > Starting on my next project, I had to dig in, and found the
> > following code to filter out comments:
> According to Gilles Detilleux

Actually, this was more text quoted from Marjolein...
> > While this will catch *most* comments, it will see some perfectly legal
> > comments as illegal and skip the rest of the page. The best definition
> > of comments is found in HTML 2.0 (unchanged in the actual DTD in later
> > versions, but never properly explained any more...):
> >
> > "To include comments in an HTML document, use a comment declaration. A
> > comment declaration consists of `<!' followed by zero or more comments
> > followed by `>'. Each comment starts with `--' and includes all text up
> > to and including the next occurrence of `--'. In a comment declaration,
> > white space is allowed after each comment, but not before the first
> > comment. The entire comment declaration is ignored."
> >

Marjolein brought up this issue in January. The htdig code used
to do what you're requesting, but she wanted it changed to adhere to
the standard. I only helped her debug her code so it would do what she
wanted it to, to allow (require) standard comments. She went on to give
a few examples of what standard comments could be:

> Thus, the following are legal comment declarations:
> <!--first comment
> on two lines --
> --second comment--
> --third comment--
> >
> <!>
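
To make the quoted rule concrete, here is a rough Python sketch of a
scanner that follows the HTML 2.0 wording literally. It is not htdig's
actual C++ code; the function name skip_comment_declaration and its
return convention (-1 for "not a complete comment declaration") are
made up for illustration, and other "<!" declarations such as DOCTYPE
are simply rejected.

```python
def skip_comment_declaration(text, start=0):
    """Return the index just past a comment declaration beginning at
    `start`, or -1 if there is no complete declaration there.

    Rule (HTML 2.0): '<!' + zero or more comments + '>'.
    Each comment runs from '--' up to and including the next '--'.
    White space may follow a comment, but not precede the first one.
    """
    if not text.startswith("<!", start):
        return -1
    i = start + 2
    while True:
        if text.startswith("--", i):
            # Inside a comment: look for the closing '--'.
            end = text.find("--", i + 2)
            if end == -1:
                return -1          # comment never closed
            i = end + 2
            # White space is allowed after each comment.
            while i < len(text) and text[i].isspace():
                i += 1
        elif text.startswith(">", i):
            return i + 1           # end of the declaration
        else:
            return -1              # anything else is not a comment declaration


multi = ("<!--first comment\non two lines --\n"
         "--second comment--\n--third comment--\n>")
print(skip_comment_declaration(multi) == len(multi))   # True
print(skip_comment_declaration("<!>"))                 # 3
print(skip_comment_declaration("<!--comment---->"))    # -1
```

The last call shows why the extra dashes cause trouble under the strict
rule: the second "--" pair opens a new comment, so the ">" that follows
is inside that comment and the declaration is never closed.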

At the time, i.e. in 3.1.0b4, htdig didn't handle these, and your code
snippet won't either. I'm assuming she had a reason to want this change.
My feeling is htdig should respect the standard, and any non-standard
behaviour should be optional.
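
For comparison, the browser-style rule described at the top of this
message ("<!--" up to the first "-->" with nothing between the "--" and
the ">") could be sketched like this. Again this is only an
illustration in Python, the name skip_comment_lenient is hypothetical,
and it is my assumption that the search for "-->" starts after the
opening "<!--".

```python
def skip_comment_lenient(text, start=0):
    """Return the index just past a '<!--' ... '-->' comment beginning
    at `start`, or -1 if there is none. Non-standard, browser-like rule."""
    if not text.startswith("<!--", start):
        return -1
    # Assumption: the closing '-->' may not overlap the opening '<!--'.
    end = text.find("-->", start + 4)
    return -1 if end == -1 else end + 3


multi = ("<!--first comment\non two lines --\n"
         "--second comment--\n--third comment--\n>")
print(skip_comment_lenient("<!--comment---->"))   # 16
print(skip_comment_lenient(multi))                # -1
print(skip_comment_lenient("<!>"))                # -1
```

So this rule accepts the extra-dashes case but rejects both of the legal
declarations quoted above, which is why I would only want it as an
option alongside the standard behaviour.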

Gilles R. Detillieux
Spinal Cord Research Centre
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
