Gilles Detillieux (email@example.com)
Tue, 2 Mar 1999 16:50:48 -0600 (CST)
According to Matt Edwards:
> HtDig 3.1.1 isn't parsing (slightly non-standard) comments correctly.
> Extra dashes in the comment can confuse the current parser into
> ignoring a lot of content. For example <!--comment----> is seen as
> an uncompleted comment beginning.
> It seems a lot of web content doesn't strictly adhere to the
> "standard" for comments, so we should be a little careful here.
> For example both IE and Netscape require "<!--" comments to end
> with a "-->" without whitespace between the "--" and the ">".
> Perhaps htDig would be better off doing the same.
> According to Marjolein Katsma:
> > Starting on my next project, I had to dig in HTML.cc, and found the
> > following code to filter out comments:
> According to Gilles Detillieux:
Actually, this was more text quoted from Marjolein...
> > While this will catch *most* comments, it will see some perfectly legal
> > comments as illegal and skip the rest of the page. The best definition
> > of comments is found in HTML 2.0 (unchanged in the actual DTD in later
> > versions, but never properly explained any more...):
> > "To include comments in an HTML document, use a comment declaration. A
> > comment declaration consists of `<!' followed by zero or more comments
> > followed by `>'. Each comment starts with `--' and includes all text up
> > to and including the next occurrence of `--'. In a comment declaration,
> > white space is allowed after each comment, but not before the first
> > comment. The entire comment declaration is ignored."
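To make the quoted definition concrete, here is a minimal sketch of a scanner
that follows it literally. It is not the code in HTML.cc; the function name
skip_comment_declaration and its return convention are assumptions made up
for this illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not taken from htdig. Returns the index just past the
// closing '>' of the comment declaration starting at `pos`, or
// std::string::npos if the text there is not a complete, well-formed
// comment declaration in the HTML 2.0 sense.
std::size_t skip_comment_declaration(const std::string& s, std::size_t pos)
{
    if (pos > s.size() || s.compare(pos, 2, "<!") != 0)
        return std::string::npos;

    std::size_t i = pos + 2;          // no white space allowed before the first comment
    for (;;) {
        if (i < s.size() && s[i] == '>')
            return i + 1;             // zero or more comments seen, declaration closed

        if (s.compare(i, 2, "--") != 0)
            return std::string::npos; // neither '>' nor the start of another comment

        // A comment runs up to and including the next occurrence of "--".
        std::size_t close = s.find("--", i + 2);
        if (close == std::string::npos)
            return std::string::npos; // comment opened but never closed
        i = close + 2;

        // White space is allowed after each comment.
        while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i])))
            ++i;
    }
}
```

With this reading, "<!--comment-->" is skipped as a whole, while for
"<!--comment---->" the extra "--" opens a second comment that never closes,
so the function reports failure. That matches the "uncompleted comment"
behaviour Matt describes.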
Marjolein brought up this issue in January. The htdig code used
to do what you're requesting, but she wanted it changed to adhere to
the standard. I only helped her debug her code so it would do what she
wanted it to, to allow (require) standard comments. She went on to give
a few examples of what standard comments could be:
> Thus, the following are legal comment declarations:
> <!--first comment
> on two lines --
> --second comment--
> --third comment--
At the time, i.e. in 3.1.0b4, htdig didn't handle these, and your code
snippet won't either. I'm assuming she had a reason to want this change.
My feeling is htdig should respect the standard, and any non-standard
behaviour should be optional.
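
As a rough sketch of what "optional" could look like, the fragment below adds
a browser-style scanner that ends a "<!--" comment at the first "-->", and
picks between it and a standard-conforming scanner through a flag. The names
skip_comment_lenient, skip_comment, CommentSkipper and allow_nonstandard are
all hypothetical; htdig has no such option that I'm presenting here, and the
strict scanner is passed in by the caller rather than shown.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical signature for any comment scanner: returns the index just
// past the comment, or std::string::npos if none is recognised at `pos`.
using CommentSkipper =
    std::function<std::size_t(const std::string&, std::size_t)>;

// Browser-style rule as described by Matt: the comment starts with "<!--"
// and ends at the first "-->" with no white space between "--" and ">".
std::size_t skip_comment_lenient(const std::string& s, std::size_t pos)
{
    if (pos > s.size() || s.compare(pos, 4, "<!--") != 0)
        return std::string::npos;
    std::size_t close = s.find("-->", pos + 4);
    return close == std::string::npos ? std::string::npos : close + 3;
}

// Standard behaviour by default; the non-standard rule only when the
// (assumed) allow_nonstandard option is switched on.
std::size_t skip_comment(const std::string& s, std::size_t pos,
                         const CommentSkipper& strict, bool allow_nonstandard)
{
    return allow_nonstandard ? skip_comment_lenient(s, pos) : strict(s, pos);
}
```

Under the lenient rule, "<!--comment---->" is consumed in full, which is what
Matt is asking for, while a comment closed with "-- >" is not recognised.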
-- 
Gilles R. Detillieux              E-mail: <firstname.lastname@example.org>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
------------------------------------
To unsubscribe from the htdig mailing list, send a message to
email@example.com containing the single word "unsubscribe" in
the SUBJECT of the message.
This archive was generated by hypermail 2.0b3 on Thu Mar 04 1999 - 09:09:18 PST