Re: [htdig] Re: htdig: Comments


Marjolein Katsma (webmaster@javawoman.com)
Wed, 03 Mar 1999 11:39:36 +0100


At 16:53 1999-03-02 -0800, Matt Edwards wrote:
>
>
>On Tue, 2 Mar 1999, Matt Edwards wrote:
>> HtDig 3.1.1 isn't parsing (slightly non-standard) comments correctly.
>>
>> Extra dashes in the comment can confuse the current parser into
>> ignoring a lot of content. For example <!--comment----> is seen as
>> an uncompleted comment beginning.

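Exactly. This is *not* a comment at all: "--comment--" is one complete
comment, and the following "--" opens a second comment that is never
closed before the ">". There are no "extra dashes" either.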
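
A minimal sketch of how a strict SGML-style scan reads such input, written
in Python for brevity. This is not htdig's actual C++ parser and not the
algorithm discussed in this thread; the function name sgml_comment_end is
hypothetical. It assumes a comment declaration is "<!", then one or more
"--...--" comments optionally separated by whitespace, then ">".

```python
def sgml_comment_end(text, start=0):
    """Return the index just past the closing '>' of the comment
    declaration starting at text[start], or None if it never closes.

    Hypothetical helper for illustration only.
    """
    if not text.startswith("<!--", start):
        return None
    pos = start + 2  # skip "<!"; pos now sits on the first "--"
    while True:
        if text.startswith("--", pos):
            # Inside the declaration, "--" opens a comment that runs
            # until the next "--".
            close = text.find("--", pos + 2)
            if close == -1:
                return None  # comment opened but never closed
            pos = close + 2
        elif pos < len(text) and text[pos].isspace():
            pos += 1  # whitespace may follow a comment, also before ">"
        elif pos < len(text) and text[pos] == ">":
            return pos + 1  # declaration closed
        else:
            return None  # end of input or stray character


legal = ("<!--first comment\non two lines --\n\n"
         "--second comment--\n--third comment--\n>")
print(sgml_comment_end("<!--comment---->"))   # unclosed
print(sgml_comment_end("<!--comment-- >"))    # closed, whitespace before ">"
print(sgml_comment_end(legal) == len(legal))  # three comments, one declaration
```

Under these assumptions "<!--comment---->" never closes, because the ">"
falls inside the second comment, while "<!--comment-->" and
"<!--comment-- >" both close normally.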

>>
>> It seems a lot of web content doesn't strictly adhere to the
>> "standard" for comments, so we should be a little careful here.
>>
>> For example both IE and Netscape require "<!--" comments to end
>> with a "-->" without whitespace between the "--" and the ">".
>> Perhaps htDig would be better off doing the same.

Which versions did you test with? See below...

>
>In response, Gilles Detillieux wrote:
>> Marjolein brought up this issue in January. The htdig code used
>> to do what you're requesting, but she wanted it changed to adhere to
>> the standard. I only helped her debug her code so it would do what she
>> wanted it to, to allow (require) standard comments. She went on to give
>> a few examples of what standard comments could be:
>>
>>> Thus, the following are legal comment declarations:
>>>
>>> <!--first comment
>>> on two lines --
>>>
>>> --second comment--
>>> --third comment--
>>> >
>>>
>
>Except that both IE and netscape treat the above as an unclosed
>comment beginning, so nobody can get away with doing this in the real
>world.

Exactly which "IE and netscape" are you talking about? I tested this with
the following browsers, *all* of which treat this form of comment *correctly*:
- MSIE 2.0
- MSIE 4.01/SP1
- NS 3.03
- NS 4.04
- NS 2.01 (16 bits)
- Opera 3.0
- Mosaic 3.0
- Web TV Viewer 1.1
- Lynx 2.7
All running on WinNT 4.0/SP4

>
>However, my real issue was not with this behavior, but with the parser
>getting confused by extra dashes in comments. Extra dashes may be
>"non-standard", but because both netscape and IE allow them,
>I've found enough content with extra dashes to make me worry.
>
>How about a compromise where whitespace is allowed between the final
>"--" and the closing ">".
>
[snip]
>
>> At the time, i.e. in 3.1.0b4, htdig didn't handle these, and your code
>> snippet won't either. I'm assuming she had a reason to want this change.
>> My feeling is htdig should respect the standard, and any non-standard
>> behaviour should be optional.
>
>Good point. However there is a real-world industry standard here and a
>theoretical paper standard. Which behaviour would most people prefer out
>of the box?

W3C standard. There is only one. With the next one, browsers are even
supposed to NOT DISPLAY pages that don't conform to the DTD. People would
soon enough find their own mistakes that way, and we'd all profit. For now,
people should use a validator to find their mistakes, not trust the
friendliness of (some) browsers.

I'm not sure what kind of "extra dashes" you are talking about. My
algorithm already allows quite a lot of that; in other words, it is *not*
a strict implementation of SGML comments as the W3C standard defines them,
and it accepts some illegal ones as well.

>
>------------------------------------
>To unsubscribe from the htdig mailing list, send a message to
>htdig@htdig.org containing the single word "unsubscribe" in
>the SUBJECT of the message.
>

Marjolein Katsma webmaster@javawoman.com
Java Woman - http://javawoman.com/



This archive was generated by hypermail 2.0b3 on Thu Mar 04 1999 - 09:09:18 PST