Re: [htdig] Re: htdig: Comments

Marjolein Katsma
Thu, 04 Mar 1999 19:07:24 +0100

At 09:04 1999-03-04 -0600, Gilles Detillieux wrote:
>According to Geoff Hutchison:
>> >Except that both IE and netscape treat the above as an unclosed
>> >comment beginning, so nobody can get away with doing this in the real
>> >world.
>> >
>> >However, my real issue was not with this behavior, but with the parser
>> >getting confused by extra dashes in comments. Extra dashes may be
>> >"non-standard", but because both netscape and IE allow them,
>> >I've found enough content with extra dashes to make me worry.
>> >
>> >How about a compromise where whitespace is allowed between the final
>> >"--" and the closing ">".
>> OK, I've tried to stay out of this. I would obviously prefer to deal with
>> the standard. However, there's enough non-standard content out there that
>> needs to be acceptable.
>> *However* I don't see your "compromise" about whitespace as anything of the
>> sort. How is it a compromise? Who was talking about whitespace anyway? I
>> thought your question was about:
>> <!-- Comment ---->
>> In that case, it seems like a reasonable request to allow it.
>How does this sound as a compromise? We still allow multiple comments,
>delimited by "--" on either side, within a single "<! ... >" enclosure,
>but, whenever we find a "--" we skip all extra hyphens right after the
>first two. So,
><!-- valid comment -->
><!--valid comment--- >
><!---- valid comment 1 --
>-- valid comment 2 ------>
>.. but ...
><!-- valid first comment -- invalid second comment -->
>How does that suit everyone? It seems this would allow W3C standard
>comments, as well as comments following IE's home-brew rules. The
>example of the invalid second comment above is, I believe, against
>the rules that Marjolein explained back in January. (Correct me if
>I'm wrong.) I don't know how Netscape and IE would deal with them.

Yes, I've been thinking about this today (the network was down at the
office, so I had plenty of time to think :-)).

The main reason for my changed algorithm was that the original one 1) did
not recognize all legal comments (legal according to the standard - the
real one, not any 'de facto' one) and 2) would also not index *anything*
in a document when encountering a comment it didn't recognize.

My algorithm was meant to include *all* legal comments (according to the
standard) and also make sure that as much as possible of the rest of the
document is still indexed if something does not look like a comment after
all. I tried to build in some flexibility, but maybe not enough. Certainly
the idea was never to exclude all comments not strictly adhering to the
standard.

So yes, I like your proposal (as long as it doesn't exclude the white space
at the end of a comment). (Indeed, your last example is not a legal comment
but it wouldn't hurt to treat it as such.)
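
To make the proposed rule concrete, here is a minimal sketch in Python. It
is not htdig's actual parser code, and the function name comment_end is made
up for illustration. It assumes the rule as stated above: inside "<! ... >",
each comment is delimited by "--", extra hyphens right after any "--" are
skipped, and white space is allowed between comments and before the closing
">". It returns None when the text does not parse as a comment, so a caller
could fall back to treating "<!" as ordinary text and keep indexing the rest
of the document.

```python
def comment_end(text, start=0):
    """Return the index just past the closing '>' of a comment
    declaration starting at text[start], or None if it does not parse.

    Hypothetical helper: a sketch of the proposed lenient rule only.
    """
    if not text.startswith("<!", start):
        return None
    pos = start + 2
    n = len(text)
    while True:
        # Between comments: white space is allowed, including before '>'.
        while pos < n and text[pos].isspace():
            pos += 1
        if pos >= n:
            return None  # ran off the end: unclosed declaration
        if text[pos] == ">":
            return pos + 1
        if not text.startswith("--", pos):
            return None  # stray text outside any comment
        # Opening "--": skip it and any extra hyphens right after it.
        pos += 2
        while pos < n and text[pos] == "-":
            pos += 1
        # The comment body runs up to the next "--".
        close = text.find("--", pos)
        if close == -1:
            return None  # comment never closed
        # Closing "--": skip it and any extra hyphens right after it.
        pos = close + 2
        while pos < n and text[pos] == "-":
            pos += 1


examples = [
    "<!-- valid comment -->",
    "<!--valid comment--- >",
    "<!---- valid comment 1 --\n-- valid comment 2 ------>",
    "<!-- Comment ---->",
    "<!-- valid first comment -- invalid second comment -->",
]
for s in examples:
    print(repr(s), "->", comment_end(s))
```

With this sketch the first four examples are accepted and the last one is
rejected, because text follows the first closing "--" without a new "--" to
open a second comment.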

All this does leave a problem (for me, at least): just where do you stop?
What (other) coding errors are you going to allow? I still believe the
actual (W3C) standard should be the starting point for recognizing start
and end tags, and any "extensions" to that should be introduced very
sparingly: they could easily lead to undesirable effects.

>Gilles R. Detillieux E-mail: <>
>Spinal Cord Research Centre WWW:
>Dept. Physiology, U. of Manitoba Phone: (204)789-3766
>Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
>To unsubscribe from the htdig mailing list, send a message to
> containing the single word "unsubscribe" in
>the SUBJECT of the message.

Marjolein Katsma
Java Woman -

This archive was generated by hypermail 2.0b3 on Mon Mar 15 1999 - 08:57:45 PST