Marjolein Katsma (firstname.lastname@example.org)
Tue, 16 Mar 1999 23:20:42 +0100
At 22:49 1999-03-16 +0100, you wrote:
>Gilles Detillieux wrote:
>> > It finds the first occurrence of --> so don't recurse comments. Anyway,
>> > it works on my htdig system.
>> This isn't quite right. We had a big discussion about this two weeks ago.
>> The HTML standard allows white space (even newlines) between the closing
>> "--" and ">" of a comment. The trick is to gobble up any extra dashes
>> after the first two, and then skip white space. If that doesn't leave
>> you at a ">", I think you have to start over again, scanning for the next
>You're right about that, but HTML.cc did miss the end > anyway.
>I don't see why there may be white space between -- and >. Now we're
>getting to cases like the user typing ---.
No: There may be white space before the final > of a comment declaration
because the *standard* says so. That's in no way similar to users not
knowing how to formulate a syntactically correct comment...
Since the standard says there can be white space after every *comment*
inside a *comment declaration*, and most browsers (at least the varied bunch I
tested) handle this fine, htdig will have to handle it, too.
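
To make the rule Gilles describes concrete, here is a minimal sketch of such a scan. It is not the actual HTML.cc code: the function name find_comment_end and its signature are hypothetical, and it assumes the caller has already consumed the opening "<!--". It looks for "--", gobbles any extra dashes, skips white space, and accepts only if that lands on ">"; otherwise it resumes scanning for the next "--".

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only; not htdig's HTML.cc.
// `start` is the index just past the opening "<!--".
// Returns the index just past the closing '>' of the comment
// declaration, or std::string::npos if no terminator is found.
std::size_t find_comment_end(const std::string& html, std::size_t start)
{
    std::size_t pos = start;
    while (true) {
        // Scan for the next "--".
        std::size_t q = html.find("--", pos);
        if (q == std::string::npos)
            return std::string::npos;

        // Gobble up any extra dashes after the first two ("--->").
        std::size_t p = q + 2;
        while (p < html.size() && html[p] == '-')
            ++p;

        // Skip white space, including newlines, before the '>'.
        while (p < html.size() &&
               std::isspace(static_cast<unsigned char>(html[p])))
            ++p;

        if (p < html.size() && html[p] == '>')
            return p + 1;

        // Not at '>': start over, scanning for the next "--".
        // p is always past q + 1, so the loop makes progress.
        pos = p;
    }
}
```

With this sketch, "<!-- a -->", "<!-- a -- >" and "<!-- a --->" all end at their final ">", while "<!-- a -- b -->" skips the first "--" and ends at the second.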
Look back in the archives - I've quoted the exact text defining an SGML
> You'll have to scan for -- but
>do not skip these two because you'll have one - left and htdig will miss
>the >. Okay, the user didn't create good HTML, but I don't want to miss
>links for indexing because of some "programming" error.
>Somewhere in that piece of code there is a position update with:
>position = q+2 (to get after the --). Maybe changing it to
>position = q+1 will do the trick.
>> > Another problem is that M$ Frontpage 98 in combination with Frontpage
>> > Server Extensions doesn't do <AREA> tags. It creates a webbot (inside a
>> > comment). If the webbot has links, these links don't get indexed. Of
>> > course this is a M$ / user problem; it's just so that you know of it.
>> can enhance the HTML parser to deal with these webbot links reliably,
>> without breaking anything else, go for it. Otherwise, it'll remain a
>> problem, until M$ learns to adhere to standards other than their own. ;-)
>You'll need to parse the comments to do that.
>Greetz from nighty Holland,
>To unsubscribe from the htdig3-dev mailing list, send a message to
>email@example.com containing the single word "unsubscribe" in
>the SUBJECT of the message.
Marjolein Katsma firstname.lastname@example.org
Java Woman - http://javawoman.com/
This archive was generated by hypermail 2.0b3 on Tue Mar 16 1999 - 14:42:14 PST