Re: [htdig] External parsers: VRML added: Following <embed> tags?


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Fri, 4 Jun 1999 15:57:25 -0500 (CDT)


According to Geoff Hutchison:
>
> On Fri, 4 Jun 1999, Gilles Detillieux wrote:
>
> > case 24 is identical to case 25, as far as I can tell, so the two can be
> > merged together. Why duplicate code?
>
> Are you talking about something like this?
> case 24: // embed
> case 25: // object
> ...
>
> If there's a legal syntax to combine the two cases, great. I don't have my
> reference book around. Other parts of the HTML.cc parser need some
> cleaning up too.

Exactly. Certainly, that's how you do it in C, so I can't imagine they'd
disallow it in C++. That's the whole reason for ending each case with
a "break;": without it, execution falls through to the code for the
next case.
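
For anyone without a reference book handy, here is a small standalone
sketch of both behaviours: stacked case labels sharing one body, and a
case without "break;" falling through. The enum, the function name and
the returned strings are made up for illustration, and only the numbers
mirror the ones in HTML.cc.

```cpp
#include <string>

// Hypothetical tag codes; only the numeric values mirror the thread.
enum TagCode { TAG_IMG = 8, TAG_EMBED = 24, TAG_OBJECT = 25 };

// Returns a label describing which code paths ran for a given tag code.
std::string classify(int tag)
{
    std::string result;
    switch (tag)
    {
        case TAG_IMG:
            result = "image:";
            // No "break;" here, so control continues into the next case.
            // fall through

        case TAG_EMBED:     // two labels stacked on a single body
        case TAG_OBJECT:
            result += "link";
            break;

        default:
            result = "other";
            break;
    }
    return result;
}
```

Codes 24 and 25 run exactly the same statements, while code 8 runs its
own statement first and then the shared ones.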

You could even do something like:

        case 8: // "img"
            imgflag++;
            // no break: fall through to the shared code below

        case 24: // "embed"
        case 25: // "object"
        {
            ...

            if (imgflag)
                retriever.got_image(position);
            else if (dofollow)
            {
                URL *href = new URL(position, *base);
                retriever.got_href(*href, "");
                delete href;
            }
            imgflag = 0;
            break;
        }

to merge together even more code.

> > This will match any of "src", "href" or "name". Is this all right?
> > If the <embed> and <object> tags both use only src=..., you could use
> > srcMatch.FindFirstWord(...) instead.
>
> This is probably better. Of course this means the IMG tag is wrong, since
> this is where I grabbed the code.

Yup. The attrs object only makes sense for <a> tags, because the code
there tests for all 3 attributes. It'll work for the other tags, but it
would accept attributes that aren't valid for them.
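
To make that concrete, here is a rough standalone sketch of the problem.
It does not use the real ht://Dig matcher classes. firstAttr() is a
hypothetical helper that just reports which of the allowed attribute
names shows up first in a tag's text, and the tag string is made up.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a word matcher: returns the allowed attribute
// name that occurs earliest in `tag` (as " name="), or "" if none occurs.
std::string firstAttr(const std::string &tag,
                      const std::vector<std::string> &allowed)
{
    std::string best;
    std::size_t bestPos = std::string::npos;
    for (const std::string &name : allowed)
    {
        std::size_t pos = tag.find(" " + name + "=");
        if (pos < bestPos)      // never true when pos is npos
        {
            bestPos = pos;
            best = name;
        }
    }
    return best;
}
```

With a tag body like

    img name="logo" src="logo.gif"

a matcher that allows "src", "href" and "name" stops at name=, which is
not where the image URL is, while a matcher that allows only "src" finds
the right attribute.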

> > This last function call won't work. You'd need to do something like:
> >
> > if (dofollow)
> > {
> > URL *href = new URL(position, *base);
> > retriever.got_href(*href, "");
> > delete href;
> > }
>
> As I said, it wasn't tested in the least. But someone asked how to add
> embed and object tag parsing, so I showed them. At the moment, I don't
> have much time to spare to coding for another few days. However, it seemed
> like any necessary modifications would be easy enough for someone else to
> do. Yeah, I know, I should have tried it...

I know the feeling. When I first saw the request, I thought it would be
a fun challenge if I had a moment, but you beat me to it. I'm finding my
time stretched a lot lately too!

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930