[htdig3-dev] Improvements to HTML::do_tag


Geoff Hutchison (ghutchis@wso.williams.edu)
Tue, 22 Jun 1999 22:49:51 -0400 (EDT)


A few weeks ago, someone mentioned that we don't index <img alt="...">
text. I figured it would be a pretty easy addition to the HTML parser.
Along the way, I think we might be able to significantly clean up the
do_tag method in the HTML parser.

So here's how we do meta tags:

        case 20: // "meta"
        {
            position += length;
            Configuration conf;
            conf.NameValueSeparators("=");
            conf.Add(position);

This seems like a really good way to parse tags in general. After all,
what are tag attributes but key-value pairs? So can't we just use this
for most of the tags where we want the attributes? Then I could get the
alt text like this:

        Configuration attrs;
        attrs.NameValueSeparators("=");
        attrs.Add(position);
        ...
        // "img"
        got_word(attrs["alt"]...);

Are there any hitches I'm ignoring? Since the configuration file parser
already deals with quoted values, shouldn't this work even for src
attributes?
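
To make the key-value idea concrete, here is a minimal standalone sketch.
It does not use the ht://Dig Configuration class: parse_attributes is a
hypothetical helper written only for illustration, and it assumes its input
is the text that follows the tag name. It also walks through the cases a
plain name=value parser would probably have to cope with in real HTML:
single-quoted values, unquoted values, attributes with no value at all, and
mixed-case attribute names.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper, not part of ht://Dig: parse the text that follows a
// tag name (e.g. ` src="a.gif" alt='A logo' ismap>`) into name/value pairs.
// Names are lowercased because HTML attribute names are case-insensitive.
std::map<std::string, std::string> parse_attributes(const std::string &text)
{
    std::map<std::string, std::string> attrs;
    std::string::size_type i = 0;
    const std::string::size_type n = text.size();

    auto is_space = [](char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    };

    while (i < n) {
        // Skip whitespace and stray slashes between attributes.
        while (i < n && (is_space(text[i]) || text[i] == '/')) ++i;
        // Stop at the end of the input or at the closing '>'.
        if (i >= n || text[i] == '>') break;

        // Read the attribute name up to whitespace, '=', '/', or '>'.
        std::string name;
        while (i < n && !is_space(text[i]) && text[i] != '=' &&
               text[i] != '/' && text[i] != '>') {
            name += static_cast<char>(
                std::tolower(static_cast<unsigned char>(text[i])));
            ++i;
        }
        while (i < n && is_space(text[i])) ++i;

        // An attribute without '=' (e.g. ismap) keeps an empty value.
        std::string value;
        if (i < n && text[i] == '=') {
            ++i;
            while (i < n && is_space(text[i])) ++i;
            if (i < n && (text[i] == '"' || text[i] == '\'')) {
                // Quoted value: take everything up to the matching quote,
                // including any spaces or '>' characters inside it.
                const char quote = text[i++];
                while (i < n && text[i] != quote) value += text[i++];
                if (i < n) ++i; // skip the closing quote
            } else {
                // Unquoted value: runs until whitespace or '>'.
                while (i < n && !is_space(text[i]) && text[i] != '>')
                    value += text[i++];
            }
        }
        if (!name.empty()) attrs[name] = value;
    }
    return attrs;
}
```

With something like this, the "img" case would look up attrs["alt"] and
pass it on for indexing, and the same map would answer lookups for src,
href, and so on. Whether Configuration already covers all of these cases is
exactly the question above; the sketch only lists what to check.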

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



