Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Fri, 4 Jun 1999 14:23:46 -0500 (CDT)
According to Geoff Hutchison:
> On Fri, 4 Jun 1999, Rzepa, Henry wrote:
>
> > If anyone can give us some hints as to how to modify htdig to follow
> > <embed> as well as <a> tags, we would be most grateful!!
>
> This patch has not even been tested to see if it compiles. But it should
> do what you ask.
..
> *************** HTML::do_tag(Retriever &retriever, Strin
> *** 1097,1102 ****
> --- 1097,1202 ----
> break;
> }
>
> + case 24: // embed
..
case 24 is identical to case 25, as far as I can tell, so the two can be
merged together. Why duplicate code?
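A minimal sketch of the merge, using a stand-in function rather than the real HTML::do_tag(): two case labels can sit on top of one body, so the handling is written once. Only the codes 24 and 25 come from the patch; the function name tag_carries_link is made up for illustration.

```cpp
// Hypothetical stand-in for the switch in HTML::do_tag(); only the
// numeric codes 24 (embed) and 25 (object) are taken from the patch.
bool tag_carries_link(int tag_code)
{
    switch (tag_code)
    {
        case 24:    // embed
        case 25:    // object: shares the body below with embed
            return true;

        default:
            return false;   // Nothing...
    }
}
```

Stacking the labels keeps the two tags in step if the body is changed later.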
> + case 25: // object
> + {
> +     which = -1;
> +     int pos = attrs.FindFirstWord(position, which, length);
This will match any of "src", "href" or "name". Is this all right?
If the <embed> and <object> tags both use only src=..., you could use
srcMatch.FindFirstWord(...) instead.
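To illustrate what "match only src" means, here is a self-contained sketch that does not use htdig's StringMatch class at all. The helper extract_src is hypothetical: it walks an attribute string, ignores every attribute except src, accepts single or double quotes, and cuts the value at a '#', much as the patch does. Unlike the patch it works on a copy and does not write '\0' into the buffer.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of htdig: returns the value of the src
// attribute found in a tag's attribute text, or "" if there is none.
std::string extract_src(const std::string &attrs)
{
    const std::size_t n = attrs.size();
    std::size_t i = 0;
    while (i < n)
    {
        while (i < n && std::isspace((unsigned char)attrs[i]))
            i++;
        // Read the attribute name, lowercased for comparison.
        std::string name;
        while (i < n && attrs[i] != '=' && !std::isspace((unsigned char)attrs[i]))
            name += (char)std::tolower((unsigned char)attrs[i++]);
        while (i < n && std::isspace((unsigned char)attrs[i]))
            i++;
        std::string value;
        if (i < n && attrs[i] == '=')
        {
            i++;
            while (i < n && std::isspace((unsigned char)attrs[i]))
                i++;
            if (i < n && (attrs[i] == '"' || attrs[i] == '\''))
            {
                // Quoted value: runs to the matching quote character.
                const char quote = attrs[i++];
                const std::size_t end = attrs.find(quote, i);
                if (end == std::string::npos)
                    return "";          // unterminated quote: give up
                value = attrs.substr(i, end - i);
                i = end + 1;
            }
            else
            {
                // Unquoted value: runs to whitespace or '>'.
                const std::size_t start = i;
                while (i < n && attrs[i] != '>' && !std::isspace((unsigned char)attrs[i]))
                    i++;
                value = attrs.substr(start, i - start);
            }
        }
        if (name == "src")
            return value.substr(0, value.find('#'));   // drop any fragment
    }
    return "";
}
```

With this, an attribute string containing only href=... or name=... yields an empty result, so nothing would be followed for it.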
> +     if (pos < 0 || which != 0)
> +         break;
> +     position += pos + length;
> +     while (*position && *position != '=')
> +         position++;
> +     if (!*position)
> +         break;
> +     position++;
> +     while (isspace(*position))
> +         position++;
> +     //
> +     // Allow either single quotes or double quotes
> +     // around the URL itself
> +     //
> +     if (*position == '"'||*position == '\'')
> +     {
> +         position++;
> +         q = strchr(position, position[-1]);
> +         if (!q)
> +             break;
> +         //
> +         // We seem to have matched the opening quote char
> +         // Mark the end of the quotes as our endpoint, so
> +         // that we can continue parsing after the current
> +         // text
> +         //
> +         *q = '\0';
> +         //
> +         // If a '#' is present in a quoted URL,
> +         // treat that as the end of the URL, but we skip
> +         // past the quote to parse the rest of the anchor.
> +         //
> +         if ((t = strchr(position, '#')) != NULL)
> +             *t = '\0';
> +     }
> +     else
> +     {
> +         q = position;
> +         while (*q && *q != '>' && !isspace(*q))
> +             q++;
> +         *q = '\0';
> +     }
> +     retriever.got_href(position);
This last function call won't work. You'd need to do something like:
    if (dofollow)
    {
        URL *href = new URL(position, *base);
        retriever.got_href(*href, "");
        delete href;
    }
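For anyone who wants to try the shape of that call outside of htdig, here is a rough sketch with stand-in types. Url, Retriever and report_link below are simplified inventions, not htdig's real classes (the base is a plain string here, for instance). The sketch also assumes the URL object could live on the stack, which would make the explicit delete unnecessary; I have not checked that against the real URL class.

```cpp
#include <string>
#include <vector>

// Stand-in for htdig's URL class: it only remembers its two inputs.
struct Url
{
    std::string ref;
    std::string base;
    Url(const std::string &r, const std::string &b) : ref(r), base(b) {}
};

// Stand-in for htdig's Retriever: it only records the links it is given.
struct Retriever
{
    std::vector<std::string> seen;
    void got_href(const Url &url, const std::string &description)
    {
        (void)description;          // unused in this sketch
        seen.push_back(url.ref);
    }
};

// Hands the link to the retriever only when following is allowed.
void report_link(Retriever &retriever, const std::string &ref,
                 const std::string &base, bool dofollow)
{
    if (dofollow)
    {
        Url href(ref, base);        // destroyed automatically at scope exit
        retriever.got_href(href, "");
    }
}
```

Calling report_link with dofollow set to false leaves the retriever untouched, which is the point of the guard.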
> +     break;
> + }
> +
> default:
> return; // Nothing...
> }
>
--
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig@htdig.org containing the single word "unsubscribe" in
the SUBJECT of the message.