ht://Dig URL quotes handling


Tim Frost (Tim.Frost@nz.eds.com)
Tue, 09 Dec 1997 14:15:55 +1300


Andrew,

While using version 3.0.8b2 of the ht://Dig package, I found that
htdig itself does not handle some URLs correctly.

The following cases are not handled correctly:
- any URL enclosed in apostrophes (single quotes) rather than double
  quote marks
- quoted URLs with # or ? as part of the URL.
        Examples are:
                "/cgi-bin/xxxx?parm=val"
                "/complex/doc.html#section"

The attached patch, generated against the sources for 3.0.8b2, appears
to fix those problems.

Tim

-- 
Tim Frost, Systems Engineer         Email: Tim.Frost@nz.eds.com
EDS (NZ) Ltd,                       Voice: +64 4 495-0504
P.O. Box 3647,                      Fax:   +64 4 495-0473
Wellington, New Zealand.

diff -ru htdig-3.0.8b2-orig/htdig/HTML.cc htdig-3.0.8b2/htdig/HTML.cc
--- htdig-3.0.8b2-orig/htdig/HTML.cc Sun Dec 7 22:14:40 1997
+++ htdig-3.0.8b2/htdig/HTML.cc Mon Dec 8 20:33:37 1997
@@ -309,7 +309,7 @@
 HTML::do_tag(Retriever &retriever, String &tag)
 {
     char        *position = tag.get() + 1;      // Skip the '<'
-    char        *q;
+    char        *q, *t;
     int         which, length;
 
     while (isspace(*position))
@@ -358,12 +358,31 @@
                 position++;
                 while (isspace(*position))
                     position++;
-                if (*position == '"')
+                //
+                // Allow either single quotes or double quotes
+                // around the URL itself
+                //
+                if (*position == '\'' || *position == '"')
                 {
                     position++;
-                    q = strchr(position, '"');
+                    q = strchr(position, position[-1]); // Match start
                     if (!q)
                         break;
+                    //
+                    // We seem to have matched the opening quote char
+                    //
+                    *q = '\0';
+                    //
+                    // If a '?' or '#' is present in a quoted URL,
+                    // treat that as the end of the URL, but we skip
+                    // past the quote to parse the rest of the anchor.
+                    //
+                    // Is there a better way of looking for these?
+                    //
+                    if ((t = strchr(position, '#')) != NULL)
+                        *t = '\0';
+                    if ((t = strchr(position, '?')) != NULL)
+                        *t = '\0';
                 }
                 else
                 {
@@ -374,8 +393,8 @@
                            *q != '?' &&
                            *q != '#')
                         q++;
+                    *q = '\0';
                 }
-                *q = '\0';
                 delete href;
                 href = new URL(position, *base);
                 in_ref = 1;
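
For anyone who wants to try the patched logic outside of htdig, the
following is a rough standalone sketch of it. The function name
extract_href and its return convention are hypothetical and do not
appear in the ht://Dig sources. The unquoted branch is only partly
visible in the diff, so its whitespace and '>' terminators are
assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <string>

// Hypothetical helper, not part of ht://Dig. It takes the text that
// follows "href=" and returns the URL portion, following the logic of
// the patch. The buffer is modified in place, as in the patch.
std::string extract_href(char *position)
{
    assert(position != NULL);
    char *q, *t;

    while (std::isspace(static_cast<unsigned char>(*position)))
        position++;

    if (*position == '\'' || *position == '"')
    {
        // Accept either quote character and look for the matching one.
        char quote = *position++;
        q = std::strchr(position, quote);
        if (!q)
            return std::string();   // unterminated quote; the patch does "break"
        *q = '\0';
        // Within the quoted value, cut the URL at the first '#' or '?'.
        if ((t = std::strchr(position, '#')) != NULL)
            *t = '\0';
        if ((t = std::strchr(position, '?')) != NULL)
            *t = '\0';
    }
    else
    {
        // Unquoted value. Only the '?' and '#' tests are visible in the
        // diff; stopping at whitespace and '>' is an assumption.
        q = position;
        while (*q &&
               !std::isspace(static_cast<unsigned char>(*q)) &&
               *q != '>' &&
               *q != '?' &&
               *q != '#')
            q++;
        *q = '\0';
    }
    return std::string(position);
}
```

Given a writable buffer holding "/cgi-bin/xxxx?parm=val" (including the
double quotes), the sketch returns /cgi-bin/xxxx, and for
'/complex/doc.html#section' in apostrophes it returns
/complex/doc.html.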


