[htdig] Re: Mangled URLs with single quotes in page_number_text fields (PR#743)

From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Tue Jan 18 2000 - 07:28:49 PST

I'm crossposting to htdig@htdig.org, because I think this is of general
interest.

According to Maurice Buxton (M.I.Buxton@exeter.ac.uk):
> in order to comply
> with XHTML etc attributes should be quoted, so as I couldn't find an escaping
> mechanism I added single quotes:
> page_number_text: "<img src='/icons/htdig/button1.gif' border=0 ... alt=1>" \
> "<img src=/icons/htdig/button2.gif border='0' ... alt=2>" \
> (etc)
> Result is that the image tag gets mangled - the quotes disappear and it gets
> truncated after the first attribute with quotes. e.g. in above example
> <img src='/icons/htdig/button1.gif' border=0 ... alt=1>
> becomes
> <img src=/icons/htdig/button1.gif
> Is there a method of quoting attributes?
> OK, I can avoid the problem by simply not quoting the attributes, but that's
> not ideal ...

Definitely a bug. The fix for this is below. This patch also fixes a
potential problem with an escaped null causing this function to overrun
the end of the string, if an attribute ends in a backslash (I'm not sure
whether this could happen in practice or not, but I think \\ would do
that).

Yes, the backslash can be used as an escape mechanism in quoted string
lists, but the catch is that the string is parsed twice. First, the
whole attribute is parsed for variable expansion or file inclusion, and
this parsing uses backslash as an escape mechanism as well. So, to pass
a backslash to the second level of parsing (the quoted list separation),
the backslash must be escaped too. Using \\" or \\' would do the trick,
even in the unpatched code. However, with this patch you should be able
to use ' within " " or " within ' ' without any escaping.
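
To make the double parsing concrete, here is a rough sketch. It is not
ht://Dig's actual code: unescapeOnce is a hypothetical stand-in for one
level of backslash handling, assuming only that a backslash makes the
following character literal. Running it twice on \\' shows why the
backslash itself has to be escaped.

```cpp
#include <string>

// Hypothetical helper (not part of ht://Dig): removes one level of
// backslash escaping, keeping the character that follows each backslash.
std::string unescapeOnce(const std::string &in)
{
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i)
    {
        if (in[i] == '\\')
        {
            if (i + 1 == in.size())
                break;      // trailing backslash: nothing left to escape
            ++i;            // skip the backslash, keep the next character
        }
        out += in[i];
    }
    return out;
}
```

Under that assumption, the first pass turns \\' into \', and the second
pass turns \' into a literal '. A single \' would already be reduced to
a bare ' by the first pass, so the second pass would see it as an
ordinary quote character.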

--- htdig-3.1.4/htlib/QuotedStringList.cc.quotbug Thu Dec 9 18:28:47 1999
+++ htdig-3.1.4/htlib/QuotedStringList.cc Tue Jan 18 09:08:10 2000
@@ -86,13 +86,15 @@ QuotedStringList::Create(char *str, char
         if (*str == '\\')
+            if (!str[1])
+                break;
             word << *++str;
         else if (*str == quote)
             quote = 0;
-        else if (*str == '"' || *str == '\'')
+        else if (!quote && (*str == '"' || *str == '\''))
             quote = *str;
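
For anyone who wants to see the effect without rebuilding, below is a
standalone sketch modelled on the patched loop. It is only an
approximation under stated assumptions: splitQuoted and its
checkOpenQuote flag are made-up names, std::string and std::vector stand
in for the htlib classes, and the separator handling is my guess at the
surrounding code rather than a copy of it. Passing checkOpenQuote=false
reproduces the old quote test for comparison; the trailing-backslash
guard is always on.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Sketch of a quoted-list splitter (hypothetical, not the htlib source).
std::vector<std::string> splitQuoted(const char *str, const char *sep,
                                     bool checkOpenQuote = true)
{
    std::vector<std::string> words;
    std::string word;
    char quote = 0;        // quote character currently open, 0 if none
    bool quoted = false;   // current word contained a quoted part

    while (str && *str)
    {
        if (*str == '\\')
        {
            if (!str[1])
                break;             // lone trailing backslash: stop here
            word += *++str;        // take the escaped character literally
        }
        else if (*str == quote)
        {
            quote = 0;             // matching close quote
        }
        else if ((!checkOpenQuote || !quote) &&
                 (*str == '"' || *str == '\''))
        {
            quote = *str;          // open a quote (patched: only if none open)
            quoted = true;
        }
        else if (!quote && std::strchr(sep, *str))
        {
            words.push_back(word); // separator outside quotes ends the word
            word.clear();
            quoted = false;
            while (str[1] && std::strchr(sep, str[1]))
                ++str;             // collapse a run of separators
        }
        else
        {
            word += *str;
        }
        ++str;
    }
    if (!word.empty() || quoted)
        words.push_back(word);     // add the last word to the list
    return words;
}
```

With the old test, a ' inside " " replaces the open quote, the next '
closes it, and the following space ends the word, which is the
truncation reported above. With the !quote check, the ' is kept as an
ordinary character until the closing " is seen.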

Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

