Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Mon, 17 May 1999 13:16:29 -0500 (CDT)
According to Geoff Hutchison:
> OK, I finally had some time to sit down and write some escaping code for
> the HtRegex code. I've turned on regex parsing for the limit_url_to,
> limit_normalized, exclude_urls, and bad_querystr variables.
>
> *PLEASE* test it. Try doing indexing and see if it's actually
> backwards-compatible. Try adding a regex to these options and see if it
> does what you expect. To make things backwards-compatible, regex must be
> enclosed in [] and the 'escaping' brackets will be removed from the pattern.
>
> Also, check out HtRegex::setEscaped() and tell me if I'm missing anything
> horribly dangerous for escapes. Right now it escapes '.' '?' and '+' but I
> don't think this is a comprehensive list--I just figured it would be a
> useful subset to start testing.
The special characters for basic regular expressions are: ^, $, *, .,
[, ], {, }, and of course '\'. Additionally, for extended regular
expressions, you need to escape +, ?, |, ( and ). You don't need to
worry about closing ] or }, because they have no special meaning unless
preceded by an opening [ or {. However, the closing ) should be escaped,
according to the regex(7) manual page.
The confusing thing is that in basic regular expressions, \( and \) delimit
subexpressions. However, we're using extended regular expressions,
so \( matches a literal '(' character.
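To check the basic vs. extended difference on a given system, something like the following rough sketch could be used. It assumes a POSIX <regex.h> is available; posixMatch is a made-up helper name for illustration, not part of the HtRegex code.

```cpp
#include <regex.h>   // POSIX regcomp/regexec/regfree

// Hypothetical helper: returns true if `text` matches `pattern`.
// `extended` selects extended (ERE) syntax instead of basic (BRE).
bool posixMatch(const char *pattern, const char *text, bool extended)
{
    regex_t re;
    int flags = REG_NOSUB | (extended ? REG_EXTENDED : 0);

    if (regcomp(&re, pattern, flags) != 0)
        return false;               // pattern failed to compile

    bool matched = (regexec(&re, text, 0, 0, 0) == 0);
    regfree(&re);
    return matched;
}
```

With extended set to true, the pattern ^f\(x\)$ matches the string "f(x)" but not "fx", while ^f(x)$ matches "fx" but not "f(x)". With extended set to false the roles are reversed: ^f\(x\)$ matches "fx" and ^f(x)$ matches "f(x)".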
So, here's how I'd do it:
    // Prefix any character that is special in an extended regex with '\'
    if (strchr("^.[$()|*+?{\\", str->Nth(pos)))
        transformedLimits << '\\';
    transformedLimits << str->Nth(pos);
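
For anyone who wants to try the idea outside of ht://Dig, here is a standalone sketch of the same escaping rule. It uses std::string rather than the ht://Dig String class, and escapeRegexLiteral is a hypothetical name chosen for this example. The extra check for '\0' is there because strchr() also matches the terminating null of the character set.

```cpp
#include <cstring>
#include <string>

// Hypothetical standalone helper, not the actual HtRegex::setEscaped():
// returns a copy of `literal` in which every character that is special
// in an extended regular expression is preceded by a backslash.
std::string escapeRegexLiteral(const std::string &literal)
{
    static const char special[] = "^.[$()|*+?{\\";
    std::string escaped;

    for (std::string::size_type pos = 0; pos < literal.size(); pos++)
    {
        char c = literal[pos];
        // strchr() would also "find" '\0' (the set's terminator), so skip it
        if (c != '\0' && std::strchr(special, c))
            escaped += '\\';
        escaped += c;
    }
    return escaped;
}
```

Under this rule a plain URL pattern such as http://x.org/ becomes http://x\.org/, and a closing ] or } on its own is left untouched.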
--
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930