[htdig3-dev] Regex Excludes Broken


Geoff Hutchison (ghutchis@wso.williams.edu)
Sun, 16 May 1999 14:47:27 -0400


OK, I've been beating on the new regex code. The limits code seems to work
correctly, but I can't get excludes to work: nothing gets excluded. At
first, I thought the problem was that '.cgi' or 'cgi-bin' were being
mangled by the escaping. But even 'cgi' or '99' don't exclude URLs
containing those patterns.

For example:

limit_urls_to: http://www\.htdig\.org/
  becomes -> 'http://www\.htdig\.org/'
exclude_urls: 99 =
  becomes -> '99|='
(Correct, yes?)
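
One way to check the combined patterns in isolation, outside of htdig, is a
small standalone test. This is only a sketch: it uses std::regex with the
POSIX extended grammar as a stand-in for the regex code under discussion,
and join_patterns and url_contains are hypothetical helper names, not
htdig functions.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper, not part of htdig: join configuration tokens
// into a single alternation, e.g. {"99", "="} -> "99|=".
std::string join_patterns(const std::vector<std::string>& tokens) {
    std::string joined;
    for (const std::string& token : tokens) {
        if (!joined.empty()) joined += '|';
        joined += token;
    }
    return joined;
}

// Hypothetical helper: true if the pattern matches anywhere in the URL.
// The search is unanchored, so "99" matches any URL containing "99".
bool url_contains(const std::string& pattern, const std::string& url) {
    const std::regex re(pattern, std::regex::extended);
    return std::regex_search(url, re);
}
```

If url_contains("99|=", url) is true for a URL that htdig still accepts,
the pattern itself is probably fine and the problem lies in how the
compiled pattern is stored or matched.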

Then here's the code:
    //
    // If the URL contains any of the patterns in the exclude list,
    // mark it as invalid
    //
    if (excludes.match(url, 0, 0) != 0)
    {
        if (debug >= 2)
            cout << endl << " Rejected: item in exclude list ";
        return(FALSE);
    }

But that condition is never true. In comparison, the limit code is:
    //
    // If any of the limits are met, we allow the URL
    //
    if (limits.match(url, 1, 0) != 0) return(TRUE);

All of this looks correct to me. Anyone have sharper eyes?
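
For comparison, here is a minimal sketch of the behavior the two checks are
meant to have, written against std::regex rather than the htdig regex
class. The name is_valid_url is hypothetical, and rejecting URLs that match
no limit is an assumption, since the code after the limits check is not
shown above.

```cpp
#include <regex>
#include <string>

// Stand-in for the two checks quoted above; std::regex replaces the
// htdig regex class, whose internals are not shown in this message.
bool is_valid_url(const std::string& url,
                  const std::regex& limits,
                  const std::regex& excludes) {
    // Reject when any exclude pattern occurs anywhere in the URL.
    if (std::regex_search(url, excludes)) return false;
    // Otherwise accept only URLs that satisfy a limit pattern
    // (assumption: everything else is rejected).
    return std::regex_search(url, limits);
}
```

Running the same URLs through this and through htdig with debug >= 2 should
show whether the two disagree on the exclude step.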
-Geoff



