Re: [htdig3-dev] Re: robots.txt bug (was [ANNOUNCE] ht://Dig 3.2.0b1)


Subject: Re: [htdig3-dev] Re: robots.txt bug (was [ANNOUNCE] ht://Dig 3.2.0b1)
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Mon Feb 07 2000 - 16:20:28 PST


At 11:15 AM +0200 2/7/00, Valdas Andrulis wrote:
>GH> First off, have you set case_sensitive to anything in your config file?
>
>No.

Good. That rules out case_sensitive as a source of problems with the regex.

>GH> Then let us know what pattern it sets in the debug output--I don't
>GH> really want the whole thing but I want to see if it's setting the
>GH> pattern OK.
>
>Trying to retrieve robots.txt file
>Parsing robots.txt file using myname = htdig
>Found 'user-agent' line: htdig
>Found 'disallow' line: /cat/
>Found 'user-agent' line: htdig
>Found 'disallow' line: /foobar/
>Pattern: /foobar/

This is bad. The last line should be:
Pattern: /cat/|/foobar/

In light of a recent bug report (about a new 'allow' keyword in
robots.txt) the code probably needs to be rewritten. Nevertheless,
here's the key code:

            if (*rest)
            {
                if (pattern.length())
                    pattern << '|' << rest;
                else
                    pattern = rest;
            }

The only thing I can think of here is that "pattern = rest;" is not
performing the copying that it should...
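
One way to narrow this down is to run the same accumulation logic in
isolation. What follows is only a sketch, not the actual ht://Dig code:
it uses std::string in place of our own String class, and the names
appendDisallow and buildPattern are made up for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the snippet above. std::string replaces
// ht://Dig's String class here, which is an assumption of this sketch.
void appendDisallow(std::string &pattern, const char *rest)
{
    if (rest && *rest)
    {
        if (!pattern.empty())
        {
            // Later entries: join with '|' so the regex matches any of them.
            pattern += '|';
            pattern += rest;
        }
        else
        {
            // First entry: this must copy the characters, not alias rest.
            pattern = rest;
        }
    }
}

// Feed every 'disallow' value through the same accumulator, in order.
std::string buildPattern(const std::vector<const char *> &disallows)
{
    std::string pattern;
    for (const char *rest : disallows)
        appendDisallow(pattern, rest);
    return pattern;
}
```

With std::string, buildPattern({"/cat/", "/foobar/"}) returns
"/cat/|/foobar/", which is the pattern the debug output should have
shown. If the logic holds up like this, the difference has to come from
somewhere else: either how String handles the assignment, or something
outside this snippet touching pattern between the two 'disallow' lines.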

Thoughts?
-Geoff



