Re: [htdig3-dev] Re: robots.txt bug (was [ANNOUNCE] ht://Dig 3.2.0b1)

From: Geoff Hutchison
Date: Mon Feb 07 2000 - 16:20:28 PST

At 11:15 AM +0200 2/7/00, Valdas Andrulis wrote:
>GH> First off, have you set case_sensitive to anything in your config file?

Good. That rules out case_sensitive as the cause of any regex problems.

>GH> Then let us know what pattern it sets in the debug output--I don't
>GH> really want the whole thing but I want to see if it's setting the
>GH> pattern OK.
>Trying to retrieve robots.txt file
>Parsing robots.txt file using myname = htdig
>Found 'user-agent' line: htdig
>Found 'disallow' line: /cat/
>Found 'user-agent' line: htdig
>Found 'disallow' line: /foobar/
>Pattern: /foobar/

This is bad. The last line should be:
Pattern: /cat/|/foobar/

In light of a recent bug report (about a new 'allow' keyword in
robots.txt) the code probably needs to be rewritten. Nevertheless,
here's the key code:

            if (*rest)
                if (pattern.length())
                    pattern << '|' << rest;
                else
                    pattern = rest;

The only thing I can think of here is that "pattern = rest;" is not
performing the copying that it should...
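
For comparison, here is a minimal sketch of the accumulation that the
debug output should reflect. It uses std::string instead of ht://Dig's
own String class, and build_disallow_pattern is a hypothetical helper
name, so treat it as an illustration of the intended behaviour rather
than the project's code:

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not ht://Dig's actual code: joins the non-empty
// 'disallow' paths into a single alternation pattern.
std::string build_disallow_pattern(const std::vector<std::string>& paths)
{
    std::string pattern;
    for (const std::string& rest : paths)
    {
        if (rest.empty())
            continue;           // an empty 'disallow' line adds nothing
        if (!pattern.empty())
            pattern += '|';     // separator only between entries
        pattern += rest;        // append, never overwrite earlier entries
    }
    return pattern;
}
```

Given the two 'disallow' lines from the debug output, /cat/ and
/foobar/, this yields /cat/|/foobar/. If the real code ends up with
only /foobar/, the first entry is being lost somewhere between the
two appends.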

