Re: htdig: Slashes in query string causing loops


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Tue, 5 Jan 1999 15:24:47 -0600 (CST)


According to Glenn Nielsen:
>
> Try replacing the "/" in the GET portion of the URL with "%2f",
> the new URL would look like this:
>
> /index.asp?date=11%2f21%2f98
>
> Regards,
>
> Glenn Nielsen
>
>
> Adam Coyne wrote:
> >
> > I just upgraded to 3.1.0b3 (and applied the memory leak patches), and I'm
> > having a problem where htdig gets into an infinite loop. My home page is
> > /index.asp. It has a bunch of links to itself with different dates in the
> > query string, such as:
> >
> > /index.asp?date=11/21/98
> >
> > After htdig finishes indexing the pages for each date, it begins indexing
> > urls looking like this:
> >
> > /index.asp?date=11/21/index.asp?date=12/1/98
> >
> > It appears to be interpreting the final slash to mean a subdirectory, even
> > though it's after the question-mark. Doesn't seem quite right.
> >
> > --
> > Adam Coyne -- adam@criticalmass.com

The problem is that when digging a page like:

        http://www.host.dom/index.asp?date=11/21/98

if htdig encounters an anchor like <a href="index.asp?date=12/1/98">
where the href isn't fully qualified (i.e. doesn't begin with a slash),
then htdig must construct the new URL by appending the href to the
"base" part of the current URL. There's a bug in URL.cc: it doesn't
distinguish between slashes before and after the "?" in the URL above,
so when it works out the base part of the current URL, it simply strips
off everything after the last slash, leaving "/index.asp?date=11/21/"
as the base. This results in URLs like the one Adam reported above.

It should first strip off the "?" and everything following it, as
this isn't part of the base. Changing the index.asp CGI script to
generate fully qualified hrefs, or to use %2f in the date query, would
work around this bug, but a fix for the bug would be preferable.

The barely tested patch below should fix this for you. Please try it
out, on 3.1.0b4 preferably, and let us know if it solves the problem.

--- htlib/URL.cc.pathbug Thu Dec 24 11:20:20 1998
+++ htlib/URL.cc Tue Jan 5 13:12:55 1999
@@ -210,6 +210,11 @@
             // The reference is relative to the parent
             //
             _path = parent._path;
+            int i = _path.indexOf('?');
+            if (i >= 0)
+            {
+                _path.chop(_path.length() - i);
+            }
             if (_path.last() == '/')
             {
                 //

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



This archive was generated by hypermail 2.0b3 on Thu Jan 07 1999 - 07:52:38 PST