Re: [htdig3-dev] Bug on htdig3


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Fri, 16 Jul 1999 13:10:09 -0500 (CDT)


According to loic@ceic.com:
> > > 8:12:1:http://www.senga.org/uri/html>: not found
> ..
> >
> > The CVS version is much more lenient about URLs. If you read the
> > messages, it's trying to connect to the URLs
> > "http://www.senga.org/uri/html>" or "http://www.senga.org/support.html>"
> > which are incorrect links.
>
> I think this is because the quotes are missing:
>
> <a href=uri/html>uri</a>
>
> Do you think htdig should permanently consider this an incorrect href?
> If so, it will have trouble with a lot of existing web sites.

I had a feeling this might crop up after the changes to HTML.cc. Here's
the fix, which I just committed to the CVS source tree:

Fri Jul 16 13:04:27 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>

        * htdig/HTML.cc(parse): fix to prevent closing ">" from being passed
        to do_tag().

Index: htdig/HTML.cc
===================================================================
RCS file: /opt/htdig/cvs/htdig3/htdig/HTML.cc,v
retrieving revision 1.48
diff -u -p -r1.48 HTML.cc
--- htdig/HTML.cc 1999/07/13 20:58:06 1.48
+++ htdig/HTML.cc 1999/07/16 17:19:49
@@ -276,9 +276,9 @@ HTML::parse(Retriever &retriever, URL &b
             q = (unsigned char*)strchr((char *)position, '>');
             if (!q)
               break; // Syntax error in the doc. Tag never ends.
-            tag = 0;
-            tag.append((char*)position + 1, q - position);
             position++;
+            tag = 0;
+            tag.append((char*)position, q - position);
             while (isspace(*position))
                 position++;
             if (!in_space && spacebeforetags.CompareWord((char *)position)
@@ -328,8 +328,9 @@ HTML::parse(Retriever &retriever, URL &b
                         q = (unsigned char*)strchr((char *)position, '>');
                         if (q)
                         {
+                            position++;
                             tag = 0;
-                            tag.append((char*)position + 1, q - position);
+                            tag.append((char*)position, q - position);
                             do_tag(retriever, tag);
                             position = q+1;
                         }

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930