htdig: [PATCH] nofollow not always obeyed


Michael Spann (mikes@mail.sv.dialogic.com)
Sat, 14 Nov 1998 00:53:59 -0800 (PST)


A <meta name="robots" content="none"> tag, or any of the other ways of
telling htdig not to follow links through a page, is not always obeyed
because of two small bugs. Neither bug by itself would cause the problem
I saw; it takes both together. The following patch seems to have fixed it.

*** HTML.orig Mon Nov 2 16:21:51 1998
--- HTML.cc Sat Nov 14 00:40:55 1998
*************** HTML::parse(Retriever &retriever, URL &b
*** 256,262 ****
                  if (description.length() > max_description_length)
                  {
                      description << " ...";
!                     retriever.got_href(*href, description);
                      in_ref = 0;
                      description = 0;
                  }
--- 256,263 ----
                  if (description.length() > max_description_length)
                  {
                      description << " ...";
!                     if (dofollow)
!                         retriever.got_href(*href, description);
                      in_ref = 0;
                      description = 0;
                  }
*************** HTML::do_tag(Retriever &retriever, Strin
*** 512,520 ****
          }
  
          case 3: // "/a"
!             if (dofollow && in_ref)
              {
!                 retriever.got_href(*href, description);
                  in_ref = 0;
              }
              break;
--- 513,522 ----
          }
  
          case 3: // "/a"
!             if (in_ref)
              {
!                 if (dofollow)
!                     retriever.got_href(*href, description);
                  in_ref = 0;
              }
              break;
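
For anyone wondering why it takes both bugs, here is a small standalone
sketch of one plausible reading of the interaction. It is not htdig code:
ParserModel, open_anchor, add_text, close_anchor and the 8-character limit
are all made up for illustration, and it assumes that in_ref can be set
while dofollow is off and that text is appended to description whenever
in_ref is set. Under those assumptions, the old "/a" case leaves in_ref set
on a nofollow page, so later body text keeps growing description until the
overflow branch fires and calls got_href without checking dofollow.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, much-reduced model of the parser state touched by the patch.
struct ParserModel {
    bool patched;                            // true = behave like the patched code
    bool dofollow = true;
    bool in_ref = false;
    std::size_t max_description_length = 8;  // arbitrary small limit for the demo
    std::string href;
    std::string description;
    std::vector<std::string> followed;       // stands in for retriever.got_href()

    explicit ParserModel(bool p) : patched(p) {}

    // Assumed behavior of "<a href=...>": remember the target, start collecting text.
    void open_anchor(const std::string &target) {
        href = target;
        description.clear();
        in_ref = true;
    }

    // Models the first hunk: the description overflow branch.
    void add_text(const std::string &text) {
        if (!in_ref) return;
        description += text;
        if (description.size() > max_description_length) {
            if (!patched || dofollow)        // the old code did not test dofollow here
                followed.push_back(href);
            in_ref = false;
            description.clear();
        }
    }

    // Models the second hunk: the "/a" case.
    void close_anchor() {
        if (patched) {
            if (in_ref) {
                if (dofollow) followed.push_back(href);
                in_ref = false;              // always reset, even on nofollow pages
            }
        } else if (dofollow && in_ref) {     // old code: in_ref stays set on nofollow
            followed.push_back(href);
            in_ref = false;
        }
    }
};

// Parses a tiny nofollow "page" and reports how many links were handed on.
std::size_t links_followed_on_nofollow_page(bool patched) {
    ParserModel p(patched);
    p.dofollow = false;
    p.open_anchor("secret.html");
    p.add_text("link");                                // short text, below the limit
    p.close_anchor();
    p.add_text("ordinary body text after the link");   // pushes past the limit
    return p.followed.size();
}
```

In this model, links_followed_on_nofollow_page(false) hands one link to the
retriever and links_followed_on_nofollow_page(true) hands none, which matches
the symptom described above.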
