Michael Spann (mikes@mail.sv.dialogic.com)
Sat, 14 Nov 1998 00:53:59 -0800 (PST)
A <meta name="robots" content="none"> tag, or any of the other ways of
telling htdig not to follow links from a page, triggers two small bugs.
Neither one by itself would have produced the problem I saw. The following
patch seems to have fixed it.
*** HTML.orig Mon Nov 2 16:21:51 1998
--- HTML.cc Sat Nov 14 00:40:55 1998
*************** HTML::parse(Retriever &retriever, URL &b
*** 256,262 ****
  if (description.length() > max_description_length)
  {
      description << " ...";
!     retriever.got_href(*href, description);
      in_ref = 0;
      description = 0;
  }
--- 256,263 ----
  if (description.length() > max_description_length)
  {
      description << " ...";
!     if (dofollow)
!         retriever.got_href(*href, description);
      in_ref = 0;
      description = 0;
  }
*************** HTML::do_tag(Retriever &retriever, Strin
*** 512,520 ****
      }
  case 3: // "/a"
!     if (dofollow && in_ref)
      {
!         retriever.got_href(*href, description);
          in_ref = 0;
      }
      break;
--- 513,522 ----
      }
  case 3: // "/a"
!     if (in_ref)
      {
!         if (dofollow)
!             retriever.got_href(*href, description);
          in_ref = 0;
      }
      break;
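
To make the interaction between the two hunks easier to see, here is a
minimal standalone sketch of the patched control flow. It is not htdig code:
MockRetriever, AnchorTracker, open_anchor, add_text, close_anchor and the
description limit of 20 are all hypothetical names and values made up for
illustration. Only got_href, dofollow, in_ref, href, description and
max_description_length are taken from the patch.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for htdig's Retriever: it only records reported links.
struct MockRetriever {
    std::vector<std::pair<std::string, std::string>> links;
    void got_href(const std::string &href, const std::string &description) {
        links.emplace_back(href, description);
    }
};

// Simplified model of the anchor state touched by the patch (an assumption,
// not the real HTML class).
struct AnchorTracker {
    bool dofollow = true;   // false after e.g. <meta name="robots" content="none">
    bool in_ref = false;    // true between <a href=...> and </a>
    std::string href;
    std::string description;
    std::size_t max_description_length = 20;  // illustrative value only

    // "<a href=...>" seen: start collecting a link description.
    void open_anchor(const std::string &url) {
        href = url;
        description.clear();
        in_ref = true;
    }

    // Text seen while parsing; mirrors the first hunk.
    void add_text(MockRetriever &retriever, const std::string &text) {
        if (!in_ref)
            return;
        description += text;
        if (description.length() > max_description_length) {
            description += " ...";
            if (dofollow)               // patched: report only when following
                retriever.got_href(href, description);
            in_ref = false;
            description.clear();
        }
    }

    // "</a>" seen; mirrors the second hunk.
    void close_anchor(MockRetriever &retriever) {
        if (in_ref) {                   // patched: always leave the anchor...
            if (dofollow)               // ...but report only when following
                retriever.got_href(href, description);
            in_ref = false;
        }
    }
};
```

In this model, the unpatched "/a" test (dofollow && in_ref) would leave
in_ref set on a no-follow page, so later body text would keep growing the
description until the length check fired, and the unpatched length check
would then report the link anyway. With both checks in place, a no-follow
page reports nothing and in_ref is cleared at the closing tag.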