Re: htdig: htdig-3.1b2: Ignoring URLs?


Geoff Hutchison (ghutchis@wso.williams.edu)
Mon, 30 Nov 1998 17:21:58 -0500


At 4:36 AM -0500 11/30/98, Frank Richter wrote:
>htdig-3.1.0b2: (3 weeks later, so small changes in size etc.)
>0:0:0:http://www.tu-chemnitz.de/: +++*
>...
>347:40:3:http://www.tu-chemnitz.de/misc/links.html:
>...
>5479:2040:12:http://www.tu-chemnitz.de/docs/perl.html: size = 2579
> ^^??
>See here level 12 (?!) - so no links in perl.html are digged.

Hmm. I think we're seeing a bug that was obscured in 3.0.8b2. In the
following code, we check whether we could have taken a shorter route through
a reference, and then record the longer one anyway! That doesn't seem
fair; I think we should always take the shortest route possible.

Try this and let me know if the hopcounts come out correctly.
*** htdig3/htdig/Retriever.cc Fri Nov 27 13:33:37 1998
--- htdig3.dev/htdig/Retriever.cc Mon Nov 30 17:17:47 1998
*************** Retriever::got_href(URL &url, char *desc
*** 915,921 ****
            current_anchor_number = old_anchor;

            if (ref->DocHopCount() < currenthopcount + 1)
!               ref->DocHopCount(currenthopcount + 1);

            docs.Add(*ref);

--- 915,923 ----
            current_anchor_number = old_anchor;

            if (ref->DocHopCount() < currenthopcount + 1)
!               // If we had taken the path through this ref
!               // We'd be here faster than currenthopcount
!               currenthopcount = ref->DocHopCount(); // So update it!

            docs.Add(*ref);
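
For illustration only, here is the "always take the shortest route" rule
in isolation. This is a minimal sketch and not the patch above (which
adjusts currenthopcount instead); the Doc struct and relax_hop_count()
are hypothetical names that do not exist in Retriever.cc.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for a document reference; NOT ht://Dig's own class.
struct Doc {
    int hop_count;  // fewest hops from a start URL seen so far
};

// A link to `ref` was found on a page that is `current_hop_count` hops deep,
// so `ref` is reachable in current_hop_count + 1 hops via this route.
// Keep whichever route is shorter and return the resulting hop count.
int relax_hop_count(Doc &ref, int current_hop_count) {
    assert(current_hop_count >= 0);  // assumption: hop counts are never negative
    ref.hop_count = std::min(ref.hop_count, current_hop_count + 1);
    return ref.hop_count;
}
```

With this rule, a document first recorded at level 12 drops to level 4
once a link to it turns up on a level-3 page, and a later link from a
level-9 page leaves it at 4.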

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
