Re: htdig: htdig-3.1.0b2: Ignoring URLs?


Frank Richter (Frank.Richter@hrz.tu-chemnitz.de)
Mon, 30 Nov 1998 10:36:29 +0100 (MET)


> > Digging with max_hop_count 8:  htdig-3.0.8b2 - ca. 55,000 documents
> >                                htdig-3.1.0b2 - ca. 13,000 documents
> >             max_hop_count 12:  htdig-3.1.0b2 - 44,757 documents
>
> It is a known bug that 3.1.0b2 ignores CGIs. More precisely, it trims off
> the part of the URL after a ? in the CGI.

That's not the reason in my case; we don't have that many CGI URLs.
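
For illustration, a minimal Python sketch of the trimming described in the quote (the helper names and the example.org URLs are hypothetical, not taken from htdig): once everything from the "?" onward is dropped, CGI links that differ only in their query strings collapse into a single URL. A site with many such links therefore loses many documents, while a site with few CGI URLs loses correspondingly few.

```python
def trim_query(url: str) -> str:
    # Keep only the part before the first '?', as the quoted bug report describes.
    return url.split("?", 1)[0]


def unique_after_trim(urls):
    # Distinct URLs a digger would still see once query strings are dropped.
    return sorted({trim_query(u) for u in urls})


# Hypothetical CGI links, for illustration only.
CGI_LINKS = [
    "http://www.example.org/cgi-bin/show?page=1",
    "http://www.example.org/cgi-bin/show?page=2",
    "http://www.example.org/cgi-bin/show?page=3",
    "http://www.example.org/static/about.html",
]

if __name__ == "__main__":
    kept = unique_after_trim(CGI_LINKS)
    print(len(CGI_LINKS), "->", len(kept))
```

Running it prints "4 -> 2": the three CGI links collapse into one, and the static page is unaffected.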

Example:
http://www.tu-chemnitz.de/index.html contains a link to
http://www.tu-chemnitz.de/misc/links.html which contains a link to
http://www.tu-chemnitz.de/docs/perl.html
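
For reference, the hop count of a page is expected to be the number of links followed from the start URL along the shortest path, which for perl.html in this chain is 2. A minimal Python sketch of that computation, assuming a breadth-first traversal and using only the three pages from the example (the helper names are hypothetical, not htdig's own code):

```python
from collections import deque


def hop_counts(links, start):
    """Breadth-first search from the start URL.

    BFS visits pages in order of distance, so the first time a URL is
    reached is along a shortest link path; that distance is its hop count.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in hops:
                hops[target] = hops[url] + 1
                queue.append(target)
    return hops


def within_limit(hops, max_hop_count):
    # Pages a digger bounded by max_hop_count would keep (assumed rule: hop <= limit).
    return {url for url, hop in hops.items() if hop <= max_hop_count}


# The link chain from the example above (only the links that matter here).
INDEX = "http://www.tu-chemnitz.de/index.html"
LINKS = "http://www.tu-chemnitz.de/misc/links.html"
PERL = "http://www.tu-chemnitz.de/docs/perl.html"
SITE = {INDEX: [LINKS], LINKS: [PERL]}

if __name__ == "__main__":
    hops = hop_counts(SITE, INDEX)
    for url, hop in sorted(hops.items(), key=lambda item: item[1]):
        print(hop, url)
```

This prints hop counts 0, 1 and 2 for index.html, links.html and perl.html, so with max_hop_count 8 the links inside perl.html (hop 3) should still be dug.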

htdig-3.0.8b2:
0:0:0:http://www.tu-chemnitz.de/: ++
...
345:40:1:http://www.tu-chemnitz.de/misc/links.html:
********-----+------*-----------------------------------------------------------------*--------------------**--------------------------------------------------------------------------------------------++-+-+---*+*+-----------------------------------------------------+*****
size = 24555
....
7538:1823:2:http://www.tu-chemnitz.de/docs/perl.html:
+--+--+-------------- size = 1943

htdig-3.1.0b2 (run 3 weeks later, so sizes etc. differ slightly):
0:0:0:http://www.tu-chemnitz.de/: +++*
...
347:40:3:http://www.tu-chemnitz.de/misc/links.html:
********---+------*-----------------------------------------------------------------*--------------------**--------------------------------------------------------------------------------------------++-++-+---*+*+-----------------------------------------------------+*****
size = 24440
...
5479:2040:12:http://www.tu-chemnitz.de/docs/perl.html: size = 2579
          ^^??
Note the hop count of 12 (?!) for perl.html, although it is only two links away from the start page, so none of the links in perl.html get dug.
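
To spot such jumps in a long log, here is a hypothetical Python sketch that pulls the hop count out of each progress line and compares it with the shortest-path distance implied by the example link chain. It assumes the htdig -v progress line layout is documents-parsed-so-far:DocID:hop-count:URL:, that the start URL "/" serves index.html, and that a page's links are only queued while the next hop stays within max_hop_count; the helper names are illustrative, not part of htdig. Run on the three 3.1.0b2 lines above, it flags perl.html (12 vs. 2) and also links.html (3 vs. 1).

```python
import re

# Assumed layout of an htdig -v progress line:
#   docs-parsed-so-far:DocID:hop-count:URL: <link symbols / size>
PROGRESS_RE = re.compile(r"^(\d+):(\d+):(\d+):(\S+):(?:\s|$)")


def parse_hops(lines):
    """Map each URL to the hop count recorded in its progress line."""
    hops = {}
    for line in lines:
        match = PROGRESS_RE.match(line)
        if match:
            hops[match.group(4)] = int(match.group(3))
    return hops


def too_deep(recorded, expected):
    """URLs whose recorded hop count exceeds the expected shortest distance."""
    return {
        url: (hop, expected[url])
        for url, hop in recorded.items()
        if url in expected and hop > expected[url]
    }


def links_followed(page_hop, max_hop_count):
    # Links found on a page are one hop further out; assumed to be queued
    # only while that next hop stays within max_hop_count.
    return page_hop + 1 <= max_hop_count


# First lines of the 3.1.0b2 progress output quoted above.
LOG_3_1_0B2 = [
    "0:0:0:http://www.tu-chemnitz.de/: +++*",
    "347:40:3:http://www.tu-chemnitz.de/misc/links.html:",
    "5479:2040:12:http://www.tu-chemnitz.de/docs/perl.html: size = 2579",
]

# Shortest distances from the start URL (assumed to serve index.html).
EXPECTED = {
    "http://www.tu-chemnitz.de/": 0,
    "http://www.tu-chemnitz.de/misc/links.html": 1,
    "http://www.tu-chemnitz.de/docs/perl.html": 2,
}

if __name__ == "__main__":
    for url, (got, want) in too_deep(parse_hops(LOG_3_1_0B2), EXPECTED).items():
        print(f"{url}: hop {got}, expected {want}")
    for limit in (8, 12):
        print(f"max_hop_count {limit}: links in perl.html followed?", links_followed(12, limit))
```

The URL group is greedy and must be followed by ":" plus whitespace or end of line, so the colon in "http:" is not mistaken for the field separator.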

- Frank

-- 
Email: Frank.Richter@hrz.tu-chemnitz.de  http://www.tu-chemnitz.de/~fri/
Work:  Computing Services, Technical University, 09107 Chemnitz, Germany



