Frank Richter (Frank.Richter@hrz.tu-chemnitz.de)
Mon, 30 Nov 1998 10:36:29 +0100 (MET)
> > Digging with max_hop_count 8: htdig-3.0.8b2 - ca. 55,000 documents
> > htdig-3.1.0b2 - ca. 13,000 documents
> > max_hop_count 12: htdig-3.1.0b2 - 44,757 documents
>
> It is a known bug that 3.1.0b2 ignores CGIs. More precisely, it trims off
> the part of the URL after a ? in the CGI.
That's not the reason in my case; we don't have that many CGI URLs.
Example:
http://www.tu-chemnitz.de/index.html contains a link to
http://www.tu-chemnitz.de/misc/links.html which contains a link to
http://www.tu-chemnitz.de/docs/perl.html
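
To make the expected numbers explicit, here is a minimal sketch of how hop counts would come out for this chain, assuming "hop count" means the smallest number of links between the start URL and a page. The function name hop_counts and the links dictionary are made up for illustration; this is not htdig's actual code, only a plain breadth-first walk over the three pages named above.

```python
from collections import deque


def hop_counts(links, start, max_hop_count):
    """Breadth-first walk returning {url: minimum number of hops from start}."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        # A page already at the limit is kept, but its links are not followed.
        if hops[url] >= max_hop_count:
            continue
        for target in links.get(url, []):
            if target not in hops:
                hops[target] = hops[url] + 1
                queue.append(target)
    return hops


# Hypothetical link graph containing only the three pages from the example.
INDEX = "http://www.tu-chemnitz.de/index.html"
LINKS = "http://www.tu-chemnitz.de/misc/links.html"
PERL = "http://www.tu-chemnitz.de/docs/perl.html"
links = {INDEX: [LINKS], LINKS: [PERL]}

expected = hop_counts(links, INDEX, max_hop_count=8)
```

Under that assumption perl.html sits at hop 2, well below a max_hop_count of 8, so its links should still be followed.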
htdig-3.0.8b2:
0:0:0:http://www.tu-chemnitz.de/: ++
...
345:40:1:http://www.tu-chemnitz.de/misc/links.html:
********-----+------*-----------------------------------------------------------------*--------------------**--------------------------------------------------------------------------------------------++-+-+---*+*+-----------------------------------------------------+*****
size = 24555
....
7538:1823:2:http://www.tu-chemnitz.de/docs/perl.html:
+--+--+-------------- size = 1943
htdig-3.1.0b2: (3 weeks later, so small changes in size etc.)
0:0:0:http://www.tu-chemnitz.de/: +++*
...
347:40:3:http://www.tu-chemnitz.de/misc/links.html:
********---+------*-----------------------------------------------------------------*--------------------**--------------------------------------------------------------------------------------------++-++-+---*+*+-----------------------------------------------------+*****
size = 24440
...
5479:2040:12:http://www.tu-chemnitz.de/docs/perl.html: size = 2579
          ^^??
Note the level of 12 here (?!), up from 2 in the older run - so no links in perl.html are followed.
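
For comparing two such runs in bulk, a rough sketch like the following could pull the level out of each URL line and list the pages whose level changed. It assumes the third colon-separated number is the hop level (as the output above suggests) and does not interpret the first two numbers. The names parse_hops and hop_changes are hypothetical helpers, not part of htdig.

```python
import re

# Three numeric fields, then the URL, then a colon followed by whitespace or end of line.
LINE_RE = re.compile(r"^(\d+):(\d+):(\d+):(.+?):(?:\s|$)")


def parse_hops(log_text):
    """Return {url: level} for every line that looks like 'n:n:level:url:'."""
    hops = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            hops[match.group(4)] = int(match.group(3))
    return hops


def hop_changes(old_log, new_log):
    """Return {url: (old_level, new_level)} for URLs whose level differs."""
    old, new = parse_hops(old_log), parse_hops(new_log)
    return {u: (old[u], new[u]) for u in old if u in new and old[u] != new[u]}


# Lines copied from the two runs quoted above (progress markers shortened).
old_log = """0:0:0:http://www.tu-chemnitz.de/: ++
345:40:1:http://www.tu-chemnitz.de/misc/links.html:
size = 24555
7538:1823:2:http://www.tu-chemnitz.de/docs/perl.html:
"""
new_log = """0:0:0:http://www.tu-chemnitz.de/: +++*
347:40:3:http://www.tu-chemnitz.de/misc/links.html:
size = 24440
5479:2040:12:http://www.tu-chemnitz.de/docs/perl.html: size = 2579
"""

changes = hop_changes(old_log, new_log)
```

Lines that do not start with the three numeric fields (progress markers, "size = ..." lines) are skipped, so the complete verbose output of each run can be passed in unchanged.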
- Frank
--
Email: Frank.Richter@hrz.tu-chemnitz.de
http://www.tu-chemnitz.de/~fri/
Work: Computing Services, Technical University, 09107 Chemnitz, Germany