Subject: Re: [htdig3-dev] Fetching outside of domain list (not supposed to)
From: Gilles Detillieux (firstname.lastname@example.org)
Date: Thu Jan 04 2001 - 09:03:39 PST
According to Toxik - Dann Cohen:
> I'm a newcomer to this list (a six-month user of ht://dig), and before
> saying anything I would like to say hi to everyone. Now to the good
> stuff =)
> I've encountered a problem with the fetching part. I have about 1800 sites
> in my "start_url" to fetch with a "max_hop_count" of 1, and it seems to
> go beyond those 1800.
> HTTP statistics
> Persistent connections : Yes
> HEAD call before GET : No
> Connections opened : 14973
> Connections closed : 14973
> Changes of server : 6030
> HTTP Requests : 35357
> HTTP KBytes requested : 209216
> HTTP Average request time : 0.647679 secs
> HTTP Average speed : 9.13605 KBytes/secs
> As you can see, the value of "Changes of server" is higher than 1800. I can
> also see in the log that it goes beyond the domain (see below for an
> example): the domain is www.singapore-inc.com, and you can see that a
> "mailto:" link and "www.sedb.com.sg" get pushed in. The problem doesn't
> happen when I fetch the sites alone. Any suggestions or hints are welcome.
If you haven't already figured it out, you should be setting max_hop_count
to 0, not 1. One hop means it will attempt to follow all the valid links
in those initial 1800 documents.
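To make the hop-count semantics concrete, here is a minimal Python sketch that models the crawl as a breadth-first search over a hypothetical link graph. It is not ht://dig's code. It only assumes what is stated above: start URLs sit at hop 0, and each followed link adds one hop. The `crawl` function, the `LINKS` graph and the `about.html`/`news.html` pages are illustrative inventions.

```python
from collections import deque

# Hypothetical link graph: each URL maps to the URLs it links to.
LINKS = {
    "http://www.singapore-inc.com/": [
        "http://www.sedb.com.sg/",
        "http://www.singapore-inc.com/about.html",
    ],
    "http://www.sedb.com.sg/": ["http://www.sedb.com.sg/news.html"],
}
START = ["http://www.singapore-inc.com/"]


def crawl(start_urls, links, max_hop_count):
    """Return the set of URLs a breadth-first crawl would fetch.

    Start URLs are at hop 0; a link found on a page at hop n leads to a
    page at hop n + 1. Links are only followed from pages whose hop count
    is below max_hop_count, so max_hop_count = 0 fetches only start URLs.
    """
    fetched = set()
    seen = set(start_urls)
    queue = deque((url, 0) for url in start_urls)
    while queue:
        url, hops = queue.popleft()
        fetched.add(url)
        if hops >= max_hop_count:
            continue  # this page is fetched, but its links are not followed
        for target in links.get(url, []):
            if target not in seen:  # BFS order gives each URL its minimum hop count
                seen.add(target)
                queue.append((target, hops + 1))
    return fetched


print(len(crawl(START, LINKS, 0)))  # 1: only the start URL
print(len(crawl(START, LINKS, 1)))  # 3: start URL plus the pages it links to
```

With a hop count of 1, the off-domain page `http://www.sedb.com.sg/` is fetched because it is linked from a start URL. This mirrors why each of the 1800 start documents can pull in pages from other servers.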
--
Gilles R. Detillieux              E-mail: <email@example.com>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
This archive was generated by hypermail 2b28 : Thu Jan 04 2001 - 09:15:47 PST