Re: [htdig3-dev] Fetching outside of domain list (not supposed to)


Subject: Re: [htdig3-dev] Fetching outside of domain list (not supposed to)
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Thu Jan 04 2001 - 10:21:26 PST


According to Toxik - Dann Cohen:
> Hi Gilles,
>
> If I set max_hop_count to 0, it will only fetch the first page,
> and I want it to fetch one page further, so max_hop_count needs to be 1.
> But what's happening is that the fetch goes beyond the 1800 domains,
> when it's supposed to reject the domains that are not in start_url...
>
> Any suggestions? By the way, it works fine when there are fewer domains,
> say 1500 domains. Very strange...

Hmmm. I imagine that the very long list in start_url, which gets
transferred to limit_urls_to by default, is overflowing the StringMatch
state table for the limits matching. I don't know that there's an easy
fix for this. The 3.2 code will be using regular expression handling
rather than StringMatch for the limit_urls_to attribute, but I don't know
for a fact that it too won't have problems with a huge list like this.
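
To make the intended behaviour concrete, here is a rough Python sketch of
what the limits check is supposed to do. It is only an illustration, not
ht://Dig's code: the function names are made up, the example URLs are
placeholders, and it assumes that a URL passes the limit when it contains
one of the limit_urls_to patterns as a substring.

```python
# Illustrative sketch only; none of these names exist in ht://Dig.

def effective_limits(start_urls, limit_urls_to=None):
    """Fall back to the start_url list when limit_urls_to is not set."""
    return list(start_urls) if limit_urls_to is None else list(limit_urls_to)

def url_allowed(url, limits):
    """Assumed rule: accept a URL if it contains any limit pattern."""
    return any(pattern in url for pattern in limits)

def should_fetch(url, hops, limits, max_hop_count):
    """Hop 0 is a start page; max_hop_count=1 allows one link further."""
    return hops <= max_hop_count and url_allowed(url, limits)

# Placeholder start list; the real one in this thread has about 1800 entries.
start_urls = ["http://www.example.com/", "http://www.example.org/"]
limits = effective_limits(start_urls)
```

With max_hop_count at 1, a page one link away on a listed domain should be
fetched, while any link to a domain outside the list should be rejected. The
report above is that the second part stops holding once the list gets very
long.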

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



