Subject: Re: [htdig3-dev] Fetching outside of domain list (not supposed to)
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Thu Jan 04 2001 - 10:21:26 PST
According to Toxik - Dann Cohen:
> Hi Gilles,
>
> If I set max_hop_count to 0, it only fetches the first page. I want
> it to fetch one page further, so max_hop_count needs to be 1. But
> what's happening is that the fetch goes beyond the 1800 domains, when
> it's supposed to reject domains that are not in start_url...
>
> Any suggestions? By the way, it works fine when there are fewer
> domains, say 1500. Very strange...
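For reference, the behavior Dann expects can be modelled with a short Python sketch: start URLs are hop 0, links found on them are hop 1, and a link is only followed if it matches one of the limit patterns. This is not htdig's code. The function name `crawl`, the in-memory `links` graph standing in for real fetches, and treating each limit entry as a plain substring test are all assumptions made for illustration.

```python
from collections import deque


def crawl(start_urls, links, limit_patterns, max_hop_count):
    """Hypothetical model of hop-limited, pattern-restricted crawling.

    `links` is a made-up in-memory link graph (URL -> list of linked URLs)
    used in place of actually fetching pages.
    """
    fetched = []
    seen = set(start_urls)
    queue = deque((url, 0) for url in start_urls)  # start URLs are hop 0
    while queue:
        url, hops = queue.popleft()
        fetched.append(url)
        if hops >= max_hop_count:
            continue  # page is indexed, but its links are not followed
        for target in links.get(url, []):
            if target in seen:
                continue
            # Assumed limit check: accept only URLs containing a limit pattern.
            if not any(pattern in target for pattern in limit_patterns):
                continue
            seen.add(target)
            queue.append((target, hops + 1))
    return fetched


if __name__ == "__main__":
    graph = {
        "http://a.example/": ["http://a.example/p1", "http://other.example/x"],
        "http://a.example/p1": ["http://a.example/p2"],
    }
    # With max_hop_count = 1, p1 is fetched but other.example and p2 are not.
    print(crawl(["http://a.example/"], graph, ["http://a.example/"], 1))
```

In this model the limit list is what keeps off-site links out; hop counting alone only bounds depth, which is why a broken limit check lets the crawl wander beyond the listed domains.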
Hmmm. I imagine that the very long list in start_url, which gets
transferred to limit_urls_to by default, is overflowing the StringMatch
state table for the limits matching. I don't know that there's an easy
fix for this. The 3.2 code will be using regular expression handling
rather than StringMatch for the limit_urls_to attribute, but I don't know
for a fact that it too won't have problems with a huge list like this.
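To make the suspected failure concrete, here is a minimal Python sketch of a multi-pattern matcher whose state table has a fixed capacity. It is not htdig's StringMatch code: the class names, the trie layout and the capacity are hypothetical. It only shows that the number of states grows with the total length of the patterns (less any shared prefixes), so a list of 1800 URLs can exceed a table that 1500 still fit in.

```python
class StateTableFull(Exception):
    """Raised when the patterns need more states than the table can hold."""


class CappedMatcher:
    """Trie-style multi-pattern substring matcher with a fixed-size state table.

    Hypothetical illustration only: this is not htdig's StringMatch class,
    and max_states is an arbitrary capacity, not StringMatch's real limit.
    """

    def __init__(self, patterns, max_states):
        self.max_states = max_states
        self.transitions = [{}]  # one dict of char -> next state per state; 0 is the root
        self.accepting = set()   # states where a complete pattern ends
        for pattern in patterns:
            self._add(pattern)

    def _add(self, pattern):
        state = 0
        for ch in pattern:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                # A new state is needed; this sketch fails loudly when full.
                if len(self.transitions) >= self.max_states:
                    raise StateTableFull(
                        f"table full at {len(self.transitions)} states "
                        f"while adding {pattern!r}")
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][ch] = nxt
            state = nxt
        self.accepting.add(state)

    def matches(self, text):
        """Return True if any (non-empty) pattern occurs somewhere in text."""
        for start in range(len(text)):
            state = 0
            for ch in text[start:]:
                state = self.transitions[state].get(ch)
                if state is None:
                    break
                if state in self.accepting:
                    return True
        return False


def states_needed(patterns):
    """Count the states the patterns use when the table is effectively unlimited."""
    return len(CappedMatcher(patterns, max_states=10**9).transitions)


if __name__ == "__main__":
    # Hypothetical stand-ins for a long start_url list.
    domains = [f"http://www.site{i}.example/" for i in range(1800)]
    cap = states_needed(domains[:1500])  # a table just big enough for 1500 URLs
    print("states for 1500 URLs:", cap)
    print("states for 1800 URLs:", states_needed(domains))
    small = CappedMatcher(domains[:1500], max_states=cap)
    print(small.matches("http://www.site42.example/index.html"))
    try:
        CappedMatcher(domains, max_states=cap)
    except StateTableFull as err:
        print("overflow:", err)
```

With these made-up URLs each extra domain adds about ten states, so the 1800-entry list needs roughly 18,000 states against about 15,000 for 1500. This sketch raises an error on overflow; a table that does not check its bounds could instead fail silently, which would be consistent with out-of-domain URLs slipping through, though that is a guess about the failure mode rather than something checked against the StringMatch source.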
--
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930