Subject: Re: [htdig] premature merging
Date: Fri Aug 11 2000 - 12:15:23 PDT
According to Geoff Hutchison:
> On Fri, 11 Aug 2000 email@example.com wrote:
> > syntax in the config files so I know that it isn't that. I'm not sure
> > if it makes a difference but these start URL's all contain /cgi-bin/ and the
> I'd make sure you've set the exclude_urls appropriately. Remember that the
> default is to exclude cgi-bin.
My exclude_urls is set to .gif
>Also check limit_urls_to. By default, it takes on the value of start_url,
>which won't do if you list very specific URLs in this parameter, because
>your limit_urls_to won't be open-ended enough to allow other URLs.
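To make the two points above concrete, here is a rough Python model of how the two filters can combine to stop a dig early. It assumes plain substring matching for both limit_urls_to and exclude_urls. It also assumes an exclude list containing /cgi-bin/, following the note above that cgi-bin is excluded by default. All URLs are hypothetical placeholders (www.example.com, report?id=N). This is an illustration, not ht://Dig's actual code.

```python
# Simplified model of ht://Dig URL filtering. ASSUMPTION: both
# limit_urls_to and exclude_urls are treated as plain substring matches.
# This illustrates the idea and is not htdig's source.

def is_allowed(url, limit_urls_to, exclude_urls):
    """Return True if url passes both filters under the substring model."""
    # limit_urls_to: the URL must contain at least one listed pattern.
    if limit_urls_to and not any(p in url for p in limit_urls_to):
        return False
    # exclude_urls: the URL must not contain any listed pattern.
    if any(p in url for p in exclude_urls):
        return False
    return True


# Hypothetical start URLs that differ only after the "?".
start_url = [
    "http://www.example.com/cgi-bin/report?id=1",
    "http://www.example.com/cgi-bin/report?id=2",
]
# A hypothetical page linked from a start URL, not itself a start URL.
linked = "http://www.example.com/cgi-bin/report?id=3"

# Case 1: limit_urls_to left equal to start_url, cgi-bin still excluded.
print(is_allowed(linked, start_url, ["/cgi-bin/"]))  # False

# Case 2: exclude_urls no longer mentions cgi-bin, but limit_urls_to
# still equals the exact start URLs, so the linked page is rejected.
print(is_allowed(linked, start_url, [".gif"]))  # False

# Case 3: an open-ended limit_urls_to prefix lets the page through.
print(is_allowed(linked, ["http://www.example.com/cgi-bin/"], [".gif"]))  # True
```

Case 2 shows why fixing exclude_urls alone may not be enough. Under this model, a limit_urls_to made of full start URLs with query strings matches only those exact pages.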
As an example, all of the URLs in my start_url look similar to
except that the remaining part after the ? changes
and that page links you to several URLs that look like
where the info after the ? changes.
My limit_urls_to attribute looks like
so I can't see a problem with that. The strange thing here is that it
goes through about 15 of the 50 start_url URLs and then merges. It
seems to me that htdig thinks that it is finished digging for some
reason and I can't pinpoint the reason why.
>So one way to get more information on this
>is to run htdig by itself and add the -vvvv flag for more debugging
I ran the dig with -vvv and the output seemed fine: it was following
all links, indexing the PDFs, and parsing them perfectly.