Re: [htdig] premature merging

Date: Fri Aug 11 2000 - 12:15:23 PDT

According to Geoff Hutchison:
> On Fri, 11 Aug 2000 wrote:
> > syntax in the config files so I know that it isn't that. I'm not sure
> > if it makes a difference but these start URL's all contain /cgi-bin/ and the
> I'd make sure you've set the exclude_urls appropriately. Remember that the
> default is to exclude cgi-bin.

My exclude_urls is set to .gif, so cgi-bin URLs are not being excluded.

> Also check limit_urls_to. By default, it takes on the value of start_url,
> which won't do if you list very specific URLs in this parameter, because
> your limit_urls_to won't be open-ended enough to allow other URLs.

As an example, all of the URLs in my start_url look similar to

except that the remaining part after the ? changes

and that page links you to several URLs that look like

where the info after the ? changes.

My limit_urls_to attribute looks like \

so I can't see a problem with that. The strange thing here is that it
goes through about 15 of the 50 start_url URLs and then merges. It
seems that htdig thinks it has finished digging, and I can't pinpoint
why.
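
For reference, here is a minimal sketch of how I understand the two
checks discussed above to interact. It assumes both attributes are
lists of plain substrings: a URL is dropped if it contains any
exclude_urls pattern, and kept only if it contains at least one
limit_urls_to pattern. The function name url_allowed and the
example.com URLs are made up for illustration (my real URLs are not
shown here), and the default exclude list is assumed to contain
/cgi-bin/ as Geoff's note implies. This is not htdig's actual code.

```python
def url_allowed(url, limit_urls_to, exclude_urls):
    """Return True if a crawler following these assumed rules keeps `url`."""
    # Dropped if the URL contains any exclude pattern as a plain substring.
    if any(pattern in url for pattern in exclude_urls):
        return False
    # Kept only if the URL contains at least one limit pattern.
    return any(pattern in url for pattern in limit_urls_to)


# Hypothetical URLs standing in for the real ones.
start_url = "http://www.example.com/cgi-bin/list?cat=1"
linked_url = "http://www.example.com/cgi-bin/show?doc=7"

# Assumed default excludes: everything under /cgi-bin/ is dropped.
print(url_allowed(linked_url, ["http://www.example.com/"], ["/cgi-bin/", ".cgi"]))

# exclude_urls set to .gif only: the cgi-bin link is kept.
print(url_allowed(linked_url, ["http://www.example.com/"], [".gif"]))

# limit_urls_to left equal to a very specific start_url: the link is dropped.
print(url_allowed(linked_url, [start_url], [".gif"]))
```

Under these assumptions the three calls print False, True and False, so
an open-ended limit_urls_to plus a .gif-only exclude_urls should not be
what stops the dig early.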

> So one way to get more information on this
> is to run htdig by itself and add the -vvvv flag for more debugging

I ran the dig with -vvv and the output seemed fine: it was following
all links, indexing the PDFs, and parsing them perfectly.

I'm stumped,

