Re: [htdig] premature merging

From: Geoff Hutchison
Date: Fri Aug 11 2000 - 12:13:23 PDT

On Fri, 11 Aug 2000 wrote:

> My exclude_urls is set to .gif

This is usually done through bad_extensions, but that's fine.
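For reference, here is a minimal htdig.conf sketch contrasting the two attributes mentioned above. It is illustrative only. Whatever value you set for an attribute is presumably the one htdig uses, so check the default bad_extensions list before overriding it rather than assuming `.gif` is simply added to it.

```
# Usual approach: skip files by extension.
bad_extensions: .gif

# What the original question did instead: exclude any URL
# containing the pattern ".gif" anywhere in it.
# exclude_urls: .gif
```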

> so I can't see a problem with that. The strange thing here is that it
> goes through about 15 of the 50 start_url URLs and then merges. It
> seems to me that htdig thinks that it is finished digging for some
> reason and I can't pinpoint the reason why.

One other thing to check is that you don't have an inadvertent newline in
the start_url list; htdig will ignore anything after the newline. One good
way to list a series of URLs is to use the `/path/to/file` syntax (the
filename enclosed in backquotes) to include a file of URLs.
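A sketch of that include syntax follows. The path /usr/local/htdig/conf/start_urls.txt and the example.com URLs are hypothetical placeholders. As I understand it, newlines inside the included file are treated as ordinary separators. That is what makes this approach safer than a long inline list.

```
# In htdig.conf: the backquotes tell htdig to read the value from a file.
start_url: `/usr/local/htdig/conf/start_urls.txt`
```

Hypothetical contents of start_urls.txt, one URL per line:

```
http://www.example.com/
http://www.example.org/docs/
http://www.example.net/papers/
```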

> I ran the dig with -vvv and the output seemed fine, it was following
> all links, indexing the PDFs, and parsing them perfectly.

But it seems to ignore the URLs after a certain point. That is a good reason
to either hunt for a newline without a '\' continuation character before it,
or to move the URLs into a separate file and include that.
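To illustrate the failure mode described above, compare the two fragments below. The URLs are placeholders. In the first fragment, the second line has no trailing '\', so the start_url value ends there. According to the explanation above, the third URL is then never dug. In the second fragment, every line except the last is continued.

```
# Broken: no '\' after the second URL, so the value stops there.
start_url: http://www.example.com/a/ \
           http://www.example.com/b/
           http://www.example.com/c/

# Fixed: every line except the last ends with '\'.
start_url: http://www.example.com/a/ \
           http://www.example.com/b/ \
           http://www.example.com/c/
```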

-Geoff Hutchison
Williams Students Online


This archive was generated by hypermail 2b28 : Fri Aug 11 2000 - 02:13:29 PDT