Subject: Re: [htdig] premature merging
From: Geoff Hutchison (firstname.lastname@example.org)
Date: Fri Aug 11 2000 - 12:13:23 PDT
On Fri, 11 Aug 2000 email@example.com wrote:
> My exclude_urls is set to .gif
This is usually done through bad_extensions, but that's fine.
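For comparison, here is a minimal sketch of the bad_extensions approach in htdig.conf. The extension list is illustrative only. A value you set in your own config replaces the default rather than adding to it, so you would normally start from the bad_extensions line already in your htdig.conf and append to it.

```
# Skip images by file extension instead of via exclude_urls
# (illustrative list; setting this replaces the default value)
bad_extensions: .gif .jpg .jpeg .png
```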
> so I can't see a problem with that. The strange thing here is that it
> goes through about 15 of the 50 start_url URLs and then merges. It
> seems to me that htdig thinks that it is finished digging for some
> reason and I can't pinpoint the reason why.
One other thing to check is that you don't have an inadvertent newline in
the start_url list: htdig will ignore anything after the newline. A good
way to list a series of URLs is to use the backquote `/path/to/file`
syntax to include a file of URLs.
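A minimal sketch of the include-a-file approach. The file path and the example.com/example.org URLs are hypothetical placeholders. It assumes, as the advice above implies, that line breaks inside the included file are treated as ordinary separators between URLs.

```
# In htdig.conf: take the value of start_url from a file
# (the path below is a hypothetical example)
start_url: `/usr/local/htdig/conf/start_urls.txt`

# start_urls.txt then holds plain URLs, one per line, e.g.:
#   http://www.example.com/
#   http://www.example.org/docs/
```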
> I ran the dig with -vvv and the output seemed fine, it was following
> all links, indexing the PDFs, and parsing them perfectly.
But it seems to ignore the URLs after a certain point. That is a good
reason either to hunt for a newline without a '\' character before it, or
to move the URLs into a separate file and include that.
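If you keep the URLs inline instead, a minimal sketch of a correctly continued start_url value looks like this. The hosts are hypothetical placeholders. Every line except the last ends with a backslash, so htdig reads them all as one value. Dropping the backslash from any line cuts the list off at that point, which matches the "stops after about 15 of 50" symptom described above.

```
# Each line but the last ends in '\' so the value continues
# (hypothetical URLs)
start_url: http://www.example.com/a/ \
           http://www.example.com/b/ \
           http://www.example.com/c/
```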
-- 
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
This archive was generated by hypermail 2b28 : Fri Aug 11 2000 - 02:13:29 PDT