Re: [htdig] premature merging


Subject: Re: [htdig] premature merging
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Fri Aug 11 2000 - 12:13:23 PDT


On Fri, 11 Aug 2000 campbel@pc177.cisti.nrc.ca wrote:

> My exclude_urls is set to .gif

This is usually done through bad_extensions, but that's fine.

> so I can't see a problem with that. The strange thing here is that it
> goes through about 15 of the 50 start_url URLs and then merges. It
> seems to me that htdig thinks that it is finished digging for some
> reason and I can't pinpoint the reason why.

One other thing to check is that you don't have an inadvertent newline in
the start_url list: htdig will ignore anything after the newline. One good
way to list a series of URLs is to use the `/path/to/file` syntax to
include a file of URLs.
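
As a rough sketch of the include syntax (the file path and the example.com
URLs are placeholders, not taken from the original poster's setup), the
config entry might look like this:

```
# htdig.conf: pull the start URLs from a separate file.
# /opt/htdig/conf/start.urls is a hypothetical path.
start_url: `/opt/htdig/conf/start.urls`
```

The included file would then hold the URLs themselves, here assumed to be
one per line with no trailing backslashes:

```
http://www.example.com/docs/
http://www.example.com/reports/
http://www.example.com/archive/
```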

> I ran the dig with -vvv and the output seemed fine, it was following
> all links, indexing the pdf's, and parsing them perfectly.

But it seems to ignore the URLs after a certain point. That is a good
reason either to hunt for a newline without a '\' character before it, or
to move the URLs into a separate file and include that.
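
To illustrate the kind of mistake to look for (the example.com URLs are
made up, and the two entries below are alternatives rather than one
config file), a missing continuation character could look like this:

```
# Broken: the second line has no trailing '\', so the start_url value
# ends there and the third URL is never used as a starting point.
start_url: http://www.example.com/docs/ \
           http://www.example.com/reports/
           http://www.example.com/archive/

# Fixed: every line except the last ends with '\'.
start_url: http://www.example.com/docs/ \
           http://www.example.com/reports/ \
           http://www.example.com/archive/
```

With 50 URLs in the list, a single dropped backslash around the 15th
entry would be consistent with the dig stopping early.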

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



