Subject: Re: [htdig] htdig -- infinite looping (3.1.5) and "redirection"
From: Geoff Hutchison (email@example.com)
Date: Fri Mar 31 2000 - 18:32:39 PST
At 8:29 PM -0500 3/31/00, Sphboc@aol.com wrote:
>I'm running into some instances where htdig never appears to terminate.
>Results which have been found, up to the point of termination, appear to be
>valid as far as they go; I haven't yet tried analyzing the url list for a
>pattern of repetition.
>Is there any particular type of problem, within a website, which will tend to
>cause such a condition?
Yes. Since pages are determined to be unique based on the URL alone,
it's possible to catch almost any spider in an infinite loop if you
have bad URLs, depending on your server configuration. This can
especially happen if you have SSI or CGI-generated content because
these can use the PATH_INFO environment variable to make their own
For example, the site I designed this summer served everything out of
a mod_perl CGI. It was called StoreFront:
Now let's say that we have a section blah.
<http://www.foo.com/StoreFront/blah/> Now if I can tack on another
portion and get to the same page, then we have an infinite loop:
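
To make the mechanism concrete, here is a minimal sketch in Python (not htdig code). It assumes a hypothetical server that ignores any extra PATH_INFO under /StoreFront/blah/ and always returns the same page, and that the page contains a relative link "blah/". Both are assumptions for illustration only. Because the spider treats each distinct URL as a distinct page, every fetch yields one more "new" URL:

```python
from urllib.parse import urljoin, urlsplit

def fetch_links(url):
    """Hypothetical server: any path under /StoreFront/blah/ serves the
    same page, which contains the relative link "blah/"."""
    if urlsplit(url).path.startswith("/StoreFront/blah/"):
        # The relative link resolves against the current (ever longer) URL.
        return [urljoin(url, "blah/")]
    return []

def crawl(start, max_pages):
    """Spider that decides uniqueness on the URL alone.
    max_pages is only here so the demonstration terminates."""
    seen = set()
    queue = [start]
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        queue.extend(fetch_links(url))
    return sorted(seen, key=len)

pages = crawl("http://www.foo.com/StoreFront/blah/", max_pages=5)
for page in pages:
    print(page)
```

Without the max_pages cap the queue never empties, since no URL is ever seen twice even though the content is identical each time.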
>More-or-less related, I've been getting some empty search results due to
>"redirection". Htdig claims it only finds one page; manual browser search,
>however, appears to locate quite a few pages which have the desired url (ie,
>they should pass the "limit urls to" condition).
In this case, you will want to examine the url list very carefully.
If you get stuck, you can also see the reasons for ignoring URLs by
using the -vvv switch. (The more v's, the more verbose.)
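
If the URL list is long, a small script can help with the examination. The sketch below is one possible approach, assuming you have already extracted the URLs into a plain list of strings (how you pull them out of the htdig output is up to you and not shown here). The helper names are hypothetical. It flags any URL whose path repeats the same segment twice in a row, which is the usual signature of a looping spider:

```python
from urllib.parse import urlsplit

def has_repeated_segment(url):
    """True if two consecutive path segments are identical,
    e.g. /StoreFront/blah/blah/."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return any(a == b for a, b in zip(segments, segments[1:]))

def suspicious(urls):
    """Return the URLs that look like products of a loop."""
    return [u for u in urls if has_repeated_segment(u)]

sample = [
    "http://www.foo.com/StoreFront/blah/",
    "http://www.foo.com/StoreFront/blah/blah/",
]
for url in suspicious(sample):
    print(url)
```

This only catches immediate repetition. Longer cycles (a/b/a/b/) would need a different check, and a legitimate site can repeat a directory name, so treat the hits as leads rather than proof.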
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/