Re: [htdig] Infinite loop problem with htdig


Subject: Re: [htdig] Infinite loop problem with htdig
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Fri Jun 09 2000 - 14:30:31 PDT


On Fri, 9 Jun 2000, Joe Baker wrote:

> 12555:121140:3:http://www.amnestyusa.org/countries/colombia/index.html/actions/r
> eports/blueprint/reports/blueprint/reports/blueprint/reports/blueprint/reports/b
> lueprint/reports/blueprint/reports/blueprint/reports/blueprint/senate12221999.ht
> ml:
>
> The actual directories are /home/aiusa/public_html/countries/colombia/actions
> and /home/aiusa/public_html/countries/colombia/actions

Sigh. This comes from a broken link combined with a webserver
misconfiguration. You probably have server-parsed pages (SSI) turned on,
right? The problem is that the server is allowed to pass whatever follows
the file name along to a CGI or server-parsed page as a virtual directory
(PATH_INFO). You'll notice that if you try that URL yourself, it actually
works on the server. Most people would naturally expect a 404, or for the
bit after index.html to be ignored.

So neither the server nor htdig is doing anything "wrong." It's just that
htdig can't detect that it's in a loop because every URL it sees is different.

If you don't use server-parsing on your .html files, turn it off!

> We are just trying to index our own server. Is there a fix for this? A
> configuration option that I'm missing?

Indexing from the local filesystem will still need some HTTP access from
time to time, but it might solve your first problem too. (Since local
pages are obviously not server-parsed. :-)

You want attributes like local_urls (and maybe server_aliases too):
<http://www.htdig.org/attrs.html#local_urls>
<http://www.htdig.org/attrs.html#server_aliases>
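
As a rough sketch only, the relevant lines in htdig.conf might look
something like this. The URL-to-directory mapping is an assumption
inferred from the paths quoted above, and the alias host name is purely
hypothetical, so check both against your own setup and the attribute
documentation:

    # Assumed mapping: read files under this URL prefix straight from disk
    local_urls:      http://www.amnestyusa.org/=/home/aiusa/public_html/

    # Hypothetical alias: treat a second host name as the same server
    server_aliases:  amnestyusa.org:80=www.amnestyusa.org:80

With a mapping like that, htdig would read a URL under the prefix from
the matching file on disk where it can, instead of asking the webserver
for it.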

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/




This archive was generated by hypermail 2b28 : Fri Jun 09 2000 - 12:20:53 PDT