Re: [htdig] relative URL retrieval infinite recursive loop

Subject: Re: [htdig] relative URL retrieval infinite recursive loop
From: Glenn Nielsen
Date: Tue Jan 04 2000 - 19:12:37 PST

I found a solution. We were seeing the same problem with
linklint, a link checker written in Perl. We use Apache,
so I experimented with mod_rewrite and finally got it to
recognize the problem URLs and return "file gone".

If anyone is interested I can post the mod_rewrite config
and some example HTML.
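
In the meantime, here is a rough sketch of the kind of check involved,
written in Python rather than as mod_rewrite rules. It is not the actual
config. The function name and the extension list are made up for
illustration, and it assumes that a static file never legitimately has
further path segments after it.

```python
from urllib.parse import urlsplit

# Assumption: these extensions mark a path segment as a static file.
FILE_EXTENSIONS = (".html", ".htm", ".shtml")

def has_trailing_path_after_file(url):
    """Return True if any non-final path segment looks like a static file."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    # Every segment except the last one should name a directory.
    return any(s.lower().endswith(FILE_EXTENSIONS) for s in segments[:-1])
```

A crawler or link checker could skip, or report as an error, any URL for
which this returns True, e.g. "/parent/parent.html/index.html".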

Geoff Hutchison wrote:
> At 6:31 AM -0600 12/28/99, Glenn Nielsen wrote:
> >-------
> >
> >The following is a valid URL for a document...
> >
> ><a href="/parent/parent.html/index.html">Parent Page</a>
> >
> >where "/parent/parent.html" is a file on the server that is
> >returned by the webserver from the above URL.
> Is it valid? Yes.
> Is it a good URL? No.
  That's true, but I'm not the one publishing the content; in fact,
  I try very hard not to create content (HTML) ;-) But when you
  administer 10 web servers with hundreds of different customers
  publishing content, you have no control over the correctness of
  the URLs or over BCC errors (BCC error => when the problem is
  Between the Chair and Computer).
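
  For anyone who wants to see how such a URL can turn into a loop, here
  is a small Python sketch. It assumes the page contains a relative link
  of the form "parent.html/index.html"; that href and the example.com
  host are illustrative guesses, not taken from a real customer page.
  Because the server returns the same document for every longer URL,
  resolving the same relative link against each new URL never ends.

```python
from urllib.parse import urljoin

def follow_relative_link(start_url, href, hops):
    """Resolve the same relative href repeatedly, as a crawler would."""
    urls = [start_url]
    for _ in range(hops):
        # Each new URL becomes the base for resolving the next link.
        urls.append(urljoin(urls[-1], href))
    return urls

# Hypothetical starting point and relative link.
for url in follow_relative_link("http://example.com/parent/",
                                "parent.html/index.html", 3):
    print(url)
```

  Each hop adds one more "parent.html/" to the path, so the list of URLs
  to visit grows without bound unless something stops it.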

> Now this has come up recently on the bug report list. But when I
> tried this at "home" so to speak, the server returned a 404. (IMHO,
> if parent.html is NOT server-parsed, this is the Right Thing To Do
> TM.)
> >A possible solution would be to compare the contents of the parent and
> >child documents when the child comes from a relative URL. If the
> >document contents for the parent and child are identical and have the
> >same last modification date stamp, ignore the child document and report
> >an error. Then continue, digging the next href in the parent.
> Maybe. This is a bit of a pain though since you have to "remember"
> that it came from a relative URL. The whole problem is resolved when
> you have duplicate-document detection, which has been on the plate
> for a while. Unless someone volunteers to do it, it may be some time
> before it sees light of day, though.

  Duplicate doc detection would help me, since we index some external
  sites on which we have no admin privileges and HTTP has to be used.
  Sounds like a good idea.
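
  To make the idea concrete, here is a minimal sketch of duplicate
  detection by content hash, in Python. This is not ht://Dig code and
  not a proposal for how it should be implemented there; the class and
  method names are hypothetical. It only compares document bodies and
  ignores the modification date mentioned above.

```python
import hashlib

class DuplicateDetector:
    """Remember a digest of every document body seen so far."""

    def __init__(self):
        self._seen = {}  # digest -> URL where the body was first seen

    def check(self, url, body):
        """Return the earlier URL with identical content, or None."""
        digest = hashlib.sha256(body).hexdigest()
        # setdefault stores this URL only if the digest is new.
        first_url = self._seen.setdefault(digest, url)
        return None if first_url == url else first_url
```

  A crawler would call check() on each fetched document (as bytes) and
  skip parsing its links whenever an earlier URL is returned, which
  would also break the loop on sites we can only reach over HTTP.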



Glenn Nielsen | /* Spelin donut madder |
MOREnet System Programming | * if iz ina coment. |
Missouri Research and Education Network | */ |


This archive was generated by hypermail 2b28 : Tue Jan 04 2000 - 19:25:41 PST