Re: [htdig] relative URL retrieval infinite recursive loop


Subject: Re: [htdig] relative URL retrieval infinite recursive loop
From: Glenn Nielsen (glenn@voyager.apg.more.net)
Date: Tue Jan 04 2000 - 19:12:37 PST


I found a solution. We were seeing the same problem with
linklint, a link checker written in Perl. We use Apache,
so I played around with mod_rewrite and finally got it to
recognize the problem URLs and return a "gone" response.

If anyone is interested I can post the mod_rewrite config
and some example HTML.
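
Until the config is posted, here is a minimal sketch in Python of the kind of check such a rewrite rule would perform. It is not the actual mod_rewrite config. The function names, the choice of .html/.htm as the "file" extensions, and the use of status 410 for "gone" are all assumptions made for illustration.

```python
import re

# Assumption: a path segment ending in .html or .htm names a file, so any
# further path after it (".../parent.html/index.html") is suspect.
_FILE_THEN_MORE = re.compile(r"/[^/]+\.html?/")


def looks_like_path_past_file(url_path: str) -> bool:
    """Return True if a .html/.htm segment is followed by more path."""
    return _FILE_THEN_MORE.search(url_path) is not None


def status_for(url_path: str) -> int:
    """Hypothetical policy: 410 (Gone) for suspect paths, 200 otherwise."""
    return 410 if looks_like_path_past_file(url_path) else 200


if __name__ == "__main__":
    for path in ("/parent/parent.html", "/parent/parent.html/index.html"):
        print(path, status_for(path))
```

Refusing these URLs at the server means every crawler (htdig, linklint, or anything else) stops following them without needing its own fix.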

Geoff Hutchison wrote:
>
> At 6:31 AM -0600 12/28/99, Glenn Nielsen wrote:
> >PROBLEM
> >-------
> >
> >The following is a valid URL for a document...
> >
> ><a href="/parent/parent.html/index.html">Parent Page</a>
> >
> >where "/parent/parent.html" is a file on the server that is
> >returned by the webserver from the above URL.
>
> Is it valid? Yes.
> Is it a good URL? No.
>
  That's true, but I'm not the one publishing the content; in fact,
  I try very hard not to create content (HTML) ;-) But when you
  administer 10 web servers with hundreds of different customers
  publishing content, you have no control over the correctness of
  the URLs or over BCC errors (BCC error => when the problem is
  Between the Chair and Computer).

> Now this has come up recently on the bug report list. But when I
> tried this at "home" so to speak, the server returned a 404. (IMHO,
> if parent.html is NOT server-parsed, this is the Right Thing To Do
> TM.)
>
> >A possible solution would be to compare the contents of the parent and
> >child documents when the child comes from a relative URL. If the
> >document contents for the parent and child are identical and have the
> >same last modification date stamp, ignore the child document and report
> >an error. Then continue, digging the next href in the parent.
>
> Maybe. This is a bit of a pain though since you have to "remember"
> that it came from a relative URL. The whole problem is resolved when
> you have duplicate-document detection, which has been on the plate
> for a while. Unless someone volunteers to do it, it may be some time
> before it sees light of day, though.
>

  Duplicate doc detection would help me, since we index some external
  sites on which we have no admin privileges and HTTP has to be used.
  Sounds like a good idea.
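
  For the sake of discussion, a rough sketch of what duplicate-document detection could look like, written in Python. This is not how ht://Dig does or will do it. The class name, the use of SHA-256, and hashing the raw body bytes are all assumptions made for illustration.

```python
import hashlib


class DuplicateDetector:
    """Remember a digest of every document body seen during a crawl."""

    def __init__(self):
        self._seen = {}  # body digest -> first URL that returned that body

    def check(self, url: str, body: bytes):
        """Return the earlier URL with identical content, or None if new."""
        digest = hashlib.sha256(body).hexdigest()
        first = self._seen.setdefault(digest, url)
        return None if first == url else first


if __name__ == "__main__":
    detector = DuplicateDetector()
    page = b"<html>Parent Page</html>"
    print(detector.check("/parent/parent.html", page))
    print(detector.check("/parent/parent.html/index.html", page))
```

  A crawler using something like this would skip any document for which check() returns an earlier URL, which also breaks the recursive loop: the second fetch of the same page is dropped before its links are followed. It only catches byte-identical bodies, so pages that embed their own URL or a timestamp would slip through.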

Thanks,

Glenn

----------------------------------------------------------------------
Glenn Nielsen glenn@more.net | /* Spelin donut madder |
MOREnet System Programming | * if iz ina coment. |
Missouri Research and Education Network | */ |
----------------------------------------------------------------------




This archive was generated by hypermail 2b28 : Tue Jan 04 2000 - 19:25:41 PST