Re: [htdig] Broken link checker


Benjamin Smedberg (41smedberg@cua.edu)
Wed, 24 Feb 1999 11:59:55 -0500


Well...I don't know anything about Perl, but there are people in my office
who do. However, having taken a look at Retriever.cc, it seems that dumping
bad links to a file in an easily parsable format, with the following
information, would be an acceptable solution. htnotify could then take this
file and send out the appropriate e-mail:

referrer URL
link URL
referrer e-mail notification address
Error code (not found, server not reachable, etc.)
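
To make the idea concrete, here is a minimal sketch of one possible file
layout: one tab-separated line per bad link, carrying the four fields above.
The layout, the BadLink class, and the function names are assumptions for
illustration only; nothing like this exists in ht://Dig or htnotify today.

```python
import csv
import io
from dataclasses import dataclass, astuple


@dataclass
class BadLink:
    """One bad-link record (hypothetical layout, not an ht://Dig format)."""
    referrer_url: str
    link_url: str
    notify_email: str
    error: str  # e.g. "not found", "server not reachable"


def dump_bad_links(records, out):
    """Write one tab-separated line per bad link to the open file `out`."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for rec in records:
        writer.writerow(astuple(rec))


def load_bad_links(inp):
    """Read the records back, as a notifier would; blank lines are skipped."""
    reader = csv.reader(inp, delimiter="\t")
    return [BadLink(*row) for row in reader if row]
```

A notifier could then call load_bad_links() on the file and group the records
by notify_email before sending anything.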

It would be nice for the main docDB to include at least one referrer, so
that at least one person could be notified of bad links upon re-digs, but
this is not necessary for my website, because I do a complete re-dig at
least once every 2-3 weeks (and update digs 2-3 times a week). If the digger
found bad links on a re-dig (no referrer available), it could be configured
either to e-mail the university webmaster or simply to ignore the bad links
until the next complete dig.
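
The fallback could be as simple as the sketch below. The function and
parameter names are made up for illustration, and returning None stands for
"skip this link until the next complete dig".

```python
def pick_recipient(notify_email, fallback_email=None):
    """Choose who gets told about a bad link.

    notify_email   -- the referrer's notification address, or "" / None
                      when no referrer is available
    fallback_email -- e.g. the site webmaster, or None to skip the link
    """
    if notify_email:
        return notify_email
    return fallback_email
```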

This could theoretically be expanded in the future to include broken image
checking, though this would cause digs to take a lot longer.

Benjamin Smedberg
------------------------------
How to make God laugh: tell Him YOUR plans.
-----Original Message-----
From: Marjolein Katsma <webmaster@javawoman.com>
To: htdig@htdig.org <htdig@htdig.org>
Cc: htdig@htdig.org <htdig@htdig.org>
Date: Wednesday, February 24, 1999 9:33 AM
Subject: Re: [htdig] Broken link checker

>
>At 08:34 1999-02-24 -0500, you wrote:
>>
>>>When C is removed A and B contain a broken link. Not the
>>>owner of C needs a message but the owner of A and B. ht://Dig
>>>already has this owner feature (htnotify-mail). But this
>>>would definitely be a new feature not just an output
>>>parser.
>>
>>Not really. As I mentioned, I have a pretty simple script to parse the
>>"not found" messages. If C no longer exists, then links from A and B are
>>both broken and reported. The referer portions of the "not found" messages
>>appear for me, so I'm not sure what you're worrying about.
>
>Does this output also list the owners of the referring pages? *They* need
>to be notified.
>
>Owners of "htdig" (maintainer), and of pages A, B and C can actually be
>four different persons. It doesn't really help then to send mail to the
>owner of the now-missing page C or to the htdig maintainer when really the
>owners of the pages linking to C (A and B) should be notified.
>
>>
>>It would seem a more powerful script would do something like this:
>>
>>Read in file of regex patterns
>>Repeat for each pattern
>>    if pattern matches a "not found" _referer_, then
>>        send a message to the owner of the file
>>        (either from htnotify or the regex file)
>>end repeat
>>
>>Thoughts?
>>
>>
>>-Geoff Hutchison
>>Williams Students Online
>>http://wso.williams.edu/
>>
>>
>>------------------------------------
>>To unsubscribe from the htdig mailing list, send a message to
>>htdig@htdig.org containing the single word "unsubscribe" in
>>the SUBJECT of the message.
>>
>
>Marjolein Katsma webmaster@javawoman.com
>Java Woman - http://javawoman.com/
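
For what it's worth, the pattern loop Geoff outlines in the quoted message
could look roughly like the sketch below. It assumes a patterns file with one
"REGEX EMAIL" pair per line, and it assumes the "not found" messages have
already been parsed into (referer, link) pairs, since the exact htdig output
format is not shown in this thread. Both function names are hypothetical, and
actually sending the mail is left out.

```python
import re
from collections import defaultdict


def load_patterns(lines):
    """Parse 'REGEX EMAIL' lines; blank lines and '#' comments are skipped."""
    patterns = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The e-mail address is the last whitespace-separated field.
        regex, email = line.rsplit(None, 1)
        patterns.append((re.compile(regex), email))
    return patterns


def group_by_owner(patterns, broken):
    """Map each owner e-mail to the (referer, link) pairs to report."""
    report = defaultdict(list)
    for regex, email in patterns:
        for referer, link in broken:
            if regex.search(referer):
                report[email].append((referer, link))
    return dict(report)
```

Each key of the result is one address to notify, and its list holds the
referring pages and broken links that belong in that person's message.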




This archive was generated by hypermail 2.0b3 on Fri Feb 26 1999 - 14:34:12 PST