Re: [htdig] lone pages & broken links


From: Malcolm Austen (malcolm.austen@computing-services.oxford.ac.uk)
Date: Thu Feb 17 2000 - 03:03:40 PST


On Thu, 17 Feb 2000, Tom Robinson wrote:

+ 1. Look for lone pages, i.e. pages that aren't pointed to by other URLs.

Obviously that can't be done by following links! I guess you would have to
build a complete list of possible URLs based on directory listings and
then compare that with the list of URLs indexed by ht://Dig.

+ 2. Look for pages that have broken links (links that do not point
+ anywhere).

I have a "dig report" perl script based on a script by Daniel Mckay. It
reports on pages/depths per server and also lists (per server) pages that
would have been indexed if only the robot had been allowed to see the
page. That wording is deliberately careful: the listing doesn't separate
pages that don't exist from pages that have restricted access. (I have
asked for a distinction to be added to the logging output, but that won't
appear until sometime down the 3.2.x path.)

The report script is at http://daneel.oucs.ox.ac.uk/spider/report/
... there is a sample of the report too, but not of the actual broken
links on our web pages; I thought some of our webmasters might be
embarrassed by the sort of slips that show up!

Note that this script doesn't report all broken links, only broken links
to pages that qualified for indexing.

regards,
        Malcolm.

 Malcolm.Austen@OUCS.ox.ac.uk http://users.ox.ac.uk/~malcolm/

------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig-unsubscribe@htdig.org
You will receive a message to confirm this.



This archive was generated by hypermail 2b28 : Thu Feb 17 2000 - 03:07:04 PST