Re: [htdig] Parsing 3.1.3 log files


Geoff Hutchison (ghutchis@wso.williams.edu)
Fri, 24 Sep 1999 11:25:58 -0500


>Getting the dead pages is easy; in the log they're marked "not found".
>Getting their sources is a little harder, but with V3.1.2 all you had to

Actually, it's a *lot* easier than this. Run htdig with the -s flag.
At the end of the dig, it prints the broken URLs and their referrers.
There's even a contributed script in the archive that will help you
process that list.

>Also: is there any documentation for the format of the log file? what are
>the three numbers at the beginning of the line, e.g.
>
> 14:2:0:<url>: not found

Index #, DocID, Hopcount

where Index # is incremented at every step of that indexing run,
DocID is the document's internal database ID, and Hopcount is the
number of hops from the start_url. In your example, that's index 14,
DocID 2, and hopcount 0.
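
If you do still want to pull the fields out of the log yourself, here is
a rough Python sketch. It assumes every line of interest looks like the
single example quoted above (three colon-separated numbers, the URL, then
": " and a status message); I haven't checked it against other kinds of
log lines. The names LogEntry, parse_log_line and dead_urls are made up
for this example and are not part of ht://Dig.

```python
from typing import Iterable, List, NamedTuple, Optional


class LogEntry(NamedTuple):
    index: int      # step counter within this indexing run
    doc_id: int     # internal database ID
    hopcount: int   # hops from the start_url
    url: str
    status: str     # trailing message, e.g. "not found"


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Parse one 'index:docid:hopcount:url: status' line, or return None."""
    # Split off only the three leading numbers; the URL contains colons too.
    parts = line.strip().split(":", 3)
    if len(parts) != 4:
        return None
    index, doc_id, hopcount, rest = parts
    if not (index.isdigit() and doc_id.isdigit() and hopcount.isdigit()):
        return None
    # Assumption: the status follows the last ": " on the line.
    url, sep, status = rest.rpartition(": ")
    if not sep:
        return None
    return LogEntry(int(index), int(doc_id), int(hopcount), url, status)


def dead_urls(lines: Iterable[str]) -> List[str]:
    """Return the URLs of all lines whose status is 'not found'."""
    entries = (parse_log_line(line) for line in lines)
    return [e.url for e in entries if e is not None and e.status == "not found"]
```

Feeding dead_urls() an open log file gives you the list of dead pages;
lines that don't match the assumed layout are simply skipped.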

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/




This archive was generated by hypermail 2.0b3 on Fri Sep 24 1999 - 09:31:03 PDT