Re: [htdig] Parsing 3.1.3 log files

Geoff Hutchison
Fri, 24 Sep 1999 11:25:58 -0500

>Getting the dead pages is easy; in the log they're marked "not found".
>Getting their sources is a little harder, but with V3.1.2 all you had to

Actually, it's a *lot* easier than this. Run htdig with the -s flag.
At the end of the dig, it will print the broken URLs and their
referers. There's even a contributed script in the archive that will
help you do various things with the list.

>Also: is there any documentation for the format of the log file? what are
>the three numbers at the beginning of the line, e.g.
> 14:2:0:<url>: not found

Index #, DocID, Hopcount

where Index # is incremented every step during that indexing run,
DocID is the internal database ID #, and hopcount is the number of
hops from the start_url.
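
If you do want to pull those fields out of the log yourself, something
like the following rough sketch might work. It is not part of ht://Dig;
the names LogEntry and parse_log_line are made up for illustration, and
it assumes every line of interest has the shape shown in the question
(three colon-separated numbers, then the URL, then ": " and a status
such as "not found"). Lines in a real log may well differ.

```python
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    """One parsed log line (hypothetical helper type, not part of ht://Dig)."""
    index: int      # incremented every step during the indexing run
    doc_id: int     # internal database ID #
    hopcount: int   # number of hops from the start_url
    url: str
    status: str     # trailing text, e.g. "not found" (assumed layout)


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Parse 'Index:DocID:Hopcount:<url>: status'; return None if it doesn't fit."""
    # Split off only the first three fields, since the URL itself contains colons.
    parts = line.strip().split(":", 3)
    if len(parts) != 4:
        return None
    index, doc_id, hopcount, rest = parts
    if not (index.isdigit() and doc_id.isdigit() and hopcount.isdigit()):
        return None
    # Assumption: the status follows the LAST ": " on the line.
    url, sep, status = rest.rpartition(": ")
    if not sep:
        url, status = rest, ""
    return LogEntry(int(index), int(doc_id), int(hopcount),
                    url.strip(), status.strip())


def dead_pages(lines):
    """Yield entries whose status is 'not found'."""
    for line in lines:
        entry = parse_log_line(line)
        if entry is not None and entry.status == "not found":
            yield entry
```

Feeding it the lines of a saved log, e.g. dead_pages(open("dig.log")),
would give you the dead URLs along with their hopcounts; the file name
here is only a placeholder.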

-Geoff Hutchison
Williams Students Online


This archive was generated by hypermail 2.0b3 on Fri Sep 24 1999 - 09:31:03 PDT