Re: [htdig] Indexing a list of sites -- catching failures


Subject: Re: [htdig] Indexing a list of sites -- catching failures
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Fri Aug 25 2000 - 13:47:36 PDT


On Fri, 25 Aug 2000 twallace@neo.lucidgreen.com wrote:

> Some sites seem to cause htdig to fail. When this happens, htdig doesn't

I'm not sure what you mean by "fail." In some cases, htdig may not index a
site, i.e. the site is unreachable, the robots.txt forbids it, the
webserver returns no data, etc. But in no case should a site actually
crash htdig. If there are additional URLs left in the queue, it will
continue. If the new server represented the last URL to follow, it will
stop.

> What I would like to do is to somehow index each site separately and have
> some kind of error log if htdig hits a site that it fails on (for whatever
> reason). Then, I would like for it to proceed to the next site in the list,

You will probably find it convenient to use htdig -v and redirect the
output to a log; this will show you the URLs that were (or were not)
indexed. With that log in hand, little additional troubleshooting should
be needed.
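
As a rough illustration of the "one log per site" idea, here is a minimal
Python sketch. It assumes each site has its own htdig config file (with its
own start_url and database settings), which is not something stated above;
the names run_and_log and index_sites are hypothetical helpers, not part of
htdig. It only uses the -v and -c flags. The exit code is recorded as a
coarse signal for hard failures; an unreachable or forbidden site would more
likely show up in the -v log than in the exit status.

```python
import subprocess
from pathlib import Path


def run_and_log(cmd, log_path):
    """Run cmd, write its combined stdout/stderr to log_path, return the exit code."""
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


def index_sites(config_files, log_dir, base_cmd=("htdig", "-v")):
    """Run one pass per config file (assumed: one config per site).

    Each run gets its own log file named after the config. Returns a list
    of (config, exit_code) pairs for runs that exited non-zero; the loop
    always continues to the next config.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    failures = []
    for conf in config_files:
        log_path = log_dir / (Path(conf).stem + ".log")
        code = run_and_log([*base_cmd, "-c", str(conf)], log_path)
        if code != 0:
            failures.append((str(conf), code))
    return failures
```

Something like index_sites(["site1.conf", "site2.conf"], "logs") would then
leave logs/site1.log and logs/site2.log to inspect, where the config names
are placeholders.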

If you don't want to do that, you can also use the -s flag to get stats on
the number of documents indexed per server. Obviously if one shows no
documents indexed, that's your culprit.
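
To make the "no documents indexed" check concrete, a small Python sketch
follows. It assumes you have already collected the per-server document
counts into a dictionary yourself; the exact layout of the -s output is not
shown in this message, so no parsing is attempted here, and the function
name is hypothetical.

```python
def servers_without_documents(doc_counts):
    """Return the servers whose indexed-document count is zero, sorted by name.

    doc_counts maps a server name to the number of documents indexed for it
    (assumed to be gathered by hand or by your own script from the stats).
    """
    return sorted(server for server, count in doc_counts.items() if count == 0)
```

Any server this returns is a candidate for a closer look in the verbose log.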

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/




This archive was generated by hypermail 2b28 : Fri Aug 25 2000 - 13:48:52 PDT