Steven Karel (firstname.lastname@example.org)
Sat, 2 Oct 1999 11:31:11 -0400 (EDT)
Hi. Let me preface this by saying that htdig works great in its intended
role as a search engine for our local network of webservers -- no
In the past, I've also found that htdig had another use as a "bookmark
search engine". I have too many bookmarks to remember what's where, so I
set up an alternate config file for htdig to start with my bookmark files
and to index everything within two hops. The result is a database, not too
huge, of webpages that I've either bookmarked or that are linked from a
page I've bookmarked. It's been quite nice.
Unfortunately, now that I've upgraded to htdig 3.1.3 and tried to recreate
my index, it's not working very well. Here's my limited understanding of
the problem:
When htdig is initially starting a dig, it pauses every time it encounters
a new server, to read robots.txt, etc. The program waits for each server
to respond. Unfortunately, with the strategy I'm using, it hits a LOT of
different servers, some of which are from old broken links. And, possibly
because our internet connection is sometimes flaky, it occasionally winds
up waiting and waiting for the connection to be made, and the whole
process waits and waits.
When the process pauses, the output of htdig -vvvv shows that it is
waiting after a line like
New server: www.macintouch.com, 80
and netstat always shows a line like
tcp 0 1 squirrel:4811 22.214.171.124:www SYN_SENT
Changing server_wait_time and/or timeout in the config file doesn't seem
to do any good.
Is there any way to get around this, and have htdig either discard the
unresponsive server, or continue working on other servers while waiting?
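One workaround I can imagine, sketched below in Python, is to weed out the
dead servers before htdig ever sees them: try a TCP connect to each
bookmarked host with a short timeout and keep only the URLs that answer.
This is not anything htdig provides. The function names (host_port,
is_reachable, filter_reachable) and the 5-second default are my own
assumptions, and it only helps with the starting URLs, not with servers
discovered one or two hops out.

```python
import socket
from urllib.parse import urlsplit


def host_port(url):
    """Return (host, port) for a URL, defaulting the port from the scheme."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_reachable(host, port, timeout=5.0):
    """Attempt a TCP connect and give up after `timeout` seconds.

    A server stuck in SYN_SENT, a refused connection, or a failed DNS
    lookup all surface as OSError and are reported as unreachable.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def filter_reachable(urls, timeout=5.0):
    """Keep only URLs whose server accepts a connection.

    Each (host, port) pair is probed once, however many URLs share it.
    URLs without a recognizable host are dropped.
    """
    probed = {}
    kept = []
    for url in urls:
        host, port = host_port(url)
        if host is None:
            continue
        if (host, port) not in probed:
            probed[(host, port)] = is_reachable(host, port, timeout)
        if probed[(host, port)]:
            kept.append(url)
    return kept
```

The surviving URLs could then be written out as the starting list for the
dig, so that the initial pass never has to wait on a server that is already
known to be unresponsive.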
redhat 6.0 on intel
htdig 3.1.3 compiled from source
I may try reverting to htdig 3.1.2 to see if that helps. Am willing to try
other suggested approaches.
Biology Department, Brandeis Univ, MS 008
415 South St Waltham MA 02454-9110
TEL 781 736 3104 FAX 781 736 3107