htdig: htdig hangs with many limit_urls_to

Nyced
Sun, 10 May 1998 05:36:33 GMT

I am trying to index the contents of my web site by asking htdig
to start at each of the urls in my site *and* limit the search
to that same set of urls.

In other words I only want what is within that section of the

So for and
htdig should start at both sites and limit traversal to those same
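
For illustration, a setup of this shape might look like the fragment below. The two URLs are hypothetical placeholders, not the ones from my site; start_url and limit_urls_to are the two attributes in question, and my real file has the same layout with about 200 entries in each.

```
# Hypothetical placeholder URLs, for illustration only.
# Start the dig at each of these sections ...
start_url:      http://www.example.com/docs/ \
                http://www.example.org/manual/

# ... and do not follow links that lead outside them.
limit_urls_to:  http://www.example.com/docs/ \
                http://www.example.org/manual/
```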

htdig appears to hang when I do this for about 200 URLs.

I also tried creating a configuration file for each URL, but htdig
simply rotated between (not mentioned anywhere in
the URL list) and my local web server (one of the start URLs).

What gives?

The configuration file causing the problems is at

BTW, strace shows the last system call as an open on the configuration
file. There are no further system calls after that, and CPU utilization is extremely high.

Before I go in and try and debug, I would like to know if this has been solved
by anyone else. It shouldn't take this long to create a list of patterns. A
cursory glance at the code showed that htdig would most likely be building the regexps ...
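
To show why I would expect this step to be quick, here is a rough sketch in Python. It is not htdig's code, and I have not confirmed that htdig builds its patterns this way. It assumes the limit list is simply turned into one alternation of escaped URL prefixes. The function names and the example.com URLs are made up for illustration.

```python
import re


def build_limit_pattern(prefixes):
    """Compile one regex that matches any URL starting with a listed prefix."""
    if not prefixes:
        # An empty alternation would match every URL, so refuse it.
        raise ValueError("at least one URL prefix is required")
    # Escape each prefix so that '.', '?' and similar are treated literally.
    alternation = "|".join(re.escape(p) for p in prefixes)
    return re.compile("(?:" + alternation + ")")


def within_limits(url, pattern):
    """Return True if the URL begins with one of the allowed prefixes."""
    # re.match anchors at the start of the string, giving prefix semantics.
    return pattern.match(url) is not None


# Hypothetical list of 200 sections, roughly the size that hangs for me.
prefixes = ["http://www.example.com/section%d/" % i for i in range(200)]
pattern = build_limit_pattern(prefixes)

print(within_limits("http://www.example.com/section7/page.html", pattern))
print(within_limits("http://other.example.org/index.html", pattern))
```

If htdig does something comparable for a couple of hundred patterns, I would expect it to finish almost at once, which is why the long CPU-bound stall looks like a bug to me rather than an inherent cost.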

