htdig: htdig hangs with many limit_urls_to


Nyced (nyce@webstar.com.gh)
Sun, 10 May 1998 05:36:33 GMT


I am trying to index the contents of my web site by asking htdig
to start at each of the URLs in my site *and* limit the search
to that same set of URLs.

In other words I only want what is within that section of the
internet.

So for http://www.internetghana.com/digisign and http://www.ghana.com
htdig should start at both sites and limit traversal to those same
sites.
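
For illustration, a two-site version of such a configuration might look
like the sketch below. This is not the actual file from this report; it
only assumes the ht://Dig attributes start_url and limit_urls_to, the
${...} variable substitution, and backslash line continuation, and it
uses the two URLs from the example above.

        # Sketch only: crawl from both sites and stay within them.
        start_url:      http://www.internetghana.com/digisign \
                        http://www.ghana.com
        limit_urls_to:  ${start_url}

With roughly 200 sites, both lists would grow to about 200 entries each.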

htdig appears to hang when I do this for about 200 URLs.

I also tried creating a separate configuration file for each URL, but
htdig simply alternated between htdig.sdsu.edu (not mentioned anywhere
in the URL list) and my local web server (one of the start URLs).

What gives?

The configuration file causing the problems is at
        http://www.webstar.com.gh/htdig.conf.txt

BTW, strace shows the last system call as an open on the configuration
file. There are no more system calls after that, and CPU utilization is
extremely high.

Before I go in and try to debug this, I would like to know whether anyone
else has already solved it. It shouldn't take this long to create a list
of patterns. A cursory glance at the code suggested that htdig would most
likely be building the regexps ...



