Re: [htdig] Suse long run: done, problem solved

Subject: Re: [htdig] Suse long run: done, problem solved
From: David Robley
Date: Wed Apr 26 2000 - 18:32:21 PDT

On 27 Apr, Peter L. Peres wrote:
> Hi,
> the machine finished ! The loop was in the java api docs. There were no
> other loops. There is no bug in htdig wrt. this problem (looping).
> Here are some stats from the end:
> 27425.60user 10781.29system 43:11:27elapsed 24%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (18429778major+3453038minor)pagefaults 2501532swaps
> htdig ran with a niceness 18 for the last 25% of the indexing. Load was
> 0.8 or so during this time. docdb is about 200MB. My input was about
> 220MB.
> The loop problem was in the tree:
> /usr/doc/packages/javadoc/docs/api/
> which has more than 500 entries.
> System: i486/100MHz/24MB RAM 4.3+2.8 GB EIDE disks (not UDMA), headless
> (ethernet only) Suse 6.2 Linux (w. modified html documentation system - by
> me). As you can see the machine was swapping like crazy. I think I'd need
> a machine with 256MB RAM to avoid serious swapping. Not likely anytime
> soon.
> thank you all for the ideas,
> Peter

I think I've come across this sort of problem when trying to index a
series of documents that have a lot of internal references
(<A HREF="#target">). htdig tries to follow each of these links, ending up
going in ever decreasing circles until....

My solution was to add something like html# to the exclude_urls list.
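
To illustrate why that helps, here is a rough Python sketch of substring-based URL exclusion. It assumes each exclude_urls pattern is tested as a plain substring of the URL, which is how I understand the attribute to behave. The is_excluded helper, the pattern list, and the example.com links are all made up for the illustration; none of this is htdig's actual code.

```python
def is_excluded(url, patterns):
    """Return True if any pattern occurs anywhere in the URL (substring match)."""
    return any(p in url for p in patterns)


# Hypothetical pattern list: two common CGI patterns plus the "html#" idea.
patterns = ["/cgi-bin/", ".cgi", "html#"]

# Hypothetical links as a crawler might see them after resolving relative hrefs.
links = [
    "http://example.com/docs/api/index.html",
    "http://example.com/docs/api/index.html#target",
    "http://example.com/docs/api/Object.html#toString",
    "http://example.com/cgi-bin/search",
]

# Only links that match no pattern are kept for crawling.
to_crawl = [u for u in links if not is_excluded(u, patterns)]
print(to_crawl)
```

The plain page URL is still indexed; only the variants that carry a "#fragment" after ".html" are skipped, so the same page is not revisited once per internal anchor.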


David Robley                        | WEBMASTER & Mail List Admin
AusEinet                            |
            Flinders University, ADELAIDE, SOUTH AUSTRALIA
