Re: [htdig] Duplicate URLs?


Jim Cole (greyleaf@yggdrasill.net)
Mon, 28 Jun 1999 22:29:08 -0600


Geoff Hutchison wrote:
>
> Jim Cole wrote:
>
> > Is this normal behavior? Might it have something to do with why my
> > databases are becoming so large?
>
> This is not normal behavior. I would guess you have a symlink in those
> directories. This symlink has essentially created an infinite loop. For
> example, fyi -> fyi is one such loop right there.
Hmm... There is a symlink in the home directory that points to a
directory on another partition where the web space is allocated, and
one symlink in the htdocs directory that points to an empty public
ftp directory. That is it. The fyi directory is definitely not a
symlink, nor are any of the others that it seems to be looping through.
Also, it does not enter an infinite loop. It does eventually complete
the dig and generate functional, if massive, databases.

Is there possibly some behind-the-scenes server configuration issue that
could result in this type of behavior? I don't have much control over
the server configuration, since the site is hosted through a virtual
domain hosting service, but I would really like to figure out what the
problem is.

> This is almost definitely the root cause of large databases--you're
> indexing your files *many*, *many* times over.
It is. My interim solution was to configure htdig to attack the site on
a directory-by-directory basis with max hops set to 1. The result was a
set of databases that were more than an order of magnitude smaller and
seemed to accurately reflect the site. It is a rather tedious approach
and not very conducive to updates, but it was enough to get me up and
running. Better ideas?
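
For reference, a per-directory configuration along these lines is roughly
what that workaround looks like. This is only a sketch: the host name,
the directory, and the database path are placeholders, not the actual
setup, and it assumes the "max hops" setting corresponds to the
max_hop_count attribute.

```
# Hypothetical config for digging a single directory.
# www.example.com, /fyi/ and the database path are placeholders.

# Start the dig at one directory only.
start_url:      http://www.example.com/fyi/

# Reject any URL that falls outside that directory.
limit_urls_to:  ${start_url}

# Follow links at most one hop away from the start URL.
max_hop_count:  1

# Where the resulting databases are written.
database_dir:   /path/to/htdig/db
```

One such file would be needed for each directory, each passed to htdig
with the -c option, which is where the tedium comes from.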

Thanks.

Jim Cole



This archive was generated by hypermail 2.0b3 on Mon Jun 28 1999 - 20:39:29 PDT