Subject: [htdig] Problems with cgis and collecting other sites
From: Walter Addison March (wmarch@haverford.edu)
Date: Thu Feb 10 2000 - 10:55:36 PST

Hi. I just installed the binary of ht://dig 3.1.3 and I've noticed a
couple of issues (which might be one issue) that I need some help with. I
did search the ht://dig web site for help.

In our conf file I have:

exclude_urls: /cgi-bin/ .cgi .pl

but we have things like http://haverford.edu/acc/WebX that are also cgis.
Is there some flag I am missing to tell htdig not to pick up cgis no matter
what they might be named, or does one have to figure out all the various
cgis that we run for the several servers on campus and add each one to

In addition, and on a similar note, despite the fact that I have:
limit_urls_to: http://www.haverford.edu/

htdig still dug

Understanding that this link was followed because http://www.haverford.edu/
was in the URL, is there a way to restrict htdig by IPs or something so
that it doesn't follow links like that? Or, if there is a way to exclude
cgis not based on their urls, would that work for this?
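
To make the suspected behaviour concrete, here is a minimal Python sketch. It
assumes that limit_urls_to and exclude_urls are both plain substring tests
against the whole URL; that is an assumption about htdig, not its actual
code. The function name would_index and the off-site URL are hypothetical,
made up purely for illustration.

```python
def would_index(url, limit_urls_to, exclude_urls):
    """Assumed filter: the URL must contain one limit pattern and no exclude pattern."""
    allowed = any(pattern in url for pattern in limit_urls_to)
    excluded = any(pattern in url for pattern in exclude_urls)
    return allowed and not excluded


limit = ["http://www.haverford.edu/"]
exclude = ["/cgi-bin/", ".cgi", ".pl"]

# Hypothetical off-site link that merely embeds our prefix in its query string.
offsite = "http://other.example.org/redirect?to=http://www.haverford.edu/"

print(would_index(offsite, limit, exclude))
print(would_index("http://www.haverford.edu/cgi-bin/search", limit, exclude))
```

Under that assumption the off-site link passes the limit test even though it
lives on another host, while the /cgi-bin/ URL is rejected. A cgi whose URL
contains none of the exclude patterns, such as .../acc/WebX, would not be
caught by the exclude list at all.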

One last point, I did try adding /ugweb.cs.ualberta.ca/ to the exclude_urls
and then ran an update... but that info is still there... is the
information still there because I ran an update?

Sorry for the length, but any responses would be much appreciated.

Walter Addison March
Web Administrator/Programmer
Academic Computing, Haverford College

