Re: [htdig] Problems with cgis and collecting other sites

Subject: Re: [htdig] Problems with cgis and collecting other sites
From: Geoff Hutchison
Date: Thu Feb 10 2000 - 12:57:55 PST

On Thu, 10 Feb 2000, Walter Addison March wrote:

> but we have things like that are cgis also.
> Is there some flag I am missing to tell htdig not to pick up cgis no matter
> what they might be named or does one have to figure out all the various
> cgis that we run for the several servers on campus and add each one to
> exclude_urls?

If you don't want to index CGIs, then yes, you need to add each one to
exclude_urls. If you showed me that URL, I would have no way of knowing
a priori that it was a CGI, and the same is true for htdig when indexing.
Of course, you don't have to ignore CGIs; many people include them in
their databases.

> was in the URL, is there a way to restrict htdig by IPs or something so
> that it doesn't follow links like that? Or, if there is a way to exclude
> cgis not based on their urls, would that work for this?

I'd use ? as a pattern in exclude_urls, since a query string is a common
way to pass data to a CGI.
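
As a rough sketch only: assuming your config still carries the usual
/cgi-bin/ and .cgi patterns (check your own htdig.conf, since yours may
differ), the line might look something like this. Each entry is matched
as a plain substring of the URL, so the ? entry should skip any link
that carries a query string.

```
# Sketch only: the first two patterns are assumed to be the ones already
# in your config; the trailing ? is the addition suggested above.
exclude_urls:    /cgi-bin/ .cgi ?
```

Bear in mind this catches only CGIs that are called with arguments. A
CGI linked by a plain URL would still have to be listed by name.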

> One last point, I did try adding / to the exclude_urls
> and then ran an update... but that info is still there... is the
> information still there because I ran an update?

Correct. An update will not delete URLs from a database, period. So if you
want to get the URL out, currently you'll need to rebuild the databases
from scratch.

-Geoff Hutchison
Williams Students Online

