Subject: Re: [htdig] Problems with cgis and collecting other sites
From: Geoff Hutchison (email@example.com)
Date: Thu Feb 10 2000 - 12:57:55 PST
On Thu, 10 Feb 2000, Walter Addison March wrote:
> but we have things like http://haverford.edu/acc/WebX that are cgis also.
> Is there some flag I am missing to tell htdig not to pick up cgis no matter
> what they might be named or does one have to figure out all the various
> cgis that we run for the several servers on campus and add each one to
If you don't want to index CGIs, that's correct. If you were to show me
that URL, I would have no way of knowing a priori that it was a CGI, and
the same is true for htdig when indexing. Of course, you don't have to
ignore CGIs; many people include them in their databases.
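A small Python model shows why the name alone isn't enough. It is not htdig's code. It assumes exclude_urls rejects a URL when any configured pattern appears anywhere in it as a plain substring, and that the 3.1 default patterns are "/cgi-bin/" and ".cgi". Both are assumptions, so check the attribute documentation for your version. The first two URLs are hypothetical.

```python
# Toy model (assumption, not htdig source): exclude_urls as plain substring matching.
DEFAULT_EXCLUDES = ["/cgi-bin/", ".cgi"]  # assumed htdig 3.1 default patterns


def is_excluded(url: str, patterns: list[str]) -> bool:
    """Return True if any pattern occurs anywhere in the URL."""
    return any(p in url for p in patterns)


urls = [
    "http://haverford.edu/cgi-bin/search",  # hypothetical: caught by "/cgi-bin/"
    "http://haverford.edu/acc/form.cgi",    # hypothetical: caught by ".cgi"
    "http://haverford.edu/acc/WebX",        # a CGI, but nothing in the URL says so
]
for u in urls:
    print(u, "excluded" if is_excluded(u, DEFAULT_EXCLUDES) else "indexed")
```

The WebX URL is indexed because none of the patterns appear in it. Under this model, the only way to drop it by name is to list it explicitly.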
> was in the URL, is there a way to restrict htdig by IPs or something so
> that it doesn't follow links like that? Or, if there is a way to exclude
> cgis not based on their urls, would that work for this?
I'd use ? as a pattern in exclude_urls, since a query string (everything
after the ?) is a common way to pass data to a CGI.
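The sketch below adds "?" to the pattern list. It uses the same assumed substring-matching model, so treat it as an illustration rather than htdig's actual implementation. The htdig.conf line in the comment is written in the usual space-separated form, and the WebX query string is hypothetical. The ugweb.cs.ualberta.ca pattern comes from the question in this thread.

```python
# Toy model (assumption): exclude_urls matches each pattern as a plain substring.
def is_excluded(url: str, patterns: list[str]) -> bool:
    """Return True if any pattern occurs anywhere in the URL."""
    return any(p in url for p in patterns)


# Hypothetical htdig.conf value, split on whitespace:
#   exclude_urls: /cgi-bin/ .cgi ? /ugweb.cs.ualberta.ca/
patterns = "/cgi-bin/ .cgi ? /ugweb.cs.ualberta.ca/".split()

checks = {
    "http://haverford.edu/acc/WebX?view=1": True,    # query string, caught by "?"
    "http://haverford.edu/acc/WebX": False,          # no query string, still indexed
    "http://ugweb.cs.ualberta.ca/index.html": True,  # host pattern from the thread
    "http://haverford.edu/index.html": False,
}
for url, expected in checks.items():
    assert is_excluded(url, patterns) == expected, url
    print(url, "excluded" if expected else "indexed")
```

The "?" pattern only catches CGI calls that actually carry a query string. A bare link to the script, like the second URL above, still gets through.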
> One last point, I did try adding /ugweb.cs.ualberta.ca/ to the exclude_urls
> and then ran an update... but that info is still there... is the
> information still there because I ran an update?
Correct. An update will not delete URLs from a database, period. So if you
want to get the URL out, you'll currently need to rebuild the databases.
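To see why update and rebuild give different results, here is a toy Python model. It is not how htdig stores data. It treats the database as a dict, treats an update as a merge that only adds or refreshes entries, and treats a rebuild as starting from empty. The crawl, update and rebuild helpers are hypothetical. In htdig 3.1 a rebuild is normally an initial dig, for example htdig's -i option, but check your version's documentation.

```python
# Toy model (hypothetical helpers, not htdig internals).
def crawl(urls: list[str], patterns: list[str]) -> dict[str, str]:
    """Fetch every URL not matched by an exclude pattern (substring match)."""
    return {u: f"<contents of {u}>" for u in urls if not any(p in u for p in patterns)}


def update(db: dict[str, str], crawled: dict[str, str]) -> dict[str, str]:
    """Update: add or refresh crawled URLs; never delete existing entries."""
    merged = dict(db)
    merged.update(crawled)
    return merged


def rebuild(crawled: dict[str, str]) -> dict[str, str]:
    """Rebuild: start from an empty database, keeping only what was just crawled."""
    return dict(crawled)


seen = ["http://haverford.edu/", "http://ugweb.cs.ualberta.ca/"]
old_db = crawl(seen, [])                          # first dig, nothing excluded
fresh = crawl(seen, ["/ugweb.cs.ualberta.ca/"])   # exclude added afterwards

updated = update(old_db, fresh)
rebuilt = rebuild(fresh)
print("http://ugweb.cs.ualberta.ca/" in updated)  # stale entry survives the update
print("http://ugweb.cs.ualberta.ca/" in rebuilt)  # gone after the rebuild
```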
Williams Students Online