Geoff Hutchison (firstname.lastname@example.org)
Sun, 28 Mar 1999 23:16:44 -0500 (EST)
On Sun, 28 Mar 1999, p0222 wrote:
> How can I tell htdig to *ignore* the robots.txt-files, on the whole web or
> on specified servers ?
> That's my problem:
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^EXLCUDE LIST ?!?
> How can i turn this exlcude list *OFF* ?!?
No, not quite. First off, you cannot turn off the robots.txt parsing. It's
a standard and if you have a problem with a server's robots.txt file, you
should really take it up with the webmaster.
That's not your problem. The default config file ships with the option:
exclude_urls: cgi-bin .cgi
So this option is what excludes the URL you mention. If you don't want
this, remove it. (One caveat: currently, if you make exclude_urls empty,
it will ignore *all* URLs. So instead, set it to something that cannot
occur, like !-no-url-! and it won't exclude anything.)
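As a rough sketch, the relevant part of the config file would then look
something like the lines below. The option name and both values are the
ones mentioned above; the comment lines are only my annotation, and I'm
assuming the usual "#" comment syntax of the htdig config file.

```
# Shipped default: skips any URL containing "cgi-bin" or ".cgi"
# exclude_urls: cgi-bin .cgi

# Workaround: a pattern that should never occur in a real URL,
# so nothing is excluded (do not leave the value empty)
exclude_urls: !-no-url-!
```

You will need to re-run the dig after changing this so the previously
excluded URLs get picked up.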
Williams Students Online