Subject: Re: [htdig] Going for the big dig
From: Geoff Hutchison (firstname.lastname@example.org)
Date: Tue Dec 19 2000 - 08:59:16 PST
On Tue, 19 Dec 2000, Gilles Detillieux wrote:
> Geoff was saying you'd need to alter the code in order to ignore robots.txt,
> which definitely would be a bad thing if you then use the hacked htdig to
> index sites that are not your own.
Yes, and while it may or may not be easy to do, it will never be an option
to ignore the robots.txt file. (And it will never be an option to ignore
robots META tags either.)
> Actually, on my site I don't bother with exclude_urls at all, and use the
> robots.txt file instead. This way, anything that I don't want indexed by
> htdig won't be indexed by any other search engine either.
True, though other search engines usually also skip certain URL patterns
on their own (e.g. cgi-bin). I also make heavy use of the META robots tag,
though it is a newer standard than robots.txt and is still sometimes ignored.
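For reference, the META robots tag lives in a page's head, e.g. <meta name="robots" content="noindex, nofollow">. The sketch below shows how an indexer that honors the tag might read its directives, using Python's standard html.parser. It is an illustration only, not htdig's implementation. The class, the helper function, and the sample page are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collect directives from <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # content is a comma-separated list such as "noindex, nofollow".
            for token in attributes.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html: str) -> set[str]:
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical page that asks robots not to index it or follow its links.
PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Draft page</title>
</head><body>Not for search engines.</body></html>"""

if __name__ == "__main__":
    found = robots_directives(PAGE)
    print("may index:", "noindex" not in found)
    print("may follow links:", "nofollow" not in found)
```

Unlike robots.txt, the tag only works once a robot has already fetched the page, and it depends on each engine choosing to parse it. That fits the caveat above that some engines still ignore it.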
-- 
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
This archive was generated by hypermail 2b28 : Tue Dec 19 2000 - 09:10:02 PST