Subject: Re: [htdig] what robots.txt actually does
From: Gilles Detillieux (email@example.com)
Date: Fri Dec 10 1999 - 09:16:01 PST
According to Kari Suomela:
> > I've tried to figure out what robots.txt actually does, and how to
> > properly configure it. I have read the FAQs etc. on several sites, but
> > cannot find plain language explanations. Apparently some searches fail,
> > and others can't find my server at all because of this.
> > Could someone, please, explain what to do with that file, so all search
> > engines and agents would work ok.
According to Daniel MacKay:
> The reference in the htdig documentation is perfect.
> Go to www.htdig.org
> Click on the Reference section, htdig.
> Go to the bottom.
> Click on "A Standard for Robot Exclusion".
That will get you the document, which is an excellent summary and
gives pointers to more information.
I think the key sentence that Kari is looking for is this:
    The presence of an empty "/robots.txt" file has no explicit associated
    semantics, it will be treated as if it was not present, i.e. all robots
    will consider themselves welcome.
I.e., by default, search engines should have wide open access to your
site, and it's up to you to disallow certain URLs to certain agents,
if you so desire. If you don't have a robots.txt, and a particular
engine can't spider through your site, but a web browser can manually
access all the documents you want by following HTML links, it's likely
a flaw in that engine. Note that I specifically said HTML links.
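
To make the default concrete, here is a minimal sketch using Python's
standard urllib.robotparser module. It is only an illustration, not how
htdig itself reads the file. The host www.example.com, the /private/ path
and the agent name "SomeOtherBot" are made up, and I am assuming "htdig"
as the User-agent token. It checks an empty robots.txt first, then one
that closes a single directory to a single agent.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical URLs, used only for illustration.
PRIVATE_URL = "http://www.example.com/private/index.html"
PUBLIC_URL = "http://www.example.com/docs/index.html"

# Case 1: an empty robots.txt is treated as if it were absent,
# so every robot may fetch every URL.
empty = build_parser("")
empty_allows_private = empty.can_fetch("htdig", PRIVATE_URL)

# Case 2: disallow one directory for one agent (the token "htdig" is an
# assumption), and leave everything open to all other agents.
RULES = """\
User-agent: htdig
Disallow: /private/

User-agent: *
Disallow:
"""
restricted = build_parser(RULES)
htdig_private = restricted.can_fetch("htdig", PRIVATE_URL)
htdig_public = restricted.can_fetch("htdig", PUBLIC_URL)
other_private = restricted.can_fetch("SomeOtherBot", PRIVATE_URL)

print("empty file, htdig, /private/:", empty_allows_private)
print("rules, htdig, /private/:", htdig_private)
print("rules, htdig, /docs/:", htdig_public)
print("rules, SomeOtherBot, /private/:", other_private)
```

Only the second case restricts anything, and only for the named agent and
path prefix. An empty "Disallow:" line means nothing is disallowed.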
-- 
Gilles R. Detillieux              E-mail: <firstname.lastname@example.org>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930