Re: [htdig] what robots.txt actually does

Subject: Re: [htdig] what robots.txt actually does
From: Gilles Detillieux
Date: Fri Dec 10 1999 - 09:16:01 PST

According to Kari Suomela:
> > I've tried to figure out what robots.txt actually does, and how to
> > properly configure it. I have read the FAQs etc. on several sites, but
> > cannot find plain language explanations. Apparently some searches fail,
> > and others can't find my server at all because of this.
> >
> > Could someone, please, explain what to do with that file, so all search
> > engines and agents would work ok.

According to Daniel MacKay:
> The reference in the htdig documentation is perfect.
> Go to
> Click on Reference section, htdig.
> Go to the bottom.
> Click on the "A Standard For Robot Exclusion"

That will get you the document

which is an excellent summary, and gives pointers to more information.
I think the key sentence that Kari is looking for is this:

  The presence of an empty "/robots.txt" file has no explicit associated
  semantics, it will be treated as if it was not present, i.e. all robots
  will consider themselves welcome.

I.e., by default, search engines should have wide open access to your
site, and it's up to you to disallow certain URLs to certain agents,
if you so desire. If you don't have a robots.txt, and a particular
engine can't spider through your site, but a web browser can manually
access all the documents you want by following HTML links, it's likely
a flaw in that engine. Note that I specifically said HTML links.
Engines don't follow links in JavaScript or other non-HTML forms.
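
To make the default-open behaviour concrete, here is a minimal sketch
using Python's standard urllib.robotparser module. It is only an
illustration, not how htdig itself is implemented: the agent names
("htdig", "otherbot"), the paths, and the make_parser helper are all
assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text (hypothetical helper, no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# An empty robots.txt is treated as if it were absent: everything is allowed.
open_site = make_parser("")

# Illustrative rules: one record for a specific agent, one catch-all record.
RULES = """\
User-agent: htdig
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/
"""
restricted = make_parser(RULES)

if __name__ == "__main__":
    print(open_site.can_fetch("htdig", "/private/report.html"))
    print(restricted.can_fetch("htdig", "/private/report.html"))
    print(restricted.can_fetch("otherbot", "/cgi-bin/search"))
    print(restricted.can_fetch("otherbot", "/private/report.html"))
```

With the empty file every request is allowed. With the sample rules,
"htdig" is kept out of /private/, while any other agent falls back to
the "*" record and is kept out of /cgi-bin/ only.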

Gilles R. Detillieux              E-mail: <>
Spinal Cord Research Centre       WWW:
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
