[htdig3-dev] robots.txt actual examples


Subject: [htdig3-dev] robots.txt actual examples
From: loic@ceic.com
Date: Wed Feb 09 2000 - 00:17:58 PST


       Hi,

       I found 15 robots.txt files out of 600 that use the Allow directive.
They are
robots.txt.146
robots.txt.169
robots.txt.209
robots.txt.276
robots.txt.293
robots.txt.321
robots.txt.384
robots.txt.404
robots.txt.412
robots.txt.498
robots.txt.52
robots.txt.53
robots.txt.61
robots.txt.628
robots.txt.82
        in http://www.senga.org/htdig/robots/.
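A survey like this can be repeated with a few lines of Python. This is only a sketch: it assumes the files have been downloaded into a local directory and keep the robots.txt.N naming shown above; the function name files_using_allow and the latin-1 decoding are my own choices, not anything from ht://Dig.

```python
import re
from pathlib import Path

# Matches a line whose field name is "Allow" (case-insensitive).
# "Disallow:" does not match because the field must start the line.
ALLOW_LINE = re.compile(r"^\s*allow\s*:", re.IGNORECASE | re.MULTILINE)


def files_using_allow(directory):
    """Return the sorted names of robots.txt.* files that contain an Allow line."""
    hits = []
    for path in Path(directory).glob("robots.txt.*"):
        # latin-1 never fails to decode, which is convenient for files of
        # unknown origin (an assumption made for this sketch).
        text = path.read_text(encoding="latin-1")
        if ALLOW_LINE.search(text):
            hits.append(path.name)
    return sorted(hits)
```

Calling files_using_allow("robots") on a local copy of the collection would return the list of matching file names; commented-out Allow lines are not counted.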

        And 2 use a User-Agent value other than *. The first one is
interesting because it suggests how User-Agent matching is implemented by
Harvest (unless the author of the robots.txt is mistaken :-).

# robots.txt for www.carleton.ca

User-Agent: CULibraryHTDig
Allow: /~ssdata
Disallow: /

User-Agent: harvest
User-Agent: Harvest/1.5.19
Disallow: /rrdr
Disallow: /tlrc
Disallow: /lyris
Disallow: /cgi-bin
Disallow: /experts
Disallow: /gallery
Disallow: /bookstore
Disallow: /stats.html
Disallow: /duc/events
Disallow: /cu/directories
Disallow: /CCS/docs/matlab
Disallow: /ccs/docs/matlab

# robots.txt for ?
User-Agent: *
Disallow: /requisition
Disallow: /CGI
Disallow: /cgi-bin
Disallow: /STAT

User-Agent: Linkbot
Disallow: /requisition
Disallow: /CGI
Disallow: /cgi-bin
Disallow: /STAT

User-Agent: Roverbot
Disallow: /
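To see how one existing parser reads the first example, here is a sketch using Python's standard urllib.robotparser. It shows that library's interpretation only, not how ht://Dig or Harvest behave. The Disallow list is abbreviated, and the helper allowed() and the BASE URL are assumptions made for the illustration.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the www.carleton.ca example (fewer Disallow lines).
ROBOTS_TXT = """\
# robots.txt for www.carleton.ca

User-Agent: CULibraryHTDig
Allow: /~ssdata
Disallow: /

User-Agent: harvest
User-Agent: Harvest/1.5.19
Disallow: /rrdr
Disallow: /cgi-bin
Disallow: /stats.html
"""

# Assumed base URL, used only to build full URLs for the checks.
BASE = "http://www.carleton.ca"

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Ask the stdlib parser whether `agent` may fetch `path`."""
    return parser.can_fetch(agent, BASE + path)


if __name__ == "__main__":
    for agent, path in [
        ("CULibraryHTDig", "/~ssdata/index.html"),
        ("CULibraryHTDig", "/stats.html"),
        ("Harvest/1.5.19", "/cgi-bin/search"),
        ("SomeOtherBot", "/cgi-bin/search"),
    ]:
        print(agent, path, allowed(agent, path))
```

With this parser, the Allow line lets CULibraryHTDig into /~ssdata while the following Disallow: / blocks everything else, and the two consecutive User-Agent lines share one set of rules. An agent that matches no record is allowed everywhere, since this file has no * record.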

-- 
		Loic Dachary

24 av Secretan 75019 Paris Tel: 33 1 42 45 09 16 e-mail: loic@dachary.org URL: http://www.senga.org/



