htdig: geocities robots.txt


Ryan Scott (ryan@netcreations.com)
Mon, 14 Dec 1998 16:54:27 -0500


I still cannot index geocities pages. Here's what they put in their
robots.txt file:

# htdig knows where to go.
User-agent: htdig/3.1.0b1
Disallow: /admin/ # all paths except neighborhoods and members section are disallowed
Disallow: /auditor/
Disallow: /cgi_emails/
Disallow: /cgi_html/
Disallow: /cgi-bin/
Disallow: /chat/
Disallow: /classes/
Disallow: /companies/
Disallow: /dbm_files/
Disallow: /demos/
Disallow: /error_messages/
Disallow: /errors/
Disallow: /features/
Disallow: /GeoPartners/
Disallow: /geoplus/
Disallow: /geoshops/
Disallow: /geostore/
Disallow: /geoworld/
Disallow:/GreetingCards/
Disallow: /guide/
Disallow: /homestead/
Disallow: /hoodpages/
Disallow: /htmlfrag/
Disallow: /images/
Disallow: /include/
Disallow: /index.html
Disallow: /java/
Disallow: /join/
Disallow: /LunarAwards/
Disallow: /main/
Disallow: /marketplace/
Disallow: /mediakit/
Disallow: /pictures/
Disallow: /portfolio/
Disallow: /ProgrammersPavilion/
Disallow: /pv/
Disallow: /realmedia/
Disallow: /search/
Disallow: /server-errors/
Disallow: /thread-images/
Disallow: geobook.html

I'm not too familiar with how this is all supposed to work, but it appears
this doesn't cut it. In case you were wondering, I'm trying to index various
neighborhoods at the request of the folks running those neighborhoods.
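
For reference, here is a rough sketch of how a record like the one above is read under the original robots exclusion convention: pick the record whose User-agent name matches the robot, then treat each Disallow value as a plain path prefix. This is not htdig's actual code. The function names and the sample neighborhood path are made up for illustration, and the robots.txt text is only an excerpt of the file quoted above.

```python
# Excerpt of the robots.txt quoted above (not the full file).
ROBOTS_TXT = """\
# htdig knows where to go.
User-agent: htdig/3.1.0b1
Disallow: /admin/ # all paths except neighborhoods and members
Disallow: /cgi-bin/
Disallow:/GreetingCards/
Disallow: /index.html
Disallow: geobook.html
"""


def disallowed_prefixes(robots_txt, agent):
    """Collect Disallow values from records whose User-agent matches `agent`.

    Assumption: matching is a case-insensitive substring test on the name
    with any "/version" suffix dropped, as the original convention suggests.
    """
    name = agent.split("/")[0].lower()
    prefixes = []
    applies = False   # does the current record apply to `agent`?
    in_rules = False  # have we seen a Disallow line in the current record?
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue  # blank lines, pure comments, stray wrapped text
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent after rules starts a new record
                applies = False
                in_rules = False
            token = value.split("/")[0].lower()
            if token == "*" or (token and token in name):
                applies = True
        elif field == "disallow":
            in_rules = True
            if applies and value:
                prefixes.append(value)
    return prefixes


def is_allowed(path, prefixes):
    """A path is allowed unless it starts with one of the Disallow values."""
    return not any(path.startswith(prefix) for prefix in prefixes)


prefixes = disallowed_prefixes(ROBOTS_TXT, "htdig/3.1.0b1")
# "/SiliconValley/1234/" is a hypothetical neighborhood path.
for path in ("/SiliconValley/1234/", "/cgi-bin/search", "/index.html"):
    print(path, "allowed" if is_allowed(path, prefixes) else "disallowed")
```

Under this simplified reading, none of the listed rules matches a neighborhood path, while the listed directories and /index.html are excluded. A real robot may match agent names or paths differently, so this only shows what the file appears to intend.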




This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:29:53 PST