htdig: geocities robots.txt


Ryan Scott (ryan@netcreations.com)
Fri, 20 Nov 1998 14:05:25 -0500


Geocities has a robots.txt file in place that prevents htdig from
crawling the site. I have written to them; they claimed to have fixed
it, but it still blocks us.

1. What's the exact user-agent or entry they should be putting into
their robots.txt file to let us in?

2. Is there a way to make the user-agent htdig sends be something
else? I could make it look like a Netscape browser. I hate to do
this, since it's just not 'right', although it seems it would work.
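For what it's worth, a sketch of what each side might do, assuming
htdig's default robot name is "htdig" (the robotstxt_name attribute
in htdig.conf; check your local docs before relying on either):

```
# On Geocities' side, an entry like this in robots.txt would
# explicitly allow the htdig robot while other rules still apply:
User-agent: htdig
Disallow:

# On our side, the name htdig matches against in robots.txt can
# (reportedly) be changed in htdig.conf, e.g.:
robotstxt_name: htdig
```

An empty Disallow line means "nothing is disallowed" for that
user-agent, so a matching robot may crawl the whole site.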

Geocities is using this robots.txt to prevent spam crawlers from
trawling their site for email addresses, but I didn't know that spammers
were known for being voluntarily ethical.

----------------------------------------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig-request@sdsu.edu containing the single word "unsubscribe" in
the body of the message.



This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:28:50 PST