Re: htdig: geocities robots.txt


Geoff Hutchison (Geoffrey.R.Hutchison@williams.edu)
Sun, 22 Nov 1998 13:15:48 -0500


At 2:05 PM -0500 11/20/98, Ryan Scott wrote:
>1. What's the exact user-agent or entry they should be putting into
>their robots.txt file to let us in?

The default user-agent is "htdig".
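
As a rough way to see what such an entry does, here is a small Python sketch (not htdig code) that uses the standard library's robots.txt parser. The robots.txt contents, the example.com URL, and the "EmailHarvester" name are made up for illustration; they are not Geocities' actual file. It assumes a record that lets "htdig" in and keeps everyone else out.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: an empty Disallow means "nothing is off limits"
# for htdig, while every other robot is shut out of the whole site.
ROBOTS_TXT = """\
User-agent: htdig
Disallow:

User-agent: *
Disallow: /
"""

def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(allowed("htdig", "http://example.com/page.html"))
print(allowed("EmailHarvester", "http://example.com/page.html"))
```

The first call should report that "htdig" is let in, and the second that the made-up harvester falls through to the catch-all record and is refused.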

>2. There is a way to make the user-agent htdig uses to be something
>else? I could make it look like a netscape browser. I hate to do this,
>however, it's just not 'right', although it seems it would work.

In your config file:
user_agent: foo-htdig

I don't know if I'd make it look like Netscape--you may run into problems
with features the web site assumes are in Netscape. Perhaps Lynx might be a
better model.
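
To illustrate what a setting like this changes on the wire, here is a minimal Python sketch (again, not htdig's own code) that builds an HTTP request carrying a chosen User-Agent header. The build_request helper and the example.com URL are hypothetical; nothing is actually sent.

```python
from urllib.request import Request

def build_request(url, user_agent="htdig"):
    """Construct (but do not send) a request with the given User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})

# Same value as the config line above.
req = build_request("http://example.com/", user_agent="foo-htdig")

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))  # prints: foo-htdig
```

The server only sees the header string, so whatever name goes here is what the site's logs and any browser-sniffing logic will act on.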

>Geocities is using this robots.txt to prevent spam crawlers from
>trawling their site for email addresses, but I didn't know that spammers
>were known for being voluntarily ethical.

No, but they often don't modify code either. So depending on what software
they're using, it may honor robots.txt anyway.

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



