htdig: Robots META tag


Geoff Hutchison (Geoffrey.R.Hutchison@williams.edu)
Mon, 02 Mar 1998 15:40:27 -0500


I was looking for general info on META tags and ran into the Web Robots pages:
http://info.webcrawler.com/mak/projects/robots/robots.html

It seems that ht://Dig's handling of META tags is somewhat nonstandard.
Perhaps some changes are needed to follow the newer tagging conventions
for robot exclusion. I'd suggest some small code changes for the next
(beta) version. For example, I'd like to see support added for meta
name="description" as well as name="robots".

This is *VERY* much off the top of my head, but would one way of
implementing the <meta name="robots" content="(no)index"> tag be something
like this?

*** htdig/HTML.cc.orig Fri Aug 15 01:59:26 1997
--- htdig/HTML.cc Mon Mar 2 15:25:50 1998
***************
*** 588,593 ****
--- 588,604 ----
                {
                    doindex = 0;
                }
+               else if (mystrcasecmp(cache, "robots") == 0)
+               {
+                   if (mystrcasecmp(conf["content"], "index") == 0)
+                   {
+                       doindex = 1;
+                   }
+                   else if (mystrcasecmp(conf["content"], "noindex") == 0)
+                   {
+                       doindex = 0;
+                   }
+               }
            }
            else if (conf["name"] &&
                     mystrcasecmp(conf["name"], "htdig-noindex") == 0)
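
One thing the patch above does not cover: under the robots META convention
the content value may be a comma-separated list such as "noindex,nofollow",
and may also be "all" or "none", so an exact comparison against "noindex"
would miss those cases. The following is only a rough standalone sketch of
how such a value might be split up. It is not ht://Dig code; the names
RobotsDirectives, normalize and parseRobotsContent are made up for
illustration, and it uses std::string rather than ht://Dig's own classes.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical result type for illustration; not part of ht://Dig.
struct RobotsDirectives {
    bool index = true;   // assumed default when nothing is specified: index
    bool follow = true;  // assumed default when nothing is specified: follow
};

// Lower-case a token and drop all whitespace in it
// (the directive names themselves contain no spaces).
static std::string normalize(const std::string &raw)
{
    std::string out;
    for (unsigned char c : raw)
        if (!std::isspace(c))
            out += static_cast<char>(std::tolower(c));
    return out;
}

// Parse the content attribute of <meta name="robots" content="...">.
// Directives are processed left to right, so a later one overrides
// an earlier one. Unknown tokens are ignored.
RobotsDirectives parseRobotsContent(const std::string &content)
{
    RobotsDirectives d;
    std::istringstream in(content);
    std::string token;
    while (std::getline(in, token, ',')) {
        const std::string t = normalize(token);
        if (t == "index")         d.index = true;
        else if (t == "noindex")  d.index = false;
        else if (t == "follow")   d.follow = true;
        else if (t == "nofollow") d.follow = false;
        else if (t == "all")      { d.index = true;  d.follow = true;  }
        else if (t == "none")     { d.index = false; d.follow = false; }
    }
    return d;
}
```

In HTML.cc the index flag of such a result could presumably be copied into
doindex; the follow flag would need its own handling, which the patch above
does not attempt.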

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
