[htdig] PATCH: case insensitive META robots tag parsing


From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Fri Mar 17 2000 - 09:13:54 PST


[ Reposted from htdig3-bugs@htdig.org... ]

From: Gilles Detillieux <grdetil@scrc.umanitoba.ca>
Subject: Re: htdig ignores noindex META-Tag (PR#810)
To: pruem@machno.hbi-stuttgart.de
Date: Fri, 17 Mar 2000 11:08:31 -0600 (CST)
Cc: ht3bugs@htdig.org, htdig3-bugs@htdig.org

According to David Pruem (pruem@machno.hbi-stuttgart.de):
> ht://Dig ignored the following directives in a bunch of pages and indexed
> them.
>
> <HTML>
> <HEAD>
> <TITLE>7</TITLE>
> <META NAME="robots" CONTENT="NOINDEX,FOLLOW">
> </HEAD>
>
> Have You any idea what could cause this behaviour?

Oops! The standard clearly says that the name and content of such tags
should be case-insensitive, but when htdig checked the content parameter,
it looked for the keywords in lower case only, so "NOINDEX" never matched!
Clearly a bug. I've fixed it in 3.2, but here is the fix for 3.1.5, which
lowercases the content string before searching it...

--- htdig/HTML.cc.robotsbug Tue Feb 15 14:08:41 2000
+++ htdig/HTML.cc Fri Mar 17 10:59:38 2000
@@ -911,7 +911,7 @@ HTML::do_tag(Retriever &retriever, Strin
                          && strlen(conf["content"]) !=0)
                   {
                     String content_cache = conf["content"];
-
+                    content_cache.lowercase();
                     if (content_cache.indexOf("noindex") != -1)
                       {
                         doindex = 0;
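
For anyone who wants to try the idea outside of htdig, here is a minimal
standalone sketch of the same technique: lowercase a copy of the CONTENT
value, then search it for the directive keywords. It is not htdig's code.
The RobotsFlags struct and the to_lower() and parse_robots_content()
names are hypothetical, it uses std::string rather than htdig's String
class, and the "nofollow" check is an assumption taken from the usual
robots META conventions rather than from the patch above.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical stand-in for the flags a crawler keeps per document.
struct RobotsFlags {
    bool index = true;   // cleared when "noindex" is present
    bool follow = true;  // cleared when "nofollow" is present
};

// Return a lowercased copy of s. The unsigned char parameter avoids
// undefined behaviour in std::tolower for negative char values.
static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return s;
}

// Parse the CONTENT value of a <META NAME="robots"> tag without regard
// to case, mirroring the "lowercase first, then search" fix.
RobotsFlags parse_robots_content(const std::string &content) {
    RobotsFlags flags;
    const std::string lowered = to_lower(content);

    if (lowered.find("noindex") != std::string::npos)
        flags.index = false;
    if (lowered.find("nofollow") != std::string::npos)
        flags.follow = false;

    return flags;
}
```

With the value from the original report, "NOINDEX,FOLLOW", this sketch
clears the index flag and leaves the follow flag set.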

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
