Subject: [htdig] PATCH: case insensitive META robots tag parsing
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Fri Mar 17 2000 - 09:13:54 PST
[ Reposted from htdig3-bugs@htdig.org... ]
From: Gilles Detillieux <grdetil@scrc.umanitoba.ca>
Subject: Re: htdig ignores noindex META-Tag (PR#810)
To: pruem@machno.hbi-stuttgart.de
Date: Fri, 17 Mar 2000 11:08:31 -0600 (CST)
Cc: ht3bugs@htdig.org, htdig3-bugs@htdig.org
According to David Pruem (pruem@machno.hbi-stuttgart.de):
> ht://Dig ignored the following directives in a bunch of pages and indexed
> them.
>
> <HTML>
> <HEAD>
> <TITLE>7</TITLE>
> <META NAME="robots" CONTENT="NOINDEX,FOLLOW">
> </HEAD>
>
> Have You any idea what could cause this behaviour?
Oops! The standard clearly says that the name and contents of such tags
should be case insensitive, but when htdig looked at the content parameter,
it looked for words in lower case only! Clearly a bug. I've fixed it in
3.2, but here is the fix for 3.1.5...
--- htdig/HTML.cc.robotsbug Tue Feb 15 14:08:41 2000
+++ htdig/HTML.cc Fri Mar 17 10:59:38 2000
@@ -911,7 +911,7 @@ HTML::do_tag(Retriever &retriever, Strin
             && strlen(conf["content"]) !=0)
     {
         String content_cache = conf["content"];
-
+        content_cache.lowercase();
         if (content_cache.indexOf("noindex") != -1)
         {
             doindex = 0;
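
For anyone who wants to try the idea outside of htdig, here is a minimal standalone sketch of the same approach: lower-case a copy of the CONTENT value first, then search it for lower-case keywords. It assumes std::string instead of ht://Dig's own String class, and the names RobotsDirectives and parse_robots_content are hypothetical, made up for this illustration. The nofollow check is included only for symmetry and is not part of the patch above.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical result type, for illustration only (not part of ht://Dig).
struct RobotsDirectives {
    bool index = true;   // cleared when "noindex" is present
    bool follow = true;  // cleared when "nofollow" is present
};

// Hypothetical helper: lower-case a copy of the CONTENT attribute value,
// then search it, so "NOINDEX", "NoIndex" and "noindex" are treated alike.
RobotsDirectives parse_robots_content(std::string content) {
    // Go through unsigned char so std::tolower is well defined for every byte.
    std::transform(content.begin(), content.end(), content.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });

    RobotsDirectives d;
    if (content.find("noindex") != std::string::npos) {
        d.index = false;
    }
    if (content.find("nofollow") != std::string::npos) {
        d.follow = false;
    }
    return d;
}
```

With this sketch, the value from the report, "NOINDEX,FOLLOW", clears the index flag and leaves the follow flag set. Like the patch, it uses a plain substring search rather than splitting the value on commas.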
-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930