Re: [htdig] htdig keeling over...


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Wed, 3 Feb 1999 11:06:56 -0600 (CST)


According to Geoff Hutchison:
> > Header line: HTTP/1.1 200 OK
> > Header line: Server: Microsoft-IIS/4.0
> > Header line: Date: Wed, 03 Feb 1999 01:12:44 GMT
> > Header line: Content-Type: text/html
> > Header line: Cache-Control: no-cache="set-cookie,set-cookie2"
> > Header line: Last-Modified: 27 Jan 1999 01:12:44 GMT
>
> This last line is invalid. Last-modified headers should have a format like:
> Header line: Last-Modified: Mon, 2 Feb 1999 01:12:44 GMT
>
> See http://www.pmg.lcs.mit.edu/cgi-bin/rfc/view?2068
>
> So you say "I don't care if it's invalid, ht://Dig should be able to keep
> going." Fair enough. But I'm beginning to worry about the complexity of that
> section of code if people keep finding non-compliant servers. There's a
> reason for RFCs...
>
> What should we do, decide that we'll give the current time to documents from
> servers that return poorly-formatted dates? That doesn't sound like a good
> solution to me.

Well, we already ignore bad weekdays, so why not allow missing weekdays
too? Here's a patch to htdig-3.1.0dev-013199 to make getdate() a bit
more fault-tolerant.

I'd like people to try it out to make sure it works, especially on
systems that have had problems with mystrptime/strftime in the past.
Note that this patch won't work for 3.1.0b4, because of other changes to
getdate() since that release. I'll post a patch for 3.1.0b4 separately.
Please grab the one that is applicable to your source, or grab the latest
snapshot and add this patch, and please let me know if this fixes the
problems you've had, or breaks anything. I've walked through the code
quite carefully, and tested it on my server, and I'm quite confident
it works, but independent confirmation would be a plus, especially as
we're very close to final release.

--- htdig/Document.cc.datebug Tue Jan 26 18:27:21 1999
+++ htdig/Document.cc Wed Feb 3 10:39:20 1999
@@ -191,9 +191,9 @@
 time_t
 Document::getdate(char *datestring)
 {
-    String d = datestring;
     struct tm tm;
     time_t ret;
+    char *s;
 
     //
     // Two possible time designations:
@@ -203,23 +203,29 @@
     //
     // We strip off the weekday before sending to strptime
     // because some servers send invalid weekdays!
+    // (Some don't even send a weekday, but we'll be flexible...)
  
-    int weekday_index = d.indexOf(',');
-    if (weekday_index > 3)
-      mystrptime(d.sub(weekday_index + 2), "%d-%b-%y %T", &tm);
+    s = strchr(datestring, ',');
+    if (s)
+      s++;
     else
-      mystrptime(d.sub(weekday_index + 2), "%d %b %Y %T", &tm);
-
-    if (&tm != NULL) // We hope it isn't NULL!
+      s = datestring;
+    while (isspace(*s))
+      s++;
+    if (strchr(s, '-') && mystrptime(s, "%d-%b-%y %T", &tm) ||
+        mystrptime(s, "%d %b %Y %T", &tm))
       {
+        // correct for mystrptime, if %Y format saw only a 2 digit year
         if (tm.tm_year < 0)
           tm.tm_year += 1900;
         
         if (debug > 2)
           {
-            cout << "Translated " << d << " to ";
+            cout << "Translated " << datestring << " to ";
             char buffer[100];
-            strftime(buffer, sizeof(buffer), "%a, %d %b %Y %T", &tm);
+            // Leave out %a for weekday, because we don't set it anymore...
+            //strftime(buffer, sizeof(buffer), "%a, %d %b %Y %T", &tm);
+            strftime(buffer, sizeof(buffer), "%d %b %Y %T", &tm);
             cout << buffer << " (" << tm.tm_year << ")" << endl;
           }
 #if HAVE_TIMEGM
@@ -230,6 +236,11 @@
       }
     else
       {
+        if (debug > 2)
+          {
+            cout << "Cannot translate " << datestring <<
+              ", using current time" << endl;
+          }
         ret = time(0); // This isn't the best, but it works. *fix*
       }
     if (debug > 2)
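
For anyone who wants to exercise the parsing logic outside of htdig, here
is a minimal standalone sketch of the same idea. It is not the htdig code:
parse_http_date() is a hypothetical helper name, and it assumes the
system's POSIX strptime() is usable in place of mystrptime(). It also
casts the argument of isspace() to unsigned char, which the patch above
does not do.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <time.h>   // strptime() is POSIX, declared in <time.h>

// Hypothetical standalone helper, not htdig's Document::getdate().
// Returns true and fills *out on success, false if nothing could be parsed.
bool parse_http_date(const char *datestring, struct tm *out)
{
    assert(datestring && out);

    // Skip past the weekday if there is one; its value is never used.
    const char *s = std::strchr(datestring, ',');
    s = s ? s + 1 : datestring;
    while (std::isspace(static_cast<unsigned char>(*s)))
        s++;

    // RFC 850 style, e.g. "03-Feb-99 01:12:44"
    std::memset(out, 0, sizeof(*out));
    if (std::strchr(s, '-') && strptime(s, "%d-%b-%y %H:%M:%S", out))
        return true;

    // RFC 1123 style, e.g. "03 Feb 1999 01:12:44"
    std::memset(out, 0, sizeof(*out));
    return strptime(s, "%d %b %Y %H:%M:%S", out) != nullptr;
}
```

Called with the header value from the original report,
"27 Jan 1999 01:12:44 GMT", this should succeed through the second
format, since there is no comma to skip and no '-' in the string. A
caller would fall back to the current time only when it returns false.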

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930