denis filipetti (email@example.com)
Wed, 03 Feb 1999 15:34:30 -0500
Many thanks Gilles, this patch works like a champ. The bad date format is
ignored and digging can and does happily continue. I will inform the Jrun
folks (who otherwise have a nice product).
At 11:06 AM 2/3/99 -0600, Gilles Detillieux wrote:
>According to Geoff Hutchison:
>> > Header line: HTTP/1.1 200 OK
>> > Header line: Server: Microsoft-IIS/4.0
>> > Header line: Date: Wed, 03 Feb 1999 01:12:44 GMT
>> > Header line: Content-Type: text/html
>> > Header line: Cache-Control: no-cache="set-cookie,set-cookie2"
>> > Header line: Last-Modified: 27 Jan 1999 01:12:44 GMT
>> This last line is invalid. Last-modified headers should have a format like:
>> Header line: Last-Modified: Mon, 2 Feb 1999 01:12:44 GMT
>> See http://www.pmg.lcs.mit.edu/cgi-bin/rfc/view?2068
>> So you say "I don't care if it's invalid, ht://Dig should be able to keep
>> going." Fair enough. But I'm beginning to worry about the complexity of
>> this section of code if people keep finding non-compliant servers. There's a
>> reason for RFCs...
>> What should we do, decide that we'll give the current time to documents
>> from servers that return poorly-formatted dates? That doesn't sound like a good
>> solution to me.
>Well, we already ignore bad weekdays, so why not allow missing weekdays
>too. Here's a patch to htdig-3.1.0dev-013199 to make getdate a bit
>more flexible.
>I'd like people to try it out to make sure it works, especially on
>systems that have had problems with mystrptime/strftime in the past.
>Note that this patch won't work for 3.1.0b4, because of other changes to
>getdate() since that release. I'll post a patch for 3.1.0b4 separately.
>Please grab the one that is applicable to your source, or grab the latest
>snapshot and add this patch, and please let me know if this fixes the
>problems you've had, or breaks anything. I've walked through the code
>quite carefully, and tested it on my server, and I'm quite confident
>it works, but independent confirmation would be a plus, especially as
>we're very close to final release.
>--- htdig/Document.cc.datebug Tue Jan 26 18:27:21 1999
>+++ htdig/Document.cc Wed Feb 3 10:39:20 1999
>@@ -191,9 +191,9 @@
> Document::getdate(char *datestring)
>- String d = datestring;
> struct tm tm;
> time_t ret;
>+ char *s;
> // Two possible time designations:
>@@ -203,23 +203,29 @@
> // We strip off the weekday before sending to strptime
> // because some servers send invalid weekdays!
>+ // (Some don't even send a weekday, but we'll be flexible...)
>- int weekday_index = d.indexOf(',');
>- if (weekday_index > 3)
>- mystrptime(d.sub(weekday_index + 2), "%d-%b-%y %T", &tm);
>+ s = strchr(datestring, ',');
>+ if (s)
>+ s++;
> else
>- mystrptime(d.sub(weekday_index + 2), "%d %b %Y %T", &tm);
>- if (&tm != NULL) // We hope it isn't NULL!
>+ s = datestring;
>+ while (isspace(*s))
>+ s++;
>+ if (strchr(s, '-') && mystrptime(s, "%d-%b-%y %T", &tm) ||
>+ mystrptime(s, "%d %b %Y %T", &tm))
>+ // correct for mystrptime, if %Y format saw only a 2 digit year
> if (tm.tm_year < 0)
> tm.tm_year += 1900;
> if (debug > 2)
>- cout << "Translated " << d << " to ";
>+ cout << "Translated " << datestring << " to ";
> char buffer;
>- strftime(buffer, sizeof(buffer), "%a, %d %b %Y %T", &tm);
>+ // Leave out %a for weekday, because we don't set it anymore...
>+ //strftime(buffer, sizeof(buffer), "%a, %d %b %Y %T", &tm);
>+ strftime(buffer, sizeof(buffer), "%d %b %Y %T", &tm);
> cout << buffer << " (" << tm.tm_year << ")" << endl;
> #if HAVE_TIMEGM
>@@ -230,6 +236,11 @@
>+ if (debug > 2)
>+ cout << "Cannot translate " << datestring <<
>+ ", using current time" << endl;
> ret = time(0); // This isn't the best, but it works. *fix*
> if (debug > 2)
>Gilles R. Detillieux E-mail: <firstname.lastname@example.org>
>Spinal Cord Research Centre WWW:
>Dept. Physiology, U. of Manitoba Phone: (204)789-3766
>Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
>To unsubscribe from the htdig mailing list, send a message to
>email@example.com containing the single word "unsubscribe" in
>the SUBJECT of the message.
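For readers who want to try the idea outside the ht://Dig tree, the approach
in the patch above (skip an optional weekday, then try the dashed
two-digit-year layout before the spaced four-digit-year layout) can be
sketched as a standalone function. This is a minimal sketch under stated
assumptions, not the actual getdate(): the name parse_http_date is
hypothetical, it uses POSIX strptime() and the non-standard but widely
available timegm() in place of ht://Dig's mystrptime wrapper, and it reports
failure to the caller instead of falling back to the current time.

```cpp
#include <cctype>
#include <cstring>
#include <ctime>

// Hypothetical helper for illustration; not part of ht://Dig.
// Parses an HTTP date with or without a leading weekday and stores the
// result (seconds since the epoch, UTC) in *out. Returns false if neither
// layout matches. Assumes POSIX strptime() and non-standard timegm().
bool parse_http_date(const char *datestring, std::time_t *out)
{
    // Skip the weekday if present: everything up to and including the comma.
    const char *s = std::strchr(datestring, ',');
    s = s ? s + 1 : datestring;
    while (std::isspace(static_cast<unsigned char>(*s)))
        ++s;

    std::tm tm;
    const char *end = 0;

    // Dashed layout with a two-digit year, e.g. "27-Jan-99 01:12:44".
    if (std::strchr(s, '-')) {
        std::memset(&tm, 0, sizeof(tm));
        end = strptime(s, "%d-%b-%y %T", &tm);
    }
    // Spaced layout with a four-digit year, e.g. "27 Jan 1999 01:12:44".
    if (!end) {
        std::memset(&tm, 0, sizeof(tm));
        end = strptime(s, "%d %b %Y %T", &tm);
    }
    if (!end)
        return false;   // let the caller decide what to do with a bad date

    *out = timegm(&tm); // interpret the broken-down time as UTC
    return true;
}
```

A caller would write something like
"std::time_t t; if (!parse_http_date(header_value, &t)) t = std::time(0);"
to reproduce the patch's fallback to the current time. Any trailing text such
as " GMT" is simply left unparsed.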