Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Tue, 26 Jan 1999 15:28:29 -0600 (CST)
According to Geoff Hutchison:
> >Hmmm. That's strange, as htdig doesn't even look at the Content-length
> >header when retrieving from the HTTP server. It just reads until the
> >read() request returns 0 bytes (an EOF). Maybe this particular server,
> >at M.I.T. according to the bugs DB, wasn't closing the socket properly?
>
> That's what I was thinking. Since I don't have an actual address, I'm kinda
> stuck. I think our behavior should *avoid* the problem mentioned.
OK, how about the patch below?
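The idea is to stop reading once the advertised number of bytes has arrived, so a server that never closes the socket cannot leave the reader waiting for an EOF. A minimal standalone sketch of that loop follows. FakeConn and readBody are made-up names for illustration, not htdig classes, and the fake connection only records whether a read was attempted after the data ran out (the point where a real socket could block).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a connection. A real socket might block where
// this one sets readPastEnd and returns 0.
struct FakeConn {
    std::string data;
    std::size_t pos = 0;
    bool readPastEnd = false;

    int read(char *buf, int len) {
        if (pos >= data.size()) {
            readPastEnd = true;   // nothing left: a real server might hang here
            return 0;
        }
        int n = std::min(len, static_cast<int>(data.size() - pos));
        std::memcpy(buf, data.data() + pos, n);
        pos += n;
        return n;
    }
};

// Read at most contentLength bytes (or maxDocSize if the length is unknown,
// i.e. negative, or larger than maxDocSize).
std::string readBody(FakeConn &c, int contentLength, int maxDocSize) {
    assert(maxDocSize >= 0);
    std::string contents;
    char buf[8192];
    int bytesToGo = contentLength;
    if (bytesToGo < 0 || bytesToGo > maxDocSize)
        bytesToGo = maxDocSize;
    while (bytesToGo > 0) {
        int len = std::min(bytesToGo, static_cast<int>(sizeof(buf)));
        int n = c.read(buf, len);
        if (n <= 0)
            break;                // EOF or error: stop early
        contents.append(buf, n);
        bytesToGo -= n;
    }
    return contents;
}
```

With a known length the loop never issues the extra read that would wait for EOF; with an unknown length (-1) it falls back to reading until EOF or the size cap, whichever comes first.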
> >> * htdig coredumps when calling strftime (PR#81)
> >which oddly has an address that's different than the mmap call - this leads
> >me to think that the memory corruption happened while processing the
> >zoneinfo file, so maybe he has a corrupt /usr/lib/zoneinfo/localtime?
>
> Now that's a good point. I could understand the prior problems when we got
> back a NULL and sent it on its way to blow up in our faces. But that's not
> happening (and I have a conditional to prevent it).
I was wondering about that conditional. If you declare "struct tm tm;",
then isn't &tm guaranteed to be non-NULL? The variable is automatically
allocated, so I don't see how its address could be NULL, regardless of
what mystrptime puts into the structure. On the other hand, checking
tm2, set by gmtime(), would make sense because it's a pointer. Mind you,
I don't think gmtime would ever return NULL either.
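To make the distinction concrete, here is a small sketch. The function names are hypothetical and nothing here is taken from the htdig source. The address of an automatic struct tm is never null, so testing it tells you nothing, whereas the pointer returned by gmtime() is worth checking: the C standard does allow gmtime() to return a null pointer when the time cannot be converted, even if that is rare in practice.

```cpp
#include <ctime>

// Always returns false: an automatic object has a valid, non-null address
// no matter what is stored in it.
bool automaticAddressIsNull() {
    std::tm tm = {};
    std::tm *p = &tm;
    return p == nullptr;
}

// Copies the broken-down UTC time for t into out.
// Returns false if gmtime() could not convert the value.
bool utcTime(std::time_t t, std::tm &out) {
    std::tm *tm2 = std::gmtime(&t);   // a pointer, so a null check is meaningful
    if (tm2 == nullptr)
        return false;
    out = *tm2;
    return true;
}
```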
> >> * htsearch can coredump if a file in template_map doesn't exist
> >here. If the person who reported this problem can be persuaded to test
> >out the current snapshot or CVS tree, great, but otherwise I think this
> >problem is solved already.
>
> I would tend to agree here. I included the remark simply because I thought
> it needed another testing round before I was happy. I did that as well and
> it looks fine.
>
> >pattern would be wrong. I think this second usage should be changed over
> >to a separate attribute, e.g. remove_default_doc, which would be a string
> >list, and if empty, nothing would be removed. local_default_doc would
> >then revert to its previous local_urls-only function. E.g.:
>
> That about mirrors my thinking as well. I'd like to get Retriever to use a
> StringList, but it's not as easy as I'd like and I haven't had a chance to
> do it.
Yeah, handling multiple default documents for the local_urls stuff would
be a little trickier (though not much), because you'd need to test each
file name to see if it exists before going on to the next.
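A rough sketch of that existence test, assuming a POSIX system with stat(). firstExistingDoc is a hypothetical helper and a std::vector stands in for htdig's StringList, so this shows only the shape of the loop, not how Retriever would actually do it.

```cpp
#include <string>
#include <vector>
#include <sys/stat.h>

// Hypothetical helper: return the first candidate under dir that exists as a
// regular file, or an empty string if none of them does.
std::string firstExistingDoc(const std::string &dir,
                             const std::vector<std::string> &candidates) {
    for (const std::string &name : candidates) {
        std::string path = dir;
        if (!path.empty() && path.back() != '/')
            path += '/';
        path += name;

        struct stat st;
        if (stat(path.c_str(), &st) == 0 && S_ISREG(st.st_mode))
            return path;          // first match wins, in list order
    }
    return std::string();
}
```

Called with a directory and a list such as index.html, index.htm, it tries each name in order and stops at the first one found, so the order of the list sets the priority.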
Here's my patch for the Content-Length header. What do you think?
--- ./htdig/Document.h.contlen Thu Dec 3 22:14:50 1998
+++ ./htdig/Document.h Tue Jan 26 14:24:59 1999
@@ -130,6 +130,7 @@
String contentType;
String authorization;
String referer;
+ int contentLength;
int document_length;
time_t modtime;
int max_doc_size;
--- ./htdig/Document.cc.contlen Mon Jan 18 16:58:35 1999
+++ ./htdig/Document.cc Tue Jan 26 14:47:37 1999
@@ -159,6 +159,7 @@
contents.allocate(max_doc_size + 100);
contentType = "";
+ contentLength = -1;
if (u)
{
Url(u);
@@ -193,6 +194,7 @@
Document::Reset()
{
contentType = 0;
+ contentLength = -1;
if (url)
delete url;
url = 0;
@@ -515,16 +517,20 @@
contents = 0;
char docBuffer[8192];
int bytesRead;
+ int bytesToGo = contentLength;
- while ((bytesRead = c.read(docBuffer, sizeof(docBuffer))) > 0)
- {
+ if (bytesToGo < 0 || bytesToGo > max_doc_size)
+ bytesToGo = max_doc_size;
+ while (bytesToGo > 0)
+ {
+ int len = bytesToGo<sizeof(docBuffer) ? bytesToGo : sizeof(docBuffer);
+ bytesRead = c.read(docBuffer, len);
+ if (bytesRead <= 0)
+ break;
if (debug > 2)
cout << "Read " << bytesRead << " from document\n";
- if (contents.length() + bytesRead > max_doc_size)
- bytesRead = max_doc_size - contents.length();
contents.append(docBuffer, bytesRead);
- if (contents.length() >= max_doc_size)
- break;
+ bytesToGo -= bytesRead;
}
c.close();
document_length = contents.length();
@@ -597,6 +603,12 @@
strtok(line, " \t");
modtime = getdate(strtok(0, "\n\t"));
}
+ else if (contentLength == -1
+ && mystrncasecmp(line, "content-length:", 15) == 0)
+ {
+ strtok(line, " \t");
+ contentLength = atoi(strtok(0, "\n\t"));
+ }
else if (mystrncasecmp(line, "content-type:", 13) == 0)
{
strtok(line, " \t");
@@ -676,6 +688,7 @@
}
fclose(f);
document_length = contents.length();
+ contentLength = document_length;
if (debug > 2)
cout << "Read a total of " << document_length << " bytes\n";
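One thing to watch when parsing the header value: strtok() returns NULL if nothing follows the header name, and passing NULL to atoi() is undefined. A more defensive parse might look like the sketch below. parseContentLength is a hypothetical standalone helper, not part of htdig, and it does its own case-insensitive match rather than relying on mystrncasecmp.

```cpp
#include <cctype>
#include <cerrno>
#include <climits>
#include <cstddef>
#include <cstdlib>

// Returns the value of a "Content-Length:" header line, or -1 if the line is
// a different header, has no value, or the value is not a usable number.
int parseContentLength(const char *line) {
    static const char name[] = "content-length:";
    const std::size_t n = sizeof(name) - 1;

    // Case-insensitive prefix match; stops safely at the end of a short line.
    for (std::size_t i = 0; i < n; ++i) {
        if (line[i] == '\0' ||
            std::tolower(static_cast<unsigned char>(line[i])) != name[i])
            return -1;
    }

    const char *p = line + n;
    while (*p == ' ' || *p == '\t')
        ++p;
    if (!std::isdigit(static_cast<unsigned char>(*p)))
        return -1;                // empty or non-numeric value

    errno = 0;
    char *end = nullptr;
    long v = std::strtol(p, &end, 10);
    if (errno == ERANGE || v > INT_MAX)
        return -1;                // does not fit in an int
    return static_cast<int>(v);
}
```

Returning -1 for anything unusable matches the "length unknown" value used in the patch, so the read loop would simply fall back to reading up to max_doc_size.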
-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930