Re: [htdig3-dev] Re: What's left for 3.1.0


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Tue, 26 Jan 1999 15:28:29 -0600 (CST)


According to Geoff Hutchison:
> >Hmmm. That's strange, as htdig doesn't even look at the Content-length
> >header when retrieving from the HTTP server. It just reads until the
> >read() request returns 0 bytes (an EOF). Maybe this particular server,
> >at M.I.T. according to the bugs DB, wasn't closing the socket properly?
>
> That's what I was thinking. Since I don't have an actual address, I'm kinda
> stuck. I think our behavior should *avoid* the problem mentioned.

OK, how about the patch below?
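
In outline, the idea is to stop reading once the advertised number of
bytes has arrived, rather than waiting for an EOF that a misbehaving
server may never send. Here is a minimal standalone sketch of that loop.
FakeConnection and readBody are hypothetical names used only for this
illustration; they are not part of the htdig source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the socket: hands out bytes from a string,
// then returns 0 (EOF). htdig's real Connection class is not shown here.
struct FakeConnection {
    std::string data;
    std::size_t pos = 0;

    int read(char *buf, int len) {
        std::size_t n = std::min<std::size_t>(len, data.size() - pos);
        std::memcpy(buf, data.data() + pos, n);
        pos += n;
        return static_cast<int>(n);
    }
};

// Read at most contentLength bytes. If the length is unknown (negative)
// or larger than maxDocSize, fall back to maxDocSize. Stops early on
// EOF or on a read error.
template <typename Conn>
std::string readBody(Conn &c, int contentLength, int maxDocSize)
{
    assert(maxDocSize >= 0);

    std::string contents;
    char buf[8192];
    int bytesToGo = contentLength;
    if (bytesToGo < 0 || bytesToGo > maxDocSize)
        bytesToGo = maxDocSize;

    while (bytesToGo > 0) {
        // Never ask for more than is still expected, so the loop ends
        // without needing the server to close the socket.
        int want = std::min(bytesToGo, static_cast<int>(sizeof(buf)));
        int got = c.read(buf, want);
        if (got <= 0)
            break;
        contents.append(buf, got);
        bytesToGo -= got;
    }
    return contents;
}
```

With a known length the loop never issues a read past the end of the
body, so a socket left open by the server cannot stall it.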

> >> * htdig coredumps when calling strftime (PR#81)
> >which oddly has an address that's different than the mmap call - this leads
> >me to think that the memory corruption happened while processing the
> >zoneinfo file, so maybe he has a corrupt /usr/lib/zoneinfo/localtime?
>
> Now that's a good point. I could understand the prior problems when we got
> back a NULL and sent it on its way to blow up in our faces. But that's not
> happening (and I have a conditional to prevent it).

I was wondering about that conditional. If you declare "struct tm tm;",
then isn't &tm guaranteed to be non-NULL? The variable is automatically
allocated, so I don't see how its address could be NULL, regardless of
what mystrptime puts into the structure. On the other hand, checking
tm2, set by gmtime(), would make sense because it's a pointer. Mind you,
I don't think gmtime would ever return NULL either.
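
For what it's worth, the C standard does allow gmtime() to return a null
pointer when the time cannot be converted, so that is the check that can
actually fire. A rough sketch of the distinction follows; formatGmt is a
hypothetical helper for illustration, not the code in htdig.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical helper: format a time_t as an HTTP-style GMT date string,
// returning an empty string if the conversion fails.
std::string formatGmt(std::time_t when)
{
    // gmtime() hands back a pointer, so this is the result worth testing.
    // By contrast, the address of a local "struct tm tm;" is never null,
    // so a test such as "if (&tm)" can never fail.
    const std::tm *utc = std::gmtime(&when);
    if (utc == nullptr)
        return std::string();

    char buf[64];
    std::size_t n = std::strftime(buf, sizeof(buf),
                                  "%a, %d %b %Y %H:%M:%S GMT", utc);
    assert(n > 0);  // the fixed-width format fits easily in 64 bytes
    return std::string(buf, n);
}
```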

> >> * htsearch can coredump if a file in template_map doesn't exist
> >here. If the person who reported this problem can be persuaded to test
> >out the current snapshot or CVS tree, great, but otherwise I think this
> >problem is solved already.
>
> I would tend to agree here. I included the remark simply because I thought
> it needed another testing round before I was happy. I did that as well and
> it looks fine.
>
> >pattern would be wrong. I think this second usage should be changed over
> >to a separate attribute, e.g. remove_default_doc, which would be a string
> >list, and if empty, nothing would be removed. local_default_doc would
> >then revert to its previous local_urls only function. E.g.:
>
> That about mirrors my thinking as well. I'd like to get Retriever to use a
> StringList, but it's not as easy as I'd like and I haven't had a chance to
> do it.

Yeah, handling multiple default documents for the local_urls stuff would
be a little trickier (though not much), because you'd need to test each
file name to see if it exists before going on to the next.
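
Something along these lines is what I have in mind for the existence
test. This is only a sketch: firstExistingDoc is a hypothetical name, it
uses std::vector rather than htdig's StringList, and it assumes a POSIX
stat().

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/stat.h>

// Hypothetical helper: try each candidate default document in order and
// return the full path of the first one that exists as a regular file
// under dir, or an empty string if none does.
std::string firstExistingDoc(const std::string &dir,
                             const std::vector<std::string> &candidates)
{
    assert(!dir.empty());  // a local directory must be given

    for (const std::string &name : candidates) {
        std::string path = dir;
        if (path.back() != '/')
            path += '/';
        path += name;

        struct stat st;
        if (stat(path.c_str(), &st) == 0 && S_ISREG(st.st_mode))
            return path;   // first match wins; later names are not tried
    }
    return std::string();
}
```

The order of the list matters here: with candidates "index.html" and
"index.htm", the second is only consulted when the first is missing.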

Here's my patch for the Content-Length header. What do you think?

--- ./htdig/Document.h.contlen Thu Dec 3 22:14:50 1998
+++ ./htdig/Document.h Tue Jan 26 14:24:59 1999
@@ -130,6 +130,7 @@
     String contentType;
     String authorization;
     String referer;
+    int contentLength;
     int document_length;
     time_t modtime;
     int max_doc_size;
--- ./htdig/Document.cc.contlen Mon Jan 18 16:58:35 1999
+++ ./htdig/Document.cc Tue Jan 26 14:47:37 1999
@@ -159,6 +159,7 @@
 
     contents.allocate(max_doc_size + 100);
     contentType = "";
+    contentLength = -1;
     if (u)
     {
         Url(u);
@@ -193,6 +194,7 @@
 Document::Reset()
 {
     contentType = 0;
+    contentLength = -1;
     if (url)
       delete url;
     url = 0;
@@ -515,16 +517,20 @@
     contents = 0;
     char docBuffer[8192];
     int bytesRead;
+    int bytesToGo = contentLength;
 
-    while ((bytesRead = c.read(docBuffer, sizeof(docBuffer))) > 0)
-    {
+    if (bytesToGo < 0 || bytesToGo > max_doc_size)
+        bytesToGo = max_doc_size;
+    while (bytesToGo > 0)
+    {
+        int len = bytesToGo<sizeof(docBuffer) ? bytesToGo : sizeof(docBuffer);
+        bytesRead = c.read(docBuffer, len);
+        if (bytesRead <= 0)
+            break;
         if (debug > 2)
             cout << "Read " << bytesRead << " from document\n";
-        if (contents.length() + bytesRead > max_doc_size)
-            bytesRead = max_doc_size - contents.length();
         contents.append(docBuffer, bytesRead);
-        if (contents.length() >= max_doc_size)
-            break;
+        bytesToGo -= bytesRead;
     }
     c.close();
     document_length = contents.length();
@@ -597,6 +603,12 @@
                 strtok(line, " \t");
                 modtime = getdate(strtok(0, "\n\t"));
             }
+            else if (contentLength == -1
+                     && mystrncasecmp(line, "content-length:", 15) == 0)
+            {
+                strtok(line, " \t");
+                contentLength = atoi(strtok(0, "\n\t"));
+            }
             else if (mystrncasecmp(line, "content-type:", 13) == 0)
             {
                 strtok(line, " \t");
@@ -676,6 +688,7 @@
     }
     fclose(f);
     document_length = contents.length();
+    contentLength = document_length;
 
     if (debug > 2)
         cout << "Read a total of " << document_length << " bytes\n";
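
One case the header parsing above does not cover is a Content-Length
header with no value: the second strtok() call then returns a null
pointer, and passing that to atoi() is undefined. A possible hardening is
sketched below as a standalone function. parseContentLength is a
hypothetical name, and it uses strtol() instead of the strtok()/atoi()
pair in the patch.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <climits>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: return the value of a Content-Length header line,
// or -1 if the line is not a Content-Length header or carries no usable
// non-negative number.
int parseContentLength(const char *line)
{
    static const char name[] = "content-length:";
    const std::size_t nameLen = sizeof(name) - 1;

    assert(line != nullptr);

    // Case-insensitive match of the header name. A shorter line hits its
    // terminating NUL, which never matches, so we never read past the end.
    for (std::size_t i = 0; i < nameLen; ++i)
        if (std::tolower(static_cast<unsigned char>(line[i])) != name[i])
            return -1;

    const char *p = line + nameLen;
    char *end = nullptr;
    errno = 0;
    long value = std::strtol(p, &end, 10);  // skips leading whitespace

    // end == p means no digits were found (empty or garbage value).
    if (end == p || errno == ERANGE || value < 0 || value > INT_MAX)
        return -1;
    return static_cast<int>(value);
}
```

Returning -1 for anything unusable matches the "length unknown" value
the patch already uses, so the read loop would simply fall back to
max_doc_size.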

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


