Re: [htdig3-dev] size of dynamic pages


Subject: Re: [htdig3-dev] size of dynamic pages
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Wed Feb 09 2000 - 11:51:38 PST


According to Gabriele Bartolini:
> >Also, ReadBody() and ReadChunkedBody() are inconsistent in the way they
> >set _response._content_length & _response._document_length. ReadBody()
> >doesn't even make sure _response._content_length gets set. They should
> >both end with something like:
> >
> > _response._document_length = _response._contents.length();
> >
>
> You mean, maybe:
>
> > if (_response._content_length < _response._document_length)
> >     _response._content_length = _response._document_length;
>
> And why?

At the time I suggested that, it wasn't clear to me that there would
never be a content-length header when reading chunked input. I see
now that this is the case. However, the _content_length field should
reflect the total size of the original document, regardless of whether
htdig truncated it to max_doc_size, while _document_length should
reflect the size actually read in, never exceeding max_doc_size.

If _content_length is less than _document_length, then _content_length
is surely incorrect, or wasn't given, as it should always be greater than
or equal to _document_length. So, I think the logic above still makes
sense. Certainly for non-chunked input, that's the way it should be
done. For chunked input, you may want to do it differently, but you
do have to allow for the possibility that _content_length will be larger
than _document_length (see below).
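
As a minimal sketch of that reconciliation step (the Response struct and
FinalizeLengths() here are hypothetical stand-ins, not the actual htdig
class or method, and -1 is assumed to mean "no Content-Length header"):

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for htdig's response object; only the three
// members discussed in this thread are modelled.
struct Response {
    std::string contents;       // bytes actually kept (<= max_doc_size)
    long content_length = -1;   // size of the original document; -1 if unknown
    long document_length = 0;   // size actually read in and kept
};

// Meant to be called at the end of a body-reading routine.
void FinalizeLengths(Response &r) {
    r.document_length = static_cast<long>(r.contents.length());
    // A content length smaller than what we hold is wrong or was never
    // given, so raise it to the document length.
    if (r.content_length < r.document_length)
        r.content_length = r.document_length;
    assert(r.content_length >= r.document_length);
}
```

A larger _content_length is left alone, which is what a truncated
document needs.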

> Well, as far as ReadChunkedBody() is concerned, there's another problem. We
> don't use the max_document_size attribute, and I have no idea how to use it,
> other than closing the connection once the size has been reached (and that
> way I'll never know what the content-length is). That sucks ;-(
>
> Any ideas?

Yes, it occurred to me late yesterday that this was a problem. It seems
to me that the only option is to read the entire chunked input, to get
the correct length, but only append a maximum of max_doc_size bytes to
the _contents string. Then, _document_length will be set to the length
of this string (i.e. it will not exceed max_doc_size), but _content_length
will be set to the total length of all chunks.
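
Roughly, the read-everything-but-keep-only-max_doc_size idea could look
like the sketch below. It is not the real ReadChunkedBody(): it reads from
a std::istream instead of the connection object, and ChunkedResult,
ReadChunkedCapped() and ReadChunkedCappedFromString() are names made up
for illustration. The trailer is deliberately left unread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result type; htdig keeps these values in _response.
struct ChunkedResult {
    std::string contents;             // bytes kept, at most max_doc_size
    std::size_t content_length = 0;   // total size of all chunks seen
    std::size_t document_length = 0;  // always contents.size()
    bool ok = false;                  // true once the zero-size chunk is seen
};

// Decode a chunked body, keeping only the first max_doc_size bytes but
// still consuming and counting every chunk.
ChunkedResult ReadChunkedCapped(std::istream &in, std::size_t max_doc_size) {
    ChunkedResult r;
    std::string line;
    char buf[4096];
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        unsigned long size = 0;
        try {
            size = std::stoul(line, nullptr, 16);  // stops at any ";extension"
        } catch (...) {
            return r;                              // malformed chunk-size line
        }
        if (size == 0) { r.ok = true; break; }     // last chunk
        std::size_t remaining = size;
        while (remaining > 0) {
            std::size_t want = std::min<std::size_t>(remaining, sizeof buf);
            in.read(buf, static_cast<std::streamsize>(want));
            if (static_cast<std::size_t>(in.gcount()) != want) return r;  // short read
            r.content_length += want;              // counted whether kept or not
            std::size_t room = max_doc_size - r.contents.size();
            r.contents.append(buf, std::min(room, want));  // the rest is discarded
            r.document_length = r.contents.size();
            remaining -= want;
        }
        std::getline(in, line);                    // CRLF that follows chunk data
    }
    assert(r.document_length <= max_doc_size);
    return r;
}

// Convenience wrapper for trying the decoder on an in-memory buffer.
ChunkedResult ReadChunkedCappedFromString(const std::string &raw,
                                          std::size_t max_doc_size) {
    std::istringstream in(raw);
    return ReadChunkedCapped(in, max_doc_size);
}
```

The fixed-size buffer means an oversized chunk is discarded piece by piece
rather than being held in memory first.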

...
> I found the answer ... and I hope I'll get to be clear in the explanation.
...

That really clarified things. Thanks for the research.

> << All HTTP/1.1 applications that receive entities MUST accept the
> "chunked" transfer-coding, thus allowing this mechanism to be used for
> messages when the message length cannot be determined in advance. >>
>
> Is htdig an HTTP/1.1 application? I suppose yes ...

Yes, that was our goal, so obviously if we want to support HTTP/1.1,
we must accept chunked input - which we do - but we must also make sure
max_doc_size is respected, even if that means reading and discarding
any extra output from the server.

> And finally let's get to the point:
>
> << Messages MUST NOT include both a Content-Length header field and a
> non-identity transfer-coding. If the message does include a non-identity
> transfer coding, the Content-Length MUST be ignored. >>
>
> Any assessments and considerations?

That's pretty clear! So, when reading chunked input, we must calculate
the total length of all chunks (whether kept or not), and ignore/override
any Content-Length header we came across.
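
One way to express that rule in code, as a sketch only: the Framing type,
ChooseFraming() and EffectiveContentLength() are hypothetical names, the
header map is assumed to hold lower-cased field names, and any
non-identity transfer-coding is simply treated as chunked.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

enum class BodyFraming { Chunked, ContentLength, UntilClose };

struct Framing {
    BodyFraming kind;
    long declared_length;  // -1 when unknown or deliberately ignored
};

static std::string Lower(std::string s) {
    for (char &c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// `headers` is assumed to map lower-cased field names to raw values.
Framing ChooseFraming(const std::map<std::string, std::string> &headers) {
    auto te = headers.find("transfer-encoding");
    if (te != headers.end() && Lower(te->second) != "identity") {
        // Non-identity transfer-coding: any Content-Length is ignored.
        return Framing{BodyFraming::Chunked, -1};
    }
    auto cl = headers.find("content-length");
    if (cl != headers.end()) {
        try {
            long n = std::stol(cl->second);
            if (n >= 0) return Framing{BodyFraming::ContentLength, n};
        } catch (...) {
            // unparsable value: fall through and read until close
        }
    }
    return Framing{BodyFraming::UntilClose, -1};
}

// The value to store as the content length once the body has been read:
// for chunked input only the counted bytes matter.
long EffectiveContentLength(const Framing &f, long bytes_counted) {
    assert(bytes_counted >= 0);
    if (f.kind == BodyFraming::ContentLength && f.declared_length > bytes_counted)
        return f.declared_length;
    return bytes_counted;
}
```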

> P.S.: There's an obscure point regarding chunks, which concerns the trailer
> of the message. Maybe the server can set some header fields there. I remember
> making some attempts and always getting an empty trailer ... Probably you know
> this better than I do and can help me.

It's not clear to me what these "entity-headers" in the trailer are, and
what we should do with them. Are they to be treated just like headers
received before the data? If so, I guess we should parse them. In any
case, it would seem the code should be there to deal with the trailer,
if any server puts one out, even if we don't have any examples right
now of servers that do.
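
If the trailer lines do turn out to be ordinary "Name: value" header
lines, reading them might look something like this sketch. ReadTrailer()
is a hypothetical helper, it assumes the stream is positioned just after
the zero-size chunk line, and whether the fields should then be merged
with the earlier headers is left open.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Read trailer fields up to the blank line that ends the chunked body.
// Lines without a usable field name are skipped.
std::map<std::string, std::string> ReadTrailer(std::istream &in) {
    std::map<std::string, std::string> fields;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) break;                 // blank line: end of trailer
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos || colon == 0) continue;  // malformed
        std::string name = line.substr(0, colon);
        assert(!name.empty());
        // Skip optional whitespace between the colon and the value.
        std::string::size_type start = line.find_first_not_of(" \t", colon + 1);
        fields[name] = (start == std::string::npos) ? "" : line.substr(start);
    }
    return fields;
}

// Convenience wrapper for trying the parser on an in-memory buffer.
std::map<std::string, std::string> ReadTrailerFromString(const std::string &raw) {
    std::istringstream in(raw);
    return ReadTrailer(in);
}
```

An empty trailer, which is all we have seen so far, comes back as an
empty map.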

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930




This archive was generated by hypermail 2b28 : Wed Feb 09 2000 - 11:54:31 PST