[htdig3-dev] Patch for Document Sizes


Geoff Hutchison (ghutchis@wso.williams.edu)
Sun, 14 Feb 1999 16:43:32 -0400


Hi,

Marjolein noted a bug in the Document code. If you do a search on
htdig.org, you can see it in action. Search for any attribute, say
pdf_parser, and look at the results for attrs.html. When a document has
been trimmed, its size is reported as max_doc_size. In this case,
attrs.html is reported as 100K, when it's actually 155+K.

I'm not sure this is the best fix, but it seems to work. The document size
is now reported as the size sent by the server (if available) or by stat()
when retrieving locally. In particular, I don't know much about the library
calls -- is st_size a field of all stat types?
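
For reference, a minimal sketch of the local-file case, not the actual
Document.cc code. POSIX does specify st_size as a member of struct stat
(type off_t), and for a regular file it holds the size in bytes. The
names LocalDoc and read_local, and passing max_doc_size as a parameter,
are assumptions made up for this illustration.

```cpp
#include <sys/stat.h>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical result type: the (possibly trimmed) contents plus the
// full on-disk size, kept separately so trimming does not hide it.
struct LocalDoc {
    std::string contents;  // at most max_doc_size bytes
    long long   size;      // size from stat(), or -1 if stat() failed
};

// Hypothetical helper: read at most max_doc_size bytes of a local file,
// but report the size that stat() gives for the whole file.
LocalDoc read_local(const std::string &path, std::size_t max_doc_size)
{
    LocalDoc doc{std::string(), -1};

    struct stat stat_buf;
    if (stat(path.c_str(), &stat_buf) != 0)
        return doc;                       // file missing or unreadable
    doc.size = static_cast<long long>(stat_buf.st_size);

    // Read only the first max_doc_size bytes; gcount() says how many
    // bytes were actually read, which may be fewer for short files.
    std::ifstream in(path, std::ios::binary);
    doc.contents.resize(max_doc_size);
    in.read(&doc.contents[0], static_cast<std::streamsize>(max_doc_size));
    doc.contents.resize(static_cast<std::size_t>(in.gcount()));
    return doc;
}
```

The point of keeping the two values apart is the same as in the patch:
contents.length() reflects what was kept, while st_size reflects what
is on disk.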

-Geoff

Index: Document.cc
===================================================================
RCS file: /opt/htdig/cvs/htdig3/htdig/Document.cc,v
retrieving revision 1.34
diff -r1.34 Document.cc
447a448,450
>
> if (document_length < contentLength)
> document_length = contentLength;
598,599c601,602
< document_length = contents.length();
< contentLength = document_length;
---
>     document_length = stat_buf.st_size;
>     contentLength = contents.length();
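
The first hunk (the HTTP case) can be read as "report whichever is
larger: the bytes actually retrieved or the length the server sent".
A standalone sketch of that rule follows. It assumes contentLength holds
the value of the Content-Length header and is -1 when the server sent
none; reported_size is a hypothetical helper, not a function in ht://Dig.

```cpp
#include <string>

// Hypothetical helper mirroring the first hunk of the patch.
// retrieved:     the (possibly trimmed) body that was kept
// contentLength: server-reported length, assumed -1 when not available
long reported_size(const std::string &retrieved, long contentLength)
{
    long document_length = static_cast<long>(retrieved.size());

    // If the server claimed more bytes than were kept, the document was
    // trimmed, so report the server's figure instead.
    if (document_length < contentLength)
        document_length = contentLength;

    return document_length;
}
```

With a body trimmed to 100 bytes and a server-reported length of 155,
this returns 155; with no reported length it falls back to 100.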



