[htdig3-dev] Re: Zlib compression


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Tue, 19 Jan 1999 15:43:09 -0600 (CST)


* List: htdig3-dev@sob.htdig.org

According to Geoff Hutchison:
> On Tue, 19 Jan 1999, Gilles Detillieux wrote:
> > I think I can speculate on the 2nd question. For every href to a given
> > URL, htdig will fetch, modify and store the DocumentRef for that URL.
> > That means a Deserialize and a Serialize for each href, plus one for
> > the document itself.
>
> So this is a side-effect of the AddDescription? I wonder if there's a way
> we can only do the Deserialize/Serialize when we're actually adding the
> description.
>
> Or, as Didier points out, we could compress only parts of the DocumentRef.
> This would avoid some of the slowdown in deflate(). In other words, maybe
> we compress DocHead. Then we have the methods that access DocHead do the
> compression/decompression *only* when DocHead is needed.

That sounds reasonable to me. I'd bet the other fields are too small to
compress well anyway, but I may be wrong. In any case, if we only
compress/decompress the DocHead as needed, that would greatly cut down
the number of times we'd have to do it: roughly once per document rather
than once per href.
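
A minimal sketch of the "compress only DocHead" idea, under stated
assumptions: `LazyDocRef`, `SetDocHead`, `CodecCompress`,
`CodecCallsForLinks` and the length-prefixed serialization format are all
hypothetical, not htdig's actual DocumentRef API, and a trivial run-length
codec stands in for zlib's deflate/inflate so the sketch builds without
-lz. What it tries to show: if Serialize/Deserialize copy the already
compressed head verbatim, the per-href round trips never touch the codec.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>

// Counts every compress/decompress call so the codec cost is visible.
static int g_codecCalls = 0;

// Stand-in codec (run-length encoding) used in place of zlib's
// deflate()/inflate() so this sketch builds without linking -lz.
std::string CodecCompress(const std::string& in) {
    ++g_codecCalls;
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i] && run < 255) ++run;
        out.push_back(static_cast<char>(static_cast<unsigned char>(run)));
        out.push_back(in[i]);
        i += run;
    }
    return out;
}

std::string CodecDecompress(const std::string& in) {
    ++g_codecCalls;
    std::string out;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2)
        out.append(static_cast<std::size_t>(static_cast<unsigned char>(in[i])),
                   in[i + 1]);
    return out;
}

// Hypothetical record loosely modelled on DocumentRef: small fields stay
// plain; only the large DocHead excerpt is kept compressed.
class LazyDocRef {
public:
    std::string url;
    std::string descriptions;  // link text appended once per href

    // The codec runs only when the head is set or actually read.
    void SetDocHead(const std::string& head) { headZ_ = CodecCompress(head); }
    std::string DocHead() const { return CodecDecompress(headZ_); }

    // Length-prefixed fields; headZ_ is copied as-is, never re-compressed.
    std::string Serialize() const {
        std::string s;
        for (const std::string* f : {&url, &descriptions, &headZ_})
            s += std::to_string(f->size()) + ':' + *f;
        return s;
    }

    static LazyDocRef Deserialize(const std::string& s) {
        LazyDocRef r;
        std::size_t pos = 0;
        for (std::string* f : {&r.url, &r.descriptions, &r.headZ_}) {
            std::size_t colon = s.find(':', pos);
            std::size_t len = std::stoul(s.substr(pos, colon - pos));
            *f = s.substr(colon + 1, len);
            pos = colon + 1 + len;
        }
        return r;
    }

private:
    std::string headZ_;  // compressed DocHead bytes
};

// Stores one document, then processes `hrefs` links to it, each doing a
// Deserialize / modify / Serialize round trip; returns codec calls made.
int CodecCallsForLinks(int hrefs) {
    g_codecCalls = 0;
    LazyDocRef doc;
    doc.url = "attrs.html";
    doc.SetDocHead(std::string(200, ' ') + "Attribute list");
    std::string stored = doc.Serialize();
    for (int i = 0; i < hrefs; ++i) {
        LazyDocRef r = LazyDocRef::Deserialize(stored);
        r.descriptions += "link text;";
        stored = r.Serialize();
    }
    // The head is decompressed once, only when it is needed.
    std::string head = LazyDocRef::Deserialize(stored).DocHead();
    assert(head.size() == 214);
    return g_codecCalls;
}
```

With 42 hrefs, CodecCallsForLinks(42) still reports two codec calls (one
when the head is stored, one when it is read back), whereas compressing the
whole record inside Serialize/Deserialize would cost a compress and a
decompress for every href.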

> > I'd guess Didier's site is averaging 42 hrefs per URL, though that still
> > seems rather high!
>
> That was my assumption--that there are too many calls to be readily
> explained. Maybe we can figure out some sort of debugging trace for calls
> to [] (which will then go to Deserialize).

Well, that depends on the files he's indexing. E.g. cf_byname.html has
way more than 42 hrefs to attrs.html. Maybe Didier can comment on this.
Is the number of Serialize/Deserialize calls way too high for the stuff
he's indexing? If so, then yes, some debugging traces would be in order.
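
One possible shape for such a trace, as a hedged sketch only:
`RoundTripTrace` and its methods are hypothetical, not part of htdig. The
idea is to tally hrefs, Serialize and Deserialize calls per URL and flag
any URL deserialized more than hrefs + 1 times (one round trip per href
plus one for the document itself, per the reasoning earlier in this
thread), which would separate legitimately link-heavy pages like
attrs.html from genuinely excessive calls.

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <string>
#include <vector>

// Hypothetical per-URL tally of DocumentRef round trips. In htdig the
// hooks would sit in Serialize()/Deserialize() (or operator[], which ends
// up calling Deserialize); here they are called by hand.
class RoundTripTrace {
public:
    void HrefSeen(const std::string& url) { ++counts_[url].hrefs; }
    void Serialized(const std::string& url) { ++counts_[url].serialize; }
    void Deserialized(const std::string& url) { ++counts_[url].deserialize; }

    // URLs deserialized more often than expected: one round trip per href
    // plus one for the document itself, i.e. hrefs + 1.
    std::vector<std::string> Suspicious() const {
        std::vector<std::string> out;
        for (const auto& entry : counts_)
            if (entry.second.deserialize > entry.second.hrefs + 1)
                out.push_back(entry.first);
        return out;
    }

    // One line per URL: hrefs, Serialize calls, Deserialize calls.
    void Report(std::ostream& os) const {
        for (const auto& entry : counts_)
            os << entry.first << " hrefs=" << entry.second.hrefs
               << " serialize=" << entry.second.serialize
               << " deserialize=" << entry.second.deserialize << '\n';
    }

private:
    struct Counts {
        int hrefs = 0;
        int serialize = 0;
        int deserialize = 0;
    };
    std::map<std::string, Counts> counts_;
};
```

A page with 42 hrefs and 43 Deserialize calls is not flagged; only URLs
whose call count outruns their link count show up in Suspicious().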

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



This archive was generated by hypermail 2.0b3 on Thu Feb 04 1999 - 22:24:19 PST