Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Tue, 19 Jan 1999 15:43:09 -0600 (CST)
* List: htdig3-dev@sob.htdig.org
According to Geoff Hutchison:
> On Tue, 19 Jan 1999, Gilles Detillieux wrote:
> > I think I can speculate on the 2nd question. For every href to a given
> > URL, htdig will fetch, modify and store the DocumentRef for that URL.
> > That means a Deserialize and a Serialize for each href, plus one for
> > the document itself.
>
> So this is a side-effect of the AddDescription? I wonder if there's a way
> we can only do the Deserialize/Serialize when we're actually adding the
> description.
>
> Or, as Didier points out, we can only compress parts of the DocumentRef.
> This would escape some of the slowdown in deflate(). In other words, maybe
> we compress DocHead. Then we have the methods that access DocHead do the
> compression/decompression *only* when DocHead is needed.
That sounds reasonable to me. I'd bet that the other fields are too small
to get decent compression anyway, but I may be wrong. In any case, if we
only compress/decompress the DocHead as needed, that would greatly cut
down the number of times we'd need to do that (once per document).
> > I'd guess Didier's site is averaging 42 hrefs per URL, though that still
> > seems rather high!
>
> That was my assumption--that there are too many calls to be readily
> explained. Maybe we can figure out some sort of debugging trace for calls
> to [] (which will then go to Deserialize).
Well, that depends on the files he's indexing. E.g. cf_byname.html has
way more than 42 hrefs to attrs.html. Maybe Didier can comment on this.
Is the number of Serialize/Deserialize calls way too high for the stuff
he's indexing? If so, then yes, some debugging traces would be in order.
-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
This archive was generated by hypermail 2.0b3 on Thu Feb 04 1999 - 22:24:19 PST