[htdig3-dev] Re: [htdig3-dev] Re: Zlib compression


Didier Gautheron (dgautheron@magic.fr)
Thu, 21 Jan 1999 03:07:23 +0000


Gilles Detillieux wrote:
>
> * List: htdig3-dev@sob.htdig.org
>
> According to Geoff Hutchison:
> > On Tue, 19 Jan 1999, Gilles Detillieux wrote:
> > > I think I can speculate on the 2nd question. For every href to a given
> > > URL, htdig will fetch, modify and store the DocumentRef for that URL.
> > > That means a Deserialize and a Serialize for each href, plus one for
> > > the document itself.
> >
> > So this is a side-effect of the AddDescription? I wonder if there's a way
> > we can only do the Deserialize/Serialize when we're actually adding the
> > description.
> >
> > Or, as Didier points out, we can only compress parts of the DocumentRef.
> > This would escape some of the slowdown in deflate(). In other words, maybe
> > we compress DocHead. Then we have the methods that access DocHead do the
> > compression/decompression *only* when DocHead is needed.
>
> That sounds reasonable to me. I'd bet that the other fields are too small
> to get decent compression anyway, but I may be wrong. In any case, if we
> only compress/decompress the DocHead as needed, that would greatly cut
> down the number of times we'd need to do that (once per document).
>
> > > I'd guess Didier's site is averaging 42 hrefs per URL, though that still
> > > seems rather high!
> >
> > That was my assumption--that there are too many calls to be readily
> > explained. Maybe we can figure out some sort of debugging trace for calls
> > to [] (which will then go to Deserialize).
>
> Well, that depends on the files he's indexing. E.g. cf_byname.html has
> way more than 42 hrefs to attrs.html. Maybe Didier can comment on this.
> Is the number of Serialize/Deserialize calls way too high for the stuff
> he's indexing? If so, then yes, some debugging traces would be in order.
It's Java, Apache, VNC, Python, HTML, HylaFAX, LessTif and similar docs
(that's a lot of hrefs!).
Attached is an excerpt from gprof; to me it looks fine.
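
For the "compress only DocHead, and only when it is needed" idea quoted
above, here is a minimal sketch of how that could look. It is not the
htdig code: DocRecord, pack() and unpack() are hypothetical names, and
the run-length codec is only a placeholder for the real deflate/inflate
calls. The point is that adding a description and storing the record
again never touches the codec.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Placeholder codec (run-length encoding). This is an assumption made to
// keep the sketch self-contained; a real build would call zlib here.
std::string pack(const std::string& plain) {
    std::string out;
    for (std::size_t i = 0; i < plain.size();) {
        std::size_t run = 1;
        while (i + run < plain.size() && plain[i + run] == plain[i] && run < 255)
            ++run;
        out.push_back(static_cast<char>(run));  // run length, 1..255
        out.push_back(plain[i]);                // repeated character
        i += run;
    }
    return out;
}

std::string unpack(const std::string& packed) {
    std::string out;
    for (std::size_t i = 0; i + 1 < packed.size(); i += 2) {
        std::size_t run = static_cast<unsigned char>(packed[i]);
        out.append(run, packed[i + 1]);
    }
    return out;
}

// Hypothetical record type: the head stays packed until someone reads it.
class DocRecord {
public:
    int pack_calls = 0;    // counters, only here to make the cost visible
    int unpack_calls = 0;

    // Loading from the database keeps the head in its packed form.
    void load(const std::string& packed_head) {
        packed_head_ = packed_head;
        head_ready_ = false;
        head_dirty_ = false;
    }

    // Setting a new head marks it dirty; packing is deferred to store time.
    void set_head(const std::string& plain) {
        head_ = plain;
        head_ready_ = true;
        head_dirty_ = true;
    }

    // Unpacks at most once, and only when the head is actually read.
    const std::string& head() {
        if (!head_ready_) {
            head_ = unpack(packed_head_);
            head_ready_ = true;
            ++unpack_calls;
        }
        return head_;
    }

    // The per-href path: no codec work at all.
    void add_description(const std::string& text) {
        descriptions_.push_back(text);
    }

    // Returns the bytes to store; repacks only if the head was changed.
    const std::string& stored_head() {
        if (head_dirty_) {
            packed_head_ = pack(head_);
            head_dirty_ = false;
            ++pack_calls;
        }
        return packed_head_;
    }

private:
    std::string packed_head_;
    std::string head_;
    bool head_ready_ = false;
    bool head_dirty_ = false;
    std::vector<std::string> descriptions_;
};
```

With this shape, a record that is loaded, given a description and stored
again 42 times would run the codec zero times; it would run once when the
document itself is parsed and once if the excerpt is later displayed.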

Didier

-----------------------------------------------
                0.00 0.00 1/73153 Retriever::got_redirect(char *, DocumentRef *) [189]
                0.00 0.12 1743/73153 Retriever::GetRef(char *) [98]
                0.07 4.81 71409/73153 Retriever::got_href(URL &, char *) [8]
[12] 8.0 0.07 4.93 73153 DocumentDB::operator[](char *) [12]
                0.88 2.92 71410/71410 DocumentRef::Deserialize(String &) [15]
                0.09 0.62 71410/73153 DocumentRef::DocumentRef(void) [55]
                0.08 0.10 73153/73154 DB2_db::Get(String &, String &) [90]
                0.12 0.00 73153/1273496 String::lowercase(void) [22]
                0.04 0.04 73153/2196803 String::String(char *) [18]
                0.04 0.00 146306/5789509 String::~String(void) [36]
-----------------------------------------------
                0.00 0.00 1/73153 Retriever::got_redirect(char *, DocumentRef *) [189]
                0.00 0.11 1743/73153 Retriever::parse_url(URLRef &) [4]
                0.12 4.67 71409/73153 Retriever::got_href(URL &, char *) [8]
[13] 7.9 0.12 4.78 73153 DocumentDB::Add(DocumentRef &) [13]
                0.66 3.84 73153/73153 DocumentRef::Serialize(String &) [14]
                0.12 0.00 73153/1273496 String::lowercase(void) [22]
                0.05 0.02 73153/73154 DB2_db::Put(String &, String &) [120]
                0.05 0.01 146306/3359965 String::operator=(char *) [41]
                0.02 0.00 73153/5789509 String::~String(void) [36]
                0.01 0.00 73153/7344653 String::get(void) const [43]
-----------------------------------------------
                0.66 3.84 73153/73153 DocumentDB::Add(DocumentRef &) [13]
[14] 7.2 0.66 3.84 73153 DocumentRef::Serialize(String &) [14]
                0.95 1.24 2778846/6814986 String::append(char *, int) [11]
                0.69 0.37 2272192/2272207 String::append(String &) [47]
                0.43 0.00 2246839/2897300 List::Get_Next(void) [66]
                0.15 0.00 627234/8303469 String::append(char) [24]
-----------------------------------------------
                0.88 2.92 71410/71410 DocumentDB::operator[](char *) [12]
[15] 6.1 0.88 2.92 71410 DocumentRef::Deserialize(String &) [15]
                0.76 0.99 2215434/6814986 String::append(char *, int) [11]
                0.02 0.56 71410/144563 DocumentRef::Clear(void) [45]
                0.53 0.00 2099569/2153374 List::Add(Object *) [67]
                0.04 0.01 115865/3359965 String::operator=(char *) [41]
                0.01 0.00 71410/7344653 String::get(void) const [43]
-----------------------------------------------
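
As a rough cross-check of the "42 hrefs per URL" guess, the call counts
in the excerpt above can simply be divided out. The sketch below assumes
that the 1743 calls arriving via Retriever::GetRef correspond to one call
per retrieved document, which the profile suggests but does not prove.
The struct and function names are made up for illustration.

```cpp
#include <cassert>

// Call counts copied from the gprof excerpt for DocumentDB::operator[].
struct ProfileCounts {
    long from_got_href;  // calls made from Retriever::got_href
    long documents;      // calls made from Retriever::GetRef (assumed: one per document)
    long total;          // all calls to DocumentDB::operator[]
};

// Average number of DocumentDB lookups (of any origin) per document.
double lookups_per_document(const ProfileCounts& c) {
    assert(c.documents > 0);
    return static_cast<double>(c.total) / static_cast<double>(c.documents);
}

// Average number of lookups per document that were caused by hrefs.
double href_lookups_per_document(const ProfileCounts& c) {
    assert(c.documents > 0);
    return static_cast<double>(c.from_got_href) / static_cast<double>(c.documents);
}
```

Fed with the figures above (71409, 1743, 73153), the first ratio comes
out just under 42 and the second just under 41, so under that assumption
the number of Deserialize/Serialize calls matches roughly 41 hrefs per
document plus one pass for the document itself.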


