Re: [htdig3-dev] Berkeley DB2 and Perl scripts


Subject: Re: [htdig3-dev] Berkeley DB2 and Perl scripts
From: Tom Metro (tmetro@vl.com)
Date: Sat Dec 04 1999 - 08:27:33 PST


Geoff Hutchison <ghutchis@wso.williams.edu> writes:
> But to clarify your point, zlib and u_p_a and c_u_p are used on
> different things. The first is used *solely* on document excerpts
> (the DocHead field), while the latter two are used on URLs in both
> the document database and the document index (the URL->DocID list).
I should probably do more research before asking further questions, as
I don't even know what your database schema looks like (one of the
reasons why I suggested having an "architectural overview" document),
but are document titles compressed using one of these schemes? Are they
part of the DocHead that is compressed with zlib?

> So there are two steps to decoding an entry--first decoding based on
> url_part_aliases and common_url_parts, then decompressing the
> DocHead field if it's compressed.
Well, if one goal is just to get a report of indexed URLs, I can
probably forgo dealing with zlib.
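
For a script that does want the excerpts, the second step could be as small as the sketch below. It is written in Python purely for illustration, and it assumes the DocHead field is a standard zlib stream when compressed and plain bytes otherwise; that storage format is an assumption and should be checked against the htdig source. The function name `read_doc_head` is hypothetical.

```python
import zlib


def read_doc_head(raw):
    """Return the excerpt bytes, decompressing only when needed.

    Assumption: a compressed DocHead is a standard zlib stream and an
    uncompressed one is stored as-is. htdig's real format may differ.
    """
    try:
        return zlib.decompress(raw)
    except zlib.error:
        # Not a zlib stream, so treat the field as uncompressed text.
        return raw
```

Trying the decompression and falling back on failure avoids having to know whether compression was enabled when the database was built.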

Decoding common_url_parts reliably with an external script will be
tricky because even if you parse htdig.conf and find the attribute
absent, you still need to keep in sync with htdig's compiled-in
default, which may have changed since the script was written.
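
To make the synchronization concern concrete, here is a minimal sketch, in Python rather than Perl purely for illustration. It assumes a hypothetical encoding in which each common URL part is replaced by a single control byte whose value is the part's 1-based position in the list; htdig's real encoding may differ and would need to be checked against the source. The `FALLBACK_PARTS` list, the function names, and the simple one-line `name: value` parsing are all assumptions, not htdig's actual default or config parser.

```python
import re

# Hypothetical stand-in for htdig's compiled-in default. The real default
# must be copied from the htdig source for the version that built the
# database, which is exactly the synchronization problem.
FALLBACK_PARTS = ["http://", "http://www.", ".html", ".com/"]


def load_common_url_parts(conf_text):
    """Return common_url_parts from config text, or the fallback list.

    Only a single-line "common_url_parts: a b c" entry is recognized;
    line continuations and include files are not handled here.
    """
    match = re.search(
        r"^[ \t]*common_url_parts[ \t]*:[ \t]*(.*)$", conf_text, re.MULTILINE
    )
    if match is None:
        return list(FALLBACK_PARTS)
    return match.group(1).split()


def decode_url(encoded, parts):
    """Expand single-byte codes 1..len(parts) back into URL parts.

    Assumes the list is short enough that the codes never collide with
    printable characters; all other bytes are copied through unchanged.
    """
    out = []
    for byte in encoded:
        if 1 <= byte <= len(parts):
            out.append(parts[byte - 1])
        else:
            out.append(chr(byte))
    return "".join(out)


conf = "database_dir: /var/htdig\ncommon_url_parts: http://www. .html\n"
parts = load_common_url_parts(conf)
print(decode_url(b"\x01example.com/index\x02", parts))
# prints http://www.example.com/index.html
```

If the attribute is missing from the config, the script silently depends on `FALLBACK_PARTS` matching whatever htdig was compiled with, so a stale copy would produce wrong URLs without any error.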

Are you aware of any Perl scripts that have been written to decompress
the URLs?

 -Tom

-- 
Tom Metro
Venture Logic                                     tmetro@vl.com
Newton, MA, USA

This archive was generated by hypermail 2b28 : Sat Dec 04 1999 - 09:29:02 PST