Geoff Hutchison (email@example.com)
Mon, 02 Aug 1999 11:10:59 -0400
Phooey! I knew I forgot to bring something in to work this morning. It
turns out I forgot to bring the list. I'll see if I can generate enough
of it from memory... These are in no particular order.
Goals                                                 Current Status
* Regex restrict/include/exclude                      Done
* Document DB keyed on DocID                          Done
* Document Excerpts moved to separate DB              Incomplete (need compression)
* Word DB conversion                                  Incomplete (mostly in place, a few prob.)
* Regex fuzzy                                         Incomplete
* Speling fuzzy                                       Incomplete
* Transport rewrite                                   Incomplete
  ExternalTransport                                   Not begun (need API)
* Trigram fuzzy                                       Not begun (short)
* Generate a list of all documents                    Not begun (very short)
* HtTools                                             Not begun (medium)
* UTF-8/Unicode support                               ?
* Character-Set translation                           ?
* Detection of duplicate documents while indexing     Not begun (short)
* External Decoders                                   ?
* Documentation / Website changes                     ?
* Distributed queries / Database collections          ?
* Configuration changes                               ?
* URL weighting factors (e.g. server A gets 'boost')  ?
  indexing of URL text                                ?
* Search 'similar'                                    ?
* Field-based searching                               (requires incomplete code)
* Phrase matching                                     (requires incomplete code)
* Shared libraries for distinct functionalities       ?
I'm going to reply to my own message in a minute with some commentary.
> . Implement new index structure (one db entry per word occurrence, I can
> provide extended help on that)
This is what my WordList changes did. It needs some changes, but much of
the code is already committed. (Unless you have big changes I don't know
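
To make the "one db entry per word occurrence" idea concrete, here is a rough sketch in Python. It is not the WordList code: the key layout (word, NUL separator, big-endian DocID and location) and the function names are assumptions for illustration only, and a sorted list stands in for the B-tree database.

```python
import bisect
import struct


def occurrence_key(word, doc_id, location):
    """Build a hypothetical key: word bytes, NUL, big-endian DocID and location."""
    return word.encode("utf-8") + b"\x00" + struct.pack(">II", doc_id, location)


def build_index(docs):
    """Return a sorted list of keys, one key per word occurrence.

    docs maps a DocID to its text. The sorted list stands in for a B-tree.
    """
    keys = []
    for doc_id, text in docs.items():
        for location, word in enumerate(text.lower().split()):
            keys.append(occurrence_key(word, doc_id, location))
    keys.sort()
    return keys


def lookup(keys, word):
    """Return (doc_id, location) pairs for a word via a prefix range scan."""
    prefix = word.encode("utf-8") + b"\x00"
    start = bisect.bisect_left(keys, prefix)
    results = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break
        results.append(struct.unpack(">II", key[len(prefix):]))
    return results


docs = {1: "the cat sat", 2: "cat and the hat"}
keys = build_index(docs)
print(lookup(keys, "cat"))
```

Because the numeric fields are packed big-endian, byte order matches numeric order, so all occurrences of a word sit next to each other and come back sorted by DocID and then location.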
> . Implement db transparent compression (that's what I'm doing, first release
> 4 August, benchmark results 5 August)
Does this recognize already-compressed data? I didn't think about this
earlier, but the excerpts are *supposed* to be using the HtZlibCodec
since they're large enough to get significant benefit.
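
As a sketch of what "recognizing already-compressed data" could look like, the Python below checks for a zlib stream header (RFC 1950) before compressing a value. This is not HtZlibCodec or the db compression layer discussed above; the one-byte "Z"/"R" tag and the function names are hypothetical.

```python
import zlib


def looks_like_zlib(data):
    """Heuristic check for a zlib (RFC 1950) header in the first two bytes."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    return (
        (cmf & 0x0F) == 8                 # compression method 8 = deflate
        and (cmf >> 4) <= 7               # window size field at most 32K
        and ((cmf << 8) | flg) % 31 == 0  # header check bits
    )


def store_value(data):
    """Hypothetical transparent layer: a one-byte tag followed by the payload."""
    if looks_like_zlib(data):
        return b"R" + data                # looks compressed already, store raw
    packed = zlib.compress(data)
    if len(packed) < len(data):
        return b"Z" + packed
    return b"R" + data                    # compression did not help, store raw


def load_value(stored):
    """Undo store_value using the tag byte."""
    tag, payload = stored[:1], stored[1:]
    return zlib.decompress(payload) if tag == b"Z" else payload


excerpt = b"Excerpt text from an indexed document. " * 50
already_packed = zlib.compress(excerpt)
print(len(store_value(excerpt)), len(store_value(already_packed)))
```

The header check is only a heuristic: uncompressed data can occasionally match it, in which case the value is simply stored raw, which costs space but not correctness.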
> . Upgrade to db-2.7.5
I don't really consider this a goal. Part of a move towards a release
entails updating code from external sources to the latest version. This
includes a variety of files in htlib/ from glibc as well as the db code.
By the time we have a 3.2 release, the versions will likely be
This archive was generated by hypermail 2.0b3 on Mon Aug 02 1999 - 07:30:13 PDT