[htdig3-dev] State of the Code


Geoff Hutchison (ghutchis@wso.williams.edu)
Sun, 30 May 1999 23:26:36 -0400


Hi,

Well, I was planning on sending this out 6/1, but I have some free time now.
This will be long, but fairly important, so if you don't have time to read
it now, please set it aside until you have a few moments. But please read
it!

I want to make a quick summary of project status. I'll be sending out a
more user-oriented summary to the htdig list, so this is more code-focused.
I'm doing this in part because I'd like some feedback on how *I'm* doing
and ways I can be more effective.

Current Status:
* htdig-3.1.x: Congrats to everyone (and another big thank you to Gilles
for his work on 3.1.2) on what seems to be a very solid release. Depending
on progress with 3.2, we may wish to backport some finished stuff (e.g.
HtRegex) to this, but hopefully we won't need anything beyond a possible
3.1.3.

* website: Thanks to everyone, especially Benjamin, who volunteered to help
out with the website. I think the outline for reorganized documentation
from Marjolein is excellent and will hopefully cut down on some of the
frequently asked questions sent to the mailing list. More feedback is
obviously welcome since I still feel like I'm sending a lot of RTFM
messages to the list (and I know others like Torsten and Gilles are doing
likewise). Either Benjamin or I will have more on the logo contest voting
shortly. However, we *do* need more templates!

* developer documentation: I've received a few requests for "developer
documentation." That is, documentation to help smooth out the adjustment
curve so people can help out more. Fair 'nuff, I'll be glad to write stuff,
especially if it helps people be more productive. What kinds of things are
needed? Would a list of "small projects" as stepping-stones for new
contributors be useful?

* snapshots: I'm reworking the way snapshots are generated. Right now
they're done off of a machine here at Williams and uploaded via CVS. But
I'm going to take them out of CVS to reduce disk usage on the
htdig.org server (and because it seems silly to have them in CVS when you
can get the source itself via CVS). I'll probably move to keeping only the
latest 2-4 snapshots as well. I'm also wondering if I should turn off
weekly snapshots, but see the next point.

* database rewrite: Basically nothing has happened since Hans-Peter dropped
in the change to index the documents on DocID. I'll be putting up a change
to take the DocHead excerpts out of the main db.docdb, but I'm not sure
it's as "elegant" as I'd like yet. On that subject, I also have the
beginnings of a rewrite to the word db. *BUT* I'm not sure we can easily
rewrite it in one fell swoop. So would it be better to turn off weekly
snapshots and work away at projects like this that may take some time to
complete? Or, since everyone seems very busy right now, would it be more
productive to work in smaller chunks? The downside of turning off
snapshots is that we wouldn't be able to test some changes for a while.

* speed: I recently switched the wso.williams.edu site back to 3.1.2 from
the latest 3.2.0 code. Indexing sped up considerably--perhaps due to disk
overhead with Hans-Peter's changes. I still think they're the "Right
Thing (TM)" since they speed up searching. But this 3.2 "penalty" should
decrease when Gabriele and I work out some problems with his new HTTP/1.1
code. On the searching side of things, I picked up a new book with some
useful tricks and I think we can speed searches up by at least a factor
of 4, perhaps more for larger sites with large responses.

* major projects: Beyond what I've mentioned, the following tasks for 3.2
haven't even been started (I'm just stating this for the record, I haven't
touched them either!). Contact me for more info.
  * ExternalProtocol: URL rewrite almost complete, needs an API
  * ExternalDecoder: Needs a way to detect MIME type before it starts
  * I18n: Need to start in on String interface first--make it look like Java?
  * Collections: Merged databases are OK, but do we need to merge them?
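
To make the DocHead idea from the database rewrite item above a bit more
concrete, here is a rough sketch of the split: per-document metadata stays
in one DocID-keyed store, and the excerpt text moves to a second store that
is only read when a result is actually displayed. It is written in Python
for brevity rather than the C++ the real code uses, and every name in it
(DocRecord, DocStore, the field names, the 50-character default) is made up
for illustration. It is not the actual db.docdb layout.

```python
from dataclasses import dataclass


@dataclass
class DocRecord:
    """Hypothetical per-document metadata kept in the main document db."""
    url: str
    title: str
    size: int


class DocStore:
    """Toy stand-in for two DocID-keyed databases (plain dicts here)."""

    def __init__(self):
        self.docdb = {}     # DocID -> DocRecord (small, read for every match)
        self.excerpts = {}  # DocID -> excerpt text (large, read for display)

    def add(self, doc_id, record, excerpt):
        """Store metadata and excerpt separately under the same DocID."""
        self.docdb[doc_id] = record
        self.excerpts[doc_id] = excerpt

    def lookup(self, doc_id):
        """Fetch metadata only; never touches the excerpt store."""
        return self.docdb[doc_id]

    def excerpt(self, doc_id, max_len=50):
        """Fetch the excerpt on demand, truncated to max_len characters."""
        return self.excerpts.get(doc_id, "")[:max_len]
```

The point of the split is that scoring and sorting matches only needs the
small records, so the bulky excerpt text is not dragged through every
lookup.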

I've probably left things out, but this is long enough. Let me know about
developer docs, snapshots, project listings, and the idea of putting aside
a working system for some short period while we complete tasks.

-Geoff



