Re: [htdig] We are adding MD5 and reverse indexing


Geoff Hutchison (ghutchis@wso.williams.edu)
Fri, 17 Sep 1999 12:18:48 -0400 (EDT)


On Sat, 18 Sep 1999, geoff s. wrote:

> 1. MD5 hashing for DB key and duplication detection (esp useful for email)

Sounds good to me. I was just about to begin work on this particular
feature, so if you already have code, I'll spend my time elsewhere.
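
To make the idea concrete, here is a rough sketch of content hashing used
both as a key and for duplicate detection. It is written in Python purely
for illustration (ht://Dig itself is C++), and the names content_key and
find_duplicates are hypothetical, not anything in the ht://Dig tree:

```python
import hashlib


def content_key(body: bytes) -> str:
    """Return the MD5 hex digest of a document body, usable as a DB key."""
    return hashlib.md5(body).hexdigest()


def find_duplicates(docs):
    """Map each duplicate URL to the first URL seen with identical content.

    `docs` is an iterable of (url, body_bytes) pairs.
    """
    first_seen = {}   # digest -> first URL that had this content
    duplicates = {}   # duplicate URL -> URL of the original
    for url, body in docs:
        key = content_key(body)
        if key in first_seen:
            duplicates[url] = first_seen[key]
        else:
            first_seen[key] = url
    return duplicates
```

Two documents with byte-identical bodies get the same key, so the second
one is reported as a duplicate of the first. For email, one would
presumably hash only the parts that identify a message; which parts is an
open design choice.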

> 2. Conversion from DB2 to mySQL (this is likely but not yet definite)

Please don't "convert." Instead, if you could just write a subclass of
the Database class (Database.cc) to handle the same methods as the
Berkeley DB classes, we can start working on real abstraction, so that
people can choose whatever database they want.
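
The shape I have in mind is roughly the following. This is only a sketch
in Python, not the actual interface in Database.cc: the method names
(put, get, exists, delete) and the MemoryDatabase backend are assumptions
made for illustration. A mySQL backend would be one more subclass
implementing the same methods.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Database(ABC):
    """Abstract key/value store; callers depend only on this interface."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def exists(self, key: bytes) -> bool: ...

    @abstractmethod
    def delete(self, key: bytes) -> None: ...


class MemoryDatabase(Database):
    """Hypothetical in-memory backend, standing in for a real one."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def exists(self, key: bytes) -> bool:
        return key in self._data

    def delete(self, key: bytes) -> None:
        # Deleting a missing key is treated as a no-op.
        self._data.pop(key, None)
```

Code written against Database keeps working when the backend is swapped,
which is the point of subclassing rather than converting.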

> 3. Proximity searches (ie word x within n words of word y)

I'm assuming you're doing this based on the 3.2 code and extending the
existing phrase searching?
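
For reference, a proximity test is simple once word positions are
available. The sketch below is Python for illustration only and assumes
the index can supply a sorted list of positions for each word; the names
build_positions and near are hypothetical, and "within n words" is taken
to mean a position difference of at most n.

```python
from collections import defaultdict


def build_positions(words):
    """Map each lowercased word to the sorted list of its positions."""
    positions = defaultdict(list)
    for i, word in enumerate(words):
        positions[word.lower()].append(i)
    return positions


def near(positions, x, y, n):
    """Return True if some occurrence of x is within n words of some y."""
    xs = positions.get(x.lower(), [])
    ys = positions.get(y.lower(), [])
    i = j = 0
    # Walk both sorted lists together, always advancing the smaller
    # position, so each list is scanned at most once.
    while i < len(xs) and j < len(ys):
        if abs(xs[i] - ys[j]) <= n:
            return True
        if xs[i] < ys[j]:
            i += 1
        else:
            j += 1
    return False
```

For example, with the words of "the quick brown fox jumps over the lazy
dog", near(positions, "quick", "fox", 2) is true, while
near(positions, "quick", "dog", 3) is false.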

> And, most importantly, ability to search from item 4 to find matching files
> and a lot of code tidying up and support for LZW decoding in PDF, plus
> on-the-fly annotations in 4 as and when people find useful stuff. We are
> using it, amongst other things, for litigation support.

Again, I assume you're using the 3.2 code and restricting based on the
flags that the word has?

> I hope our humble contribution finds some fans and the code tree. It is so
> much easier to follow pioneers than to be one, and I'm eternally grateful to
> the folks who kicked htDig off in the first place.
>
> Does anyone have any ideas on this ?

Without seeing code, I can't promise it will land in the tree, but all of
this would be welcome and sounds wonderful. As for ideas, there is no
shortage of them, so if you'd like to discuss it further, let's take it up
on the htdig3-dev list.

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



