[htdig] We are adding MD5 and reverse indexing


geoff s. (geoff@quanta.paypc.com)
Sat, 18 Sep 1999 01:07:46 +1000


Be advised that we are modifying htDig as follows:-

1. MD5 hashing for the DB key and for duplicate detection (especially useful
   for email)
2. Conversion from DB2 to MySQL (this is likely but not yet definite)
3. Proximity searches (i.e. word x within n words of word y)
4. File annotations: subject, author, file format, etc.
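
To give a rough idea of item 1, here is a minimal sketch in Python rather than htDig's C++. It is not the actual patch: the names md5_key and index_documents and the sample mail bodies are made up for illustration, and it assumes the key is simply the MD5 digest of the raw document body.

```python
import hashlib


def md5_key(body: bytes) -> str:
    """Return the hex MD5 digest of a document body, usable as a DB key."""
    return hashlib.md5(body).hexdigest()


def index_documents(docs):
    """Store the first URL seen for each digest; report later copies as duplicates."""
    seen = {}        # digest -> URL of the first document with that content
    duplicates = []  # (duplicate URL, URL of the original)
    for url, body in docs:
        key = md5_key(body)
        if key in seen:
            duplicates.append((url, seen[key]))
        else:
            seen[key] = url
    return seen, duplicates


# Hypothetical sample data: the third message repeats the first byte for byte.
docs = [
    ("mail/1", b"Re: meeting moved to Friday\n"),
    ("mail/2", b"Quarterly figures attached\n"),
    ("mail/3", b"Re: meeting moved to Friday\n"),
]
seen, duplicates = index_documents(docs)
print(duplicates)  # [('mail/3', 'mail/1')]
```

Only byte-identical bodies collide this way, so for email the headers would presumably have to be stripped or normalised before hashing.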

Most importantly, we are adding the ability to search on the annotations in
item 4 to find matching files. There is also a lot of code tidying up,
support for LZW decoding in PDF, and on-the-fly additions to the annotations
in item 4 as and when people find useful stuff. We are using it, amongst
other things, for litigation support.
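
For the proximity searches in item 3, the idea can be sketched as a word index that records positions, again in Python and again only as an illustration. The function names build_positions and within and the sample sentence are assumptions, not htDig code, and a real index would be kept per document in the database rather than in memory.

```python
from collections import defaultdict


def build_positions(text):
    """Map each lowercased word to the list of word positions where it occurs."""
    positions = defaultdict(list)
    for i, word in enumerate(text.lower().split()):
        positions[word].append(i)
    return positions


def within(positions, x, y, n):
    """True if some occurrence of x lies within n words of some occurrence of y."""
    return any(
        abs(px - py) <= n
        for px in positions.get(x, [])
        for py in positions.get(y, [])
    )


pos = build_positions("the witness signed the contract before the hearing")
print(within(pos, "witness", "contract", 3))  # True
print(within(pos, "witness", "hearing", 3))   # False
```

Comparing every pair of positions is fine for a sketch; with sorted position lists a merge-style scan would avoid the quadratic cost on common words.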

Pity it is C++, but we hope to upload the above in the next 10 days ("...it
was announced today that htDig 2000 might be delayed for new enhanced
features, blah blah" :-)))))

I hope our humble contribution finds some fans and a place in the code tree.
It is so much easier to follow pioneers than to be one, and I'm eternally
grateful to the folks who kicked htDig off in the first place.

Does anyone have any ideas on this?

Greetings from Brisbane, Australia, gateway to Brisbane, California (just
out of San Francisco).




This archive was generated by hypermail 2.0b3 on Fri Sep 17 1999 - 08:11:43 PDT