Re: htdig: Preliminary proposal for data structures to support phrases

Edmond Abrahamian
Mon, 8 Jun 1998 19:03:01 +0300 (EET DST)

On Tue, 9 Jun 1998, jmoore wrote:

> The main problem with this approach as outlined, is that the index will be
> at least 3 times the size of the collected documents since the previous
> and next word is stored for each word. There are probably a lot of
> optimizations that can happen here - the first is to use 2 byte short ints

Hi Jason,

   It seems to me that a more efficient approach would be to store
the offset of each word from a common reference point, say the beginning
of its document. Storage would then be O(n), i.e. on the order of the
number of words, and adjacency can be checked by comparing offsets, so you
could look up words in any combination (an added feature for htfuzzy?).
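
   To make the idea concrete, here is a rough sketch in Python rather than
anything taken from the htdig sources. The names build_index and find_phrase
are made up for illustration, and the in-memory dictionaries only stand in
for whatever on-disk format the real index would use. Each word maps to the
documents it appears in and, per document, to its offsets counted in words
from the start of that document.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to {doc_id: [word offsets from the start of that document]}.

    `docs` is a hypothetical {doc_id: text} mapping used only for this sketch.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        # One posting per word occurrence, so storage grows as O(n).
        for offset, word in enumerate(text.lower().split()):
            index[word][doc_id].append(offset)
    return index


def find_phrase(index, phrase):
    """Return {doc_id: [start offsets]} where the phrase words are consecutive."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return {}
    hits = {}
    for doc_id, starts in index[words[0]].items():
        # A phrase matches at offset s if word i of the phrase sits at s + i.
        matches = [
            s for s in starts
            if all(s + i in index[w].get(doc_id, ())
                   for i, w in enumerate(words[1:], start=1))
        ]
        if matches:
            hits[doc_id] = matches
    return hits
```

   Since only one offset is stored per occurrence, nothing about the
previous or next word needs to be kept. The same offsets could also serve
looser queries, e.g. words within some distance of each other, by relaxing
the s + i test.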

  -- Edmond

