Re: htdig: Preliminary proposal for data structures to support phrases


Colin Viebrock (cmv@privateworld.com)
Mon, 08 Jun 1998 13:39:49 -0400


I have no experience with htdig, other than using it, but I know a bit
about databases. :)

What about something like this:

CREATE TABLE words (
        wordID int PRIMARY KEY,     -- a unique id for each word
        word varchar(<wordlength>)  -- the word itself
);

-- Note: "references" is a reserved word in many SQL dialects, so the
-- table name may need quoting or renaming.
CREATE TABLE references (
        wordID int,    -- reference to the words table
        docID int,     -- reference to the document
        location int   -- position within the document
);

There would be no primary keys in the references table, but you could
create keys (or indices) on the wordID and docID columns.
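
As a rough sketch of the schema plus those indices, here it is wrapped in
Python's sqlite3 module so it can be run as-is. Everything beyond the two
tables is my assumption: the table is named word_refs to sidestep the
reserved word, the index names are made up, and the default word length
of 32 is only a placeholder for <wordlength>.

```python
import sqlite3

def create_schema(conn, word_length=32):
    """Create the two tables and the two non-unique indices."""
    conn.executescript(f"""
        CREATE TABLE words (
            wordID INTEGER PRIMARY KEY,        -- a unique id for each word
            word   VARCHAR({int(word_length)}) -- the word itself
        );
        CREATE TABLE word_refs (               -- hypothetical name
            wordID   INTEGER,  -- reference to the words table
            docID    INTEGER,  -- reference to the document
            location INTEGER   -- position within the document
        );
        -- No primary key here, only plain indices on the lookup columns.
        CREATE INDEX word_refs_word ON word_refs (wordID);
        CREATE INDEX word_refs_doc  ON word_refs (docID);
    """)

conn = sqlite3.connect(":memory:")
create_schema(conn)
```

A real htdig backend would not necessarily be SQL at all; this only shows
the shape of the data.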

Searching algorithms seem pretty straightforward now:

- Doing a quick search on the words table will tell you if those words even
exist.
- For a phrase search, you'd check that the first word has a location of x,
that the second word has a location of x+1, etc., all with the same docID.
- For a near search, you'd check that the first word has a location of x,
that the second word has a location between x-5 and x+5 (or however close),
both with the same docID.
- For a before/after search, you'd check that the first word has a location
of x, and that the second word has a location less than x (before) or
greater than x (after), again with the same docID.
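
The searches above can be sketched as self-joins on the references table.
This is only an illustration under my own assumptions: it uses Python's
sqlite3, the table is called word_refs, positions count whole words from 0,
and the helper names (index_document, phrase_search, and so on) are
hypothetical, not anything in htdig.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words (wordID INTEGER PRIMARY KEY, word VARCHAR(32));
    CREATE TABLE word_refs (wordID INTEGER, docID INTEGER, location INTEGER);
    CREATE INDEX word_refs_word ON word_refs (wordID);
    CREATE INDEX word_refs_doc ON word_refs (docID);
""")

def index_document(doc_id, text):
    """Store one row per word occurrence, numbering positions from 0."""
    for location, word in enumerate(text.lower().split()):
        row = conn.execute(
            "SELECT wordID FROM words WHERE word = ?", (word,)).fetchone()
        if row is None:
            word_id = conn.execute(
                "INSERT INTO words (word) VALUES (?)", (word,)).lastrowid
        else:
            word_id = row[0]
        conn.execute("INSERT INTO word_refs VALUES (?, ?, ?)",
                     (word_id, doc_id, location))

def word_exists(word):
    """Quick check against the words table before doing any joins."""
    row = conn.execute("SELECT 1 FROM words WHERE word = ?", (word,)).fetchone()
    return row is not None

# r1 is an occurrence of the first word and r2 of the second, both in the
# same document; each search supplies its own test on the two locations.
PAIR_QUERY = """
    SELECT DISTINCT r1.docID
    FROM word_refs r1
    JOIN words w1 ON w1.wordID = r1.wordID
    JOIN word_refs r2 ON r2.docID = r1.docID
    JOIN words w2 ON w2.wordID = r2.wordID
    WHERE w1.word = ? AND w2.word = ? AND {condition}
    ORDER BY r1.docID
"""

def _docs(condition, first, second, *extra):
    sql = PAIR_QUERY.format(condition=condition)
    return [row[0] for row in conn.execute(sql, (first, second, *extra))]

def phrase_search(first, second):
    """Second word sits at location x+1."""
    return _docs("r2.location = r1.location + 1", first, second)

def near_search(first, second, distance=5):
    """Second word sits between x-distance and x+distance."""
    return _docs("r2.location BETWEEN r1.location - ? AND r1.location + ?",
                 first, second, distance, distance)

def followed_by_search(first, second):
    """Second word sits anywhere after x (swap the words for 'before')."""
    return _docs("r2.location > r1.location", first, second)

index_document(1, "the quick brown fox")
index_document(2, "brown and quick")
index_document(3, "quick thinking saved the very old and tired brown dog")
```

Longer phrases would need one more join per extra word, each pinned to
location x+2, x+3, and so on.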

Hope these have been useful comments.

.........................................................................
Colin Viebrock Creative Director - Private World Communications
cmv@privateworld.com 331 - 67 Mowat Avenue
http://www.privateworld.com Toronto, Ontario, CANADA, M6K 3E3
ICQ: 11386088

                                           "Duct tape is like the force.
                                   It has a light side, and a dark side,
                                    and it holds the universe together."
                                                          - Carl Zwanzig