Re: htdig: Re: htdig 4

Leslie Mikesell (les@Mcs.Net)
Thu, 28 May 1998 14:02:00 -0500 (CDT)

According to Andrew Scherpbier:
> You may or may not like it...

> Initially, it will use JDBC to talk to whatever database you want to use.
> This may change, however. There are some limitations to SQL databases that
> make them inefficient and inflexible for the tasks of a search engine; notably
> support for variable length records.

Most of them actually do support text/varchar and/or blob types. The
tricky part is making an index do what you want for raw text searches.
Have you considered extending postgresql with the index needed?
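A rough sketch of the split described above, where variable-length text is easy to store but the word index is the real work. It uses Python's bundled sqlite3 in place of postgresql so it runs self-contained. The table names, the tokenizer, and the helper functions are all hypothetical and do not describe htdig's schema:

```python
import re
import sqlite3

# Hypothetical schema: document text lives in a variable-length TEXT column,
# and a separate "words" table serves as an inverted index for text searches.
def build_index(conn):
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("CREATE TABLE words (word TEXT, doc_id INTEGER)")
    # An ordinary B-tree index on the word column makes lookups by word cheap.
    conn.execute("CREATE INDEX words_word ON words (word)")

def add_document(conn, doc_id, body):
    conn.execute("INSERT INTO docs (id, body) VALUES (?, ?)", (doc_id, body))
    # Crude tokenizer (assumption): lowercase alphanumeric runs, one row per distinct word.
    for word in set(re.findall(r"[a-z0-9]+", body.lower())):
        conn.execute("INSERT INTO words (word, doc_id) VALUES (?, ?)", (word, doc_id))

def search(conn, word):
    # Return the ids of documents containing the word, in ascending order.
    rows = conn.execute(
        "SELECT doc_id FROM words WHERE word = ? ORDER BY doc_id", (word.lower(),)
    )
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
build_index(conn)
add_document(conn, 1, "Variable length records in a search engine")
add_document(conn, 2, "A search through raw text")
print(search(conn, "search"))  # [1, 2]
print(search(conn, "raw"))     # [2]
```

A single-word lookup like this is the easy case. Anything beyond exact whole-word matches (prefixes, stemming, phrases) is where a custom index type, such as a postgresql extension, would earn its keep.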

> It will run either as a separate service that a small CGI program can talk to
> or as a set of servlets on a web server that supports servlets. This should get
> around any problems with slow Java startup speed.

Perhaps if you document the database queries, someone else will write a Perl
DBI version that might work better under mod_perl.

> With a complete redesign, lots of things that have been requested will be
> included.

Are you planning 'phrase' search capability? I've wondered if you could
do this by storing the position(s) of each word in the document along
with the word, then after you assemble the list of word matches, discard
the ones which aren't in the right sequence. Being able to add new
items in real time is about the only other thing on my wish list.
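A minimal sketch of that idea, assuming a simple in-memory index. It keeps the word positions for each document, intersects the word matches, then discards documents where the words are not consecutive. Documents can be added at any time and become searchable immediately. The class and method names are hypothetical:

```python
import re
from collections import defaultdict

def tokenize(text):
    # Crude tokenizer (assumption): lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())

class PositionalIndex:
    """Hypothetical index storing each word with its positions per document."""

    def __init__(self):
        # word -> {doc_id -> [positions of the word in that document]}
        self.postings = defaultdict(lambda: defaultdict(list))

    def add_document(self, doc_id, text):
        # Updates the index in place, so new items are searchable right away.
        for pos, word in enumerate(tokenize(text)):
            self.postings[word][doc_id].append(pos)

    def phrase_search(self, phrase):
        words = tokenize(phrase)
        if not words:
            return []
        # Step 1: assemble the word matches (documents containing every word).
        candidates = set(self.postings.get(words[0], {}))
        for w in words[1:]:
            candidates &= set(self.postings.get(w, {}))
        # Step 2: discard documents where the words are not in sequence.
        results = []
        for doc in sorted(candidates):
            starts = set(self.postings[words[0]][doc])
            for offset, w in enumerate(words[1:], start=1):
                positions = set(self.postings[w][doc])
                starts = {s for s in starts if s + offset in positions}
            if starts:
                results.append(doc)
        return results

idx = PositionalIndex()
idx.add_document(1, "the quick brown fox")
idx.add_document(2, "brown quick the fox")
print(idx.phrase_search("quick brown"))  # [1]
idx.add_document(3, "a quick brown dog")
print(idx.phrase_search("quick brown"))  # [1, 3]
```

The cost is index size, since every occurrence of a word is stored rather than one entry per document. The sequence check only runs on documents that already contain all of the words.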

  Les Mikesell

