BOUNCE htdig: Admin request


owner-htdig@sdsu.edu
Fri, 4 Dec 1998 14:13:52 -0800 (PST)


>From andrew@contigo.com Fri Dec 4 14:13:49 1998
Received: from cliff.scrc.umanitoba.ca (grdetil@cliff.scrc.umanitoba.ca [140.193.8.125])
        by sdsu.edu (8.8.7/8.8.7) with ESMTP id OAA06695
        for <htdig@sdsu.edu>; Fri, 4 Dec 1998 14:13:46 -0800 (PST)
Received: (from grdetil@localhost)
        by cliff.scrc.umanitoba.ca (8.8.5/8.8.5) id QAA14126;
        Fri, 4 Dec 1998 16:13:00 -0600
From: Gilles Detillieux <grdetil@scrc.umanitoba.ca>
Message-Id: <199812042213.QAA14126@cliff.scrc.umanitoba.ca>
Subject: Re: htdig: Phrase searches
To: robbie@tfs.net
Date: Fri, 4 Dec 1998 16:13:00 -0600 (CST)
Cc: htdig@sdsu.edu
In-Reply-To: <3667e1fe.1d1e.0@tfs.net> from "robbie@tfs.net" at Dec 4, 98 07:22:06 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

According to robbie@tfs.net:
> I'm hoping one of you can help me out. My problem is with
> multiword searches. For example, I want to search on "trash
> cans". Besides getting documents with "trash cans", I'm getting
> results that include "trash" and "cans" in the document, but not
> together. Basically, I want the ability to have a search
> exactly as entered, and not with the words scattered around the
> document. Is it possible to do an exact search?

Phrase searches are a much-requested feature, which is unfortunately
not implemented in ht://Dig (yet). There's been some discussion on this
list lately about how to do this, but the concern is that it would
require a major redesign, and much larger databases as a result.

Some have suggested a more kludgey approach, using the existing database
to do an "AND" search on the words in the phrase, and prioritizing the
results based on the proximity of the search keywords in each matched
document. The locations of words are stored in the database, so this
may be feasible - provided the locations of all occurrences of a word,
and not just the first occurrence, is recorded. Another approach would
be to search through the document excerpt for the phrase, after each
matching document is found by keyword search.
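
To make the two ideas above a bit more concrete, here is a rough sketch
in Python. It is not ht://Dig code and does not use its database format:
the index layout (word -> document -> list of word positions) and the
names build_index, min_span, proximity_search and excerpt_has_phrase are
all made up for illustration. It assumes every occurrence of a word is
recorded, ignores word order and punctuation, and uses a brute-force
span calculation that would only be reasonable for short queries.

```python
from itertools import product


def build_index(docs):
    """Hypothetical positional index: word -> {doc_id: [positions]}.

    This layout is an assumption for illustration only.
    """
    index = {}
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index.setdefault(word, {}).setdefault(doc_id, []).append(pos)
    return index


def min_span(position_lists):
    """Smallest distance (in words) covering one occurrence of each word.

    Brute force over all combinations of positions; order is ignored.
    """
    return min(max(combo) - min(combo) for combo in product(*position_lists))


def proximity_search(index, query):
    """AND search on the query words, ranked by how close together they are."""
    words = query.lower().split()
    if not words:
        return []
    postings = [index.get(w, {}) for w in words]
    # "AND" search: keep only documents that contain every query word.
    matches = set(postings[0]).intersection(*postings[1:])
    # Smaller span means the words sit closer together, so it sorts first.
    scored = [(min_span([p[d] for p in postings]), d) for d in matches]
    return [d for _, d in sorted(scored)]


def excerpt_has_phrase(excerpt, phrase):
    """Second approach: look for the literal phrase in a document excerpt.

    Whitespace is normalized and matching is on whole words.
    """
    normalized = " ".join(excerpt.lower().split())
    wanted = " ".join(phrase.lower().split())
    return f" {wanted} " in f" {normalized} "


docs = {
    "a": "put the trash cans outside",
    "b": "trash goes in the bin and cans get recycled",
    "c": "no match here",
}
index = build_index(docs)
ranked = proximity_search(index, "trash cans")
```

With these three sample documents, ranked comes out as ['a', 'b']: both
contain the two words, but "a" has them adjacent, and "c" is dropped by
the AND step. The excerpt check is stricter, since it accepts "a" and
rejects "b" outright, but it only works if the phrase happens to fall
inside whatever excerpt is stored.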

In any case, the feature isn't trivial to implement, and so far, I don't
think there have been any takers. I guess whoever needs this feature
most, and knows C++, has the job! :-)

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:29:48 PST