Re: [htdig] Improving quality of AND search results


plucas@frost.com
Mon, 17 May 1999 15:36:53 -0700


I may be missing something really obvious here but surely documents B and C
should not show up at all in the results of an "AND" search if they do not
contain any occurrences of one of the search terms.
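
To be concrete about what I would expect, here is a rough Python sketch of
strict AND matching. It is purely illustrative: the function name and the
dictionaries are my own assumptions, not htdig code, and the counts are taken
from the example quoted below.

```python
# Illustrative only: strict AND semantics, not htdig's actual matching code.
def matches_all(term_counts, query_terms):
    """Return True only if every query term occurs at least once."""
    return all(term_counts.get(term, 0) > 0 for term in query_terms)

query = ["composite", "material", "skeleton"]
doc_a = {"composite": 1, "material": 2, "skeleton": 1}
doc_b = {"composite": 15, "material": 2}  # "skeleton" never occurs

print(matches_all(doc_a, query))  # True
print(matches_all(doc_b, query))  # False
```

Under that rule, only Document A would be returned at all.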

Paul Lucas
Frost & Sullivan

"Garret W. Gengler" <garretg@otable.com> on 05/17/99 02:21:40 PM

To: htdig@htdig.org
cc: (bcc: Paul Lucas/Electronic Delivery -
      FSCA/Mountain_View_CA/US/Frost & Sullivan)

Subject: [htdig] Improving quality of AND search results

I'm looking for a way to improve the quality of htdig search results as
follows...

If I do an "AND" search on several keywords, htdig appears to rank
documents by the total number of occurrences of any of the keywords. I'd
like htdig to give the highest score to a document that contains every
keyword, even if it contains just one of each, and only after that fall
back to the keyword count method.

Here's an example... an AND search with the keywords "composite material
skeleton"...

Document A:
"composite" occurs 1 time
"material" occurs 2 times
"skeleton" occurs 1 time

Document B:
"composite" occurs 15 times
"material" occurs 2 times
"skeleton" never occurs

Document C:
"composite" occurs 4 times
"material" occurs 5 times
"skeleton" never occurs

In this example, I'd like Document A to get the highest score, then B, then
C... Currently, htdig rates B highest, then C, then A.
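
In rough Python, the ordering I have in mind would look something like the
sketch below. This is only an illustration of the idea: the names (rank_key,
rank, DOCS) and the two-part sort key are assumptions of mine, not htdig
internals or configuration options.

```python
# Illustrative sketch of the desired ordering; not htdig code.
QUERY = ["composite", "material", "skeleton"]

# Keyword counts from the example above.
DOCS = {
    "A": {"composite": 1, "material": 2, "skeleton": 1},
    "B": {"composite": 15, "material": 2, "skeleton": 0},
    "C": {"composite": 4, "material": 5, "skeleton": 0},
}

def rank_key(counts, query):
    """Sort key: distinct keywords matched first, then total occurrences."""
    matched = sum(1 for term in query if counts.get(term, 0) > 0)
    total = sum(counts.get(term, 0) for term in query)
    return (matched, total)

def rank(docs, query):
    """Return document names ordered from best to worst match."""
    return sorted(docs, key=lambda name: rank_key(docs[name], query), reverse=True)

print(rank(DOCS, QUERY))  # ['A', 'B', 'C']
```

Sorting on the total count alone would give B, C, A, which is the order I
am seeing now.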

Are there any configuration directives that might help me to adjust this
behavior?

-Garret Gengler
 RoundTable Media
 garretg@otable.com

------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig@htdig.org containing the single word "unsubscribe" in
the SUBJECT of the message.




This archive was generated by hypermail 2.0b3 on Mon May 17 1999 - 16:01:02 PDT