[htdig] RE: [3.2.0b2] AND operator not working as it should?


Subject: [htdig] RE: [3.2.0b2] AND operator not working as it should?
From: Arthur Prokosch (prokosch@aptima.com)
Date: Wed Aug 09 2000 - 09:31:16 PDT


Hi, all. I spent a while with other folks on htdig-dev trying to figure out
what was going on with the AND operator (and phrase matching).

If you haven't been following the thread, then briefly: let's say you have
one word, call it 'apple', that appears in at least one of the indexed
documents, and another word, call it 'banana', that doesn't appear in any
of them. With beta 2 and the snapshots released since, searching for
'apple AND banana' returns the results for 'apple' (when it should return
0 results). Additionally, searching for the phrase '"banana apple"' also
returns the results for 'apple' (although a search for '"apple banana"'
correctly returns 0 results).
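
To make the symptom concrete, here is a minimal toy sketch of the two behaviours. It is not htdig code: the index type and the function names (lookup, and_strict, and_ignoring_empty) are made up for illustration, and the "ignore an empty operand" rule is only an assumed model of what the buggy AND ends up doing.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>

// Hypothetical toy index: word -> IDs of the documents containing it.
using DocSet = std::set<int>;
using Index = std::map<std::string, DocSet>;

// Look up a word; a word that was never indexed yields an empty set.
DocSet lookup(const Index &index, const std::string &word)
{
    auto it = index.find(word);
    return it == index.end() ? DocSet{} : it->second;
}

// Expected AND: plain set intersection, so an empty operand
// always produces an empty result.
DocSet and_strict(const DocSet &a, const DocSet &b)
{
    DocSet out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.begin()));
    return out;
}

// Assumed model of the bug: an empty operand is treated as "ignored"
// and dropped from the expression instead of emptying the result.
DocSet and_ignoring_empty(const DocSet &a, const DocSet &b)
{
    if (a.empty()) return b;
    if (b.empty()) return a;
    return and_strict(a, b);
}
```

With 'apple' indexed and 'banana' not, and_strict gives an empty set, while and_ignoring_empty hands back the 'apple' documents, which is the behaviour described above.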

Someone on htdig-dev found part of the code that was causing this problem,
and I put together a quick patch that seems to correct these problems and
not cause new ones (but no guarantees). If, like me, you're trying
desperately to make a near-production-quality application using phrase
searching with a beta version of htdig - or if you're just playing with the
latest beta - the following may be useful. The patch also seems to correct
an endless loop caused by a query string with an unterminated quote.
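
For the unterminated-quote case, the general idea can be sketched as below. This is a simplified stand-in, not the real Parser::phrase: the Lexer and PhraseResult types are hypothetical, and only the token names WORD and DONE are borrowed from the patch. The point is that a loop handling only "closing quote" and "word" never exits once the lexer keeps reporting end of input, so it needs an explicit branch for that case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical token kinds; QUOTE stands for the closing '"'.
enum Token { WORD, QUOTE, DONE };

// Toy lexer: hands out its tokens in order, then DONE forever.
struct Lexer {
    std::vector<Token> tokens;
    std::size_t pos = 0;
    Token next() { return pos < tokens.size() ? tokens[pos++] : DONE; }
};

struct PhraseResult {
    int words = 0;       // words seen inside the phrase
    bool error = false;  // set when the closing quote is missing
};

// Consume the tokens that follow an opening quote.
PhraseResult parse_phrase(Lexer &lex)
{
    PhraseResult r;
    Token lookahead = lex.next();
    while (true) {
        if (lookahead == QUOTE) {
            break;  // properly terminated phrase
        } else if (lookahead == WORD) {
            ++r.words;
            lookahead = lex.next();
        } else if (lookahead == DONE) {
            // Without this branch the loop would spin forever,
            // because lookahead stays DONE and nothing advances.
            r.error = true;
            break;
        }
    }
    return r;
}
```

A token stream of WORD, WORD, QUOTE ends normally with two words; a stream that stops after WORD ends with error set instead of hanging.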

Also, this email address will soon disappear; if you have any questions
about this, please mail me at <app@pobox.com>. Thanks!

arthur.

----

diff -up htsearch/parser.cc-3.2.0b2 htsearch/parser.cc
--- htsearch/parser.cc-3.2.0b2 Tue Apr 11 18:53:21 2000
+++ htsearch/parser.cc Mon Aug 7 14:18:54 2000
@@ -171,6 +171,7 @@ Parser::phrase(int output)
 {
     List *wordList = new List;
     double weight = 1.0;
+    int skipRest = 0;
 
     while (1)
     {
@@ -183,11 +184,21 @@ Parser::phrase(int output)
         else if (lookahead == WORD)
         {
             weight *= current->weight;
-            if (output)
-                perform_phrase(*wordList);
-
+            if (output && !skipRest)
+            {
+                perform_phrase(*wordList);
+                if (wordList->Count() == 0)
+                    // just the start of the phrase has no results => skipRest
+                    skipRest = 1;
+            }
+
             lookahead = lexan();
         }
+        else if (lookahead == DONE)
+        {
+            setError("'\"'");
+            break;
+        }
 
     } // end while
     delete wordList;
@@ -352,8 +363,8 @@ Parser::score(List *wordList, double wei
 
     if (!wordList || wordList->Count() == 0)
     {
-        // We can't score an empty list, so this should be ignored...
-        list->isIgnore = 1;
+        // We can't score an empty list, so stop here
+        // (setting isIgnore as well would cause errors with AND)
         return;
     }



