Re: [htdig] no result searching for string with double colon


Subject: Re: [htdig] no result searching for string with double colon
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Fri Nov 03 2000 - 06:38:05 PST


According to Daniel Naber:
> we're using htdig 3.1.5 on http://bugs.kde.org. Someone noted that the
> search for "QButton::toggled" didn't give any results, but "QButton
> toggled" (using AND) matches many pages (which then include the string
> "QButton::toggled"). The message is "No matches were found for
> 'qbutton::toggled'".
>
> So on the one hand, the indexer doesn't seem to index colons (that's
> okay), but on the other hand the colons are not removed by htsearch. IMHO
> both normalisations should be the same to avoid confusion.
>
> BTW: "QButtontoggled" also yields no results. I did not modify the default
> values of extra_word_characters or valid_punctuation. I might be able to
> work around the problem by changing extra_word_characters, but it might be
> a bug in htdig nevertheless.

Actually, htsearch does break up words in essentially the same way as
htdig, with one exception - the colon. It's not clearly documented,
even in the code, but htsearch allows the modifiers "exact:" and
"hidden:" to be prepended to search words, and if there's a colon but
not one of these modifiers, the colon is unfortunately assumed to be
part of the word. I'm not certain exactly how or why these modifiers
are used, but I'd hazard a guess that you can probably safely modify the
setupWords() function in htsearch/htsearch.cc (around lines 399-404 in an
unpatched 3.1.5 installation) not to use the colon as part of the word.
Perhaps Geoff or someone else more familiar with the whole WeightWord
class and how it's used can comment.
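
To illustrate the kind of behaviour I mean, here is a rough, standalone
sketch. It is not the setupWords() code from htsearch.cc; the names
ParsedToken and parseToken are made up for this example, and it assumes
that the modifiers are matched case-sensitively, that they may be
stacked, and that only alphanumeric characters count as word characters
(it ignores extra_word_characters and valid_punctuation entirely). The
only point is that a colon which doesn't end a recognised modifier is
treated as a separator rather than as part of the word.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical result type: the modifiers seen plus the words extracted.
struct ParsedToken {
    bool exact = false;
    bool hidden = false;
    std::vector<std::string> words;
};

// Hypothetical helper, not part of ht://Dig: parse one search token.
ParsedToken parseToken(const std::string& token) {
    ParsedToken out;
    std::string rest = token;

    // Strip any leading "exact:" / "hidden:" modifiers (assumed stackable).
    for (;;) {
        if (rest.compare(0, 6, "exact:") == 0) {
            out.exact = true;
            rest.erase(0, 6);
        } else if (rest.compare(0, 7, "hidden:") == 0) {
            out.hidden = true;
            rest.erase(0, 7);
        } else {
            break;
        }
    }

    // Everything that is not alphanumeric, including ':', ends a word.
    std::string word;
    for (unsigned char c : rest) {
        if (std::isalnum(c)) {
            word += static_cast<char>(std::tolower(c));
        } else if (!word.empty()) {
            out.words.push_back(word);
            word.clear();
        }
    }
    if (!word.empty()) {
        out.words.push_back(word);
    }
    return out;
}
```

With this, parseToken("QButton::toggled") gives the two words "qbutton"
and "toggled", which is how a search for "QButton toggled" already
behaves, while parseToken("exact:foo") still sets the exact flag and
yields the single word "foo".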

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



