htdig: Htsearch does not exclude bad_words

John Lines
Tue, 12 May 1998 18:31:59 +0100

Our documents which are indexed with htdig include some about the year 2000,
and one of my users did a search for 'year 2000' and was surprised not to
get anything back.

I suspect that htdig excludes pure numbers from the words it collects, and
so when he asked for 'year AND 2000' it didn't find anything - but that came
as a bit of a surprise to the user. Htsearch can also be prevented from
finding words which really do exist by including a word from the bad_words
list in the search, for example 'free will' (assuming a database of

As a suggestion for a future enhancement it would be good if htsearch could
identify that it was being asked to search for a noise word and either
silently discard it, or better, tell the user 'ignored search for 2000'.
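
A minimal sketch of the suggested behaviour, in Python rather than htsearch's
own code. It assumes that pure numbers and bad_words entries are never
indexed; the BAD_WORDS set and the split_query function are hypothetical
names used only for illustration.

```python
# Hypothetical stand-in for the contents of the bad_words file.
BAD_WORDS = {"will", "the", "of"}


def split_query(query, bad_words=BAD_WORDS):
    """Split a query into (kept, ignored) lists of search terms."""
    kept, ignored = [], []
    for term in query.lower().split():
        # Assumption: pure numbers and bad_words entries are not indexed,
        # so searching for them can never match anything.
        if term.isdigit() or term in bad_words:
            ignored.append(term)
        else:
            kept.append(term)
    return kept, ignored


kept, ignored = split_query("year 2000")
for term in ignored:
    # Tell the user instead of silently returning no matches.
    print(f"ignored search for {term}")
```

The search would then run on the kept terms only, so 'year 2000' would still
return the documents containing 'year'.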

        John Lines

