Re: htdig: Htsearch does not exclude bad_words


J.E.J. op den Brouw (jesse@crytonII.st.hhs.nl)
Wed, 13 May 1998 10:00:25 +0200


Did you set the configuration option allow_numbers?
If you enable this option, numbers will be indexed and can be searched for.
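
For illustration, the relevant line in the configuration file would look
roughly like this. This is a minimal fragment, assuming the usual
"attribute: value" syntax of htdig.conf; the rest of the file is left as it
is, and the database most likely has to be rebuilt before numbers already
seen in documents become searchable.

```
# htdig.conf (fragment)
# Treat strings of digits as words so that they are indexed.
allow_numbers: true
```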

John Lines wrote:
>
> Our documents which are indexed with htdig include some about the year 2000,
> and one of my users did a search for 'year 2000' and was surprised not to
> get anything back.
>
> I suspect that htdig excludes pure numbers from the words it collects, and
> so when he asked for 'year AND 2000' it didn't find anything - but that came
> as a bit of a surprise to the user. Htsearch can also be prevented from
> finding words which do really exist by including a word from the bad_words
> list in the search, for example 'free will' (assuming a database of
> philosophical texts).
>
> As a suggestion for a future enhancement it would be good if htsearch could
> identify that it was being asked to search for a noise word and either
> silently discard it, or better, tell the user 'ignored search for 2000'.
>
> John Lines
>
> ----------------------------------------------------------------------
> To unsubscribe from the htdig mailing list, send a message to
> htdig-request@sdsu.edu containing the single word "unsubscribe" in
> the body of the message.
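
The enhancement John suggests could be sketched roughly as below. This is
not htsearch code, only an illustration of the idea in Python: the function
name split_noise_words, its parameters, and the sample word list are all
hypothetical. It separates the query words that would never match (words on
the bad_words list, and pure numbers when numbers are not indexed) from the
rest, so the search can run on the remaining words and the user can be told
what was ignored.

```python
def split_noise_words(query_words, bad_words, allow_numbers=False):
    """Split query words into (kept, ignored).

    Hypothetical helper, not part of ht://Dig. A word is ignored when it
    is on the bad_words list, or when it consists only of digits and
    numbers are not being indexed.
    """
    kept, ignored = [], []
    for word in query_words:
        lowered = word.lower()
        is_pure_number = lowered.isdigit()
        if lowered in bad_words or (is_pure_number and not allow_numbers):
            ignored.append(word)
        else:
            kept.append(word)
    return kept, ignored


# Sample noise-word list, made up for this example.
bad_words = {"will", "the", "and"}

kept, ignored = split_noise_words(["year", "2000"], bad_words)
for word in ignored:
    print(f"ignored search for {word}")
```

With numbers excluded from the index, the search would then be run on 'year'
alone instead of returning nothing.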

-- 
--jesse
---------------------------------------------------------------------
J. op den Brouw                            Johanna Westerdijkplein 75
Haagse Hogeschool                                   2521 EN  DEN HAAG
Technology Division                                       Netherlands
Department of Electrical Engineering                   +31 70 4458936
--------------------- jesse@crytonII.st.hhs.nl ----------------------



This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:26:16 PST