Re: [htdig] Not indexing a word

Date: Wed Feb 23 2000 - 09:38:43 PST

At 8:17 AM -0600 2/23/2000, Geoff Hutchison wrote:
>At 12:31 PM +0000 2/23/00, Malcolm Austen wrote:
>>+ I'm afraid you can't say "index this word," though that's not a bad
>>+ idea (a "good words" list?)
>>OK, let's ask for the sky ... how about (at some far distant point) a
>>"good phrases" list please?
>>The context of this request is that I don't want to index all instances of
>>"it" but I would like to index "IT" in the context of "IT Committee" 8-)
>Yes, this would be nice, wouldn't it. Adding a "good words" list
>isn't so bad--you check it quickly before tossing the word. The
>difficulty of your request is that it would change the way documents
>are parsed--right now they're split up into words, so you'd have to
>say "wait, we just saw 'committee,' did we have 'IT' just then?" You
>could still do it, but it would be a bit more complex.
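
To make the quoted idea concrete, here is a rough sketch in Python of a
"good words" check plus the one-word lookback a "good phrases" list would
need. This is not ht://Dig code: the function name, the three lists, and
the whitespace tokenizing are all assumptions made up for illustration.

```python
# Hypothetical sketch, not ht://Dig's parser. All names and lists are invented.
STOP_WORDS = {"it", "to", "be", "or", "not", "the", "of"}
GOOD_WORDS = {"IT"}                     # case-sensitive exceptions to the stop list
GOOD_PHRASES = {("IT", "Committee")}    # case-sensitive two-word phrases


def index_terms(text):
    """Return the list of terms that would be indexed for `text`."""
    terms = []
    previous = None  # the raw word seen just before the current one
    for word in text.split():
        word = word.strip('.,;:!?"()')
        if not word:
            continue
        # "wait, we just saw 'Committee', did we have 'IT' just then?"
        if previous is not None and (previous, word) in GOOD_PHRASES:
            terms.append(previous + " " + word)
        if word in GOOD_WORDS:
            # Quick check of the good-words list before tossing the word.
            terms.append(word)
        elif word.lower() not in STOP_WORDS:
            terms.append(word.lower())
        previous = word
    return terms
```

With these lists, index_terms("It is the IT Committee report") keeps "IT"
and "IT Committee" but still tosses the leading "It". The phrase check only
needs to remember one previous word here; longer phrases would need a
longer lookback buffer, which is the extra complexity mentioned above.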

Most of the large search engines I've seen no longer ignore short
words and stopwords -- they just index everything. I realize it
requires a lot more disk space (though there may be some clever ways
around that), but it simplifies things both internally and for the
end-users. That way, they can search for "To Be Or Not To Be" and
find something!

My rule for search engines is "no surprises", and I think there are
enough legitimate cases of people needing to search for two-letter and
even one-letter words that ht://Dig should allow that as an option.


