htdig: Bad words


Marjolein Katsma (webmaster@javawoman.com)
Tue, 29 Dec 1998 11:41:35 +0100


I had a look at the bad_words file that comes with htdig. It's very small, so
many very common words still get indexed.

I've created a much larger list - partly based on the standard "stop words"
from SWISH-E but edited and extended. This takes into account how htdig
treats apostrophes by default.

I'm using this basic list to create site-specific lists with extra words
that occur on practically every page in a site (such as my name ;-)).
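
For anyone wanting to do the same, here is a rough sketch of how a basic
list and a site-specific list could be merged. It is not the script I use,
just an illustration: it assumes one word per line in each file, and the
function names and file names (basic_words.txt, site_words.txt) are made up
for the example. As far as I know, htdig is then pointed at the result
through the bad_word_list attribute in its configuration file.

```python
def merge_word_lists(base_words, site_words):
    """Return a sorted, de-duplicated list of lowercased words.

    Blank entries and surrounding whitespace are dropped. Both arguments
    are plain iterables of strings (hypothetical inputs for this sketch).
    """
    merged = set()
    for word in list(base_words) + list(site_words):
        word = word.strip().lower()
        if word:
            merged.add(word)
    return sorted(merged)


def read_word_list(path):
    """Read a word list file, assuming one word per line."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().splitlines()


def write_word_list(path, words):
    """Write words to a file, one per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for word in words:
            handle.write(word + "\n")
```

Typical use would be something like
write_word_list("bad_words", merge_word_lists(read_word_list("basic_words.txt"),
read_word_list("site_words.txt"))), with the file names adjusted to your setup.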

If anyone is interested in the basic list, which now contains 348 "words",
I can zip it up and put it on the web somewhere. Please don't email me
privately; just reply on the list and I'll post the URL there.

Marjolein Katsma webmaster@javawoman.com
Java Woman - http://javawoman.com/


