Subject: Re: [htdig] spell check - python wrapper script
From: Gilles Detillieux (
Date: Mon Oct 30 2000 - 14:32:14 PST

According to Geoff Hutchison:
> At 12:44 PM -0400 10/27/00, wrote:
> >In case anyone might find this useful, I have attached a python wrapper
> >script that uses ispell to suggest alternatives to search words that may be
> >typos.
> This has been one of my thoughts for an excellent
> language-independent Fuzzy class for ht://Dig. Of course as the
> "dictionary," you'd actually use the wordlist itself. This would have
> the dual advantages that you'd have any words not in normal
> dictionaries and the algorithm could also offer up words misspelled
> in the pages themselves. (Perish the thouhgt! [sic])
> Of course this idea also got lost in the shuffle. Anyone interested
> in working on this sort of thing (as you have in a sense) would be
> doing us all a great favor.
> Thanks for the script!

Yes, neat script! Adapting it for Unix is pretty simple. In addition to
the paths, which are pretty obvious, you should use os.popen() instead
of win32pipe.popen(). One slight bug: for some words, ispell can
suggest two words separated by a space, and the script doesn't change
that space to a "+" in the query string. That's a simple addition.
I also needed to define my own replace() function, as my Python 1.4
didn't include replace() in its string library.
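For anyone making the same changes with a current Python, here is a minimal sketch. It is not the attached script, just an illustration under a few assumptions: subprocess stands in for os.popen(), the built-in str methods make a home-made replace() unnecessary, and urllib.parse.quote_plus() turns the space in a two-word suggestion into a "+". The function names and the htsearch "words=" parameter are my own assumptions. The parser assumes ispell's -a pipe format, where near misses come back as "& word count offset: miss, miss, ..." and guesses as "? ..." lines.

```python
import subprocess
from urllib.parse import quote_plus


def parse_ispell_line(line):
    """Return the suggestions from one line of `ispell -a` output.

    '&' (near miss) and '?' (guess) lines look like
    '& original count offset: sugg, sugg, ...'. Every other line
    (the '@(#)' banner, '*', '+', '-', '#', blank) yields nothing.
    """
    if not line or line[0] not in "&?":
        return []
    _, _, rest = line.partition(": ")
    # A single suggestion may itself contain a space (e.g. "the rest").
    return [s.strip() for s in rest.split(",") if s.strip()]


def suggestion_query(suggestion):
    """Build a query-string fragment; quote_plus() maps spaces to '+'."""
    # Hypothetical parameter name; match whatever your search form uses.
    return "words=" + quote_plus(suggestion)


def ispell_suggestions(word):
    """Ask ispell for alternatives to one word (assumes ispell on PATH)."""
    # A leading '^' tells ispell -a to spell-check the rest of the line,
    # so a word starting with '*', '+', '-', etc. isn't taken as a command.
    result = subprocess.run(["ispell", "-a"], input="^" + word + "\n",
                            capture_output=True, text=True, check=True)
    suggestions = []
    for line in result.stdout.splitlines():
        suggestions.extend(parse_ispell_line(line))
    return suggestions
```

With ispell installed, something like [suggestion_query(s) for s in ispell_suggestions("therest")] would give ready-made link fragments; quote_plus() also escapes characters such as apostrophes, which a plain space-to-"+" replace would leave alone.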

You can implement Geoff's suggestion of building the ispell dictionary
for this script from the wordlist by adding these lines to rundig:

sed -n 's/^\([a-z][a-z]*\) .*/\1/p' db.wordlist | munchlist > wrdlst.0
buildhash -s wrdlst.0 $COMMONDIR/english.aff wrdlst.hash
rm wrdlst.0*

(N.B.: That's a tab character before the .* in sed's regular expression.)
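If you'd rather keep this in Python alongside the wrapper, the same three rundig lines can be sketched as below. This assumes db.wordlist lines start with a lowercase word followed by a tab (exactly what the sed expression matches) and that ispell's munchlist and buildhash are on the PATH. wordlist_words() and build_hash() are hypothetical names, and the temporary directory stands in for the rm wrdlst.0* cleanup.

```python
import os
import re
import subprocess
import tempfile

# Same match as sed's '^\([a-z][a-z]*\)<TAB>.*': a lowercase word at the
# start of the line, immediately followed by a tab; only the word is kept.
WORD_LINE = re.compile(r"([a-z]+)\t")


def wordlist_words(lines):
    """Yield the leading lowercase word from each matching wordlist line."""
    for line in lines:
        match = WORD_LINE.match(line)
        if match:
            yield match.group(1)


def build_hash(words, affix_file, hash_file):
    """Run munchlist, then buildhash -s, like the rundig lines above.

    Assumes ispell's munchlist and buildhash are installed and on PATH.
    """
    # munchlist reads the word list on stdin, as in the shell pipeline.
    munched = subprocess.run(["munchlist"], input="\n".join(words) + "\n",
                             capture_output=True, text=True, check=True)
    with tempfile.TemporaryDirectory() as tmp:
        raw = os.path.join(tmp, "wrdlst.0")
        with open(raw, "w") as f:
            f.write(munched.stdout)
        subprocess.run(["buildhash", "-s", raw, affix_file, hash_file],
                       check=True)
    # Leaving the with-block deletes wrdlst.0 and any wrdlst.0* side files.
```

A call would look like build_hash(wordlist_words(f), affix, "wrdlst.hash") with f an open db.wordlist and affix the path to english.aff in your $COMMONDIR. Since build_hash() only needs an iterable of words, the word source could be swapped out without touching the ispell side.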

It would be really great if the "speling" fuzzy match algorithm were
set up to use ispell the way the script does to get the alternate words.
Of course in 3.2, you don't have a db.wordlist file, so you'd need an
"htfuzzy speling" command to traverse the word database and feed the
words to munchlist, then to buildhash.

