[htdig] No more weird endings problem


Subject: [htdig] No more weird endings problem
From: Alexey Rodriguez (alexey@dicyt.umss.edu.bo)
Date: Fri Jun 09 2000 - 03:11:28 PDT


        Good morning everyone. I finally found some time to look into
my problem, and discovered that htfuzzy has a small bug when parsing
*.aff files. If you have the following rule:

        Z > -Z, CES # audaz audaces
               ^
               |
               htfuzzy stops parsing the line here, at the space
               after the comma

so it cuts the word ending but never adds the part after the comma (CES).
This caused a lot of duplicates among the generated words.
        I worked around the problem with a lazy script that removes the
spaces after the comma. With that change, even the "DB2 problem..."
messages stopped appearing.
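
A minimal sketch of such a cleanup script, not the one actually used. It
assumes that everything from the first '#' on a line is a comment (as in the
example rule) and should be left alone; the function names are hypothetical.

```python
import re


def strip_space_after_comma(line: str) -> str:
    """Remove whitespace directly after a comma in the rule part of a line.

    Assumption: text from the first '#' onward is a comment, so commas
    inside it are not touched.
    """
    rule, sep, comment = line.partition("#")
    rule = re.sub(r",[ \t]+", ",", rule)
    return rule + sep + comment


def clean_aff(text: str) -> str:
    """Apply the fix to every line of an affix file's contents."""
    return "\n".join(strip_space_after_comma(line) for line in text.split("\n"))


if __name__ == "__main__":
    print(strip_space_after_comma("Z > -Z, CES # audaz audaces"))
```

Running the cleaned text back through htfuzzy would then see "-Z,CES" as a
single token.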
        Maybe this issue has already been addressed. IMHO it is not a
good idea to (only) strip the spaces from the aff file; it would be
better to fix the parsing code in EndingsDB.cc so that people with similar
aff files won't have that problem. I can make the patch if you consider
it necessary (Gilles? Geoff?).
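
To illustrate the idea of a parser that tolerates the space, here is a rough
sketch in Python. It is not the code in EndingsDB.cc and not a proposed
patch; parse_suffix_rule is a hypothetical name, and the sketch only covers
rules shaped like the example above (condition, '>', optional "-strip,",
text to append, optional '#' comment).

```python
def parse_suffix_rule(line):
    """Split a rule into (condition, strip, append), ignoring whitespace.

    Returns None when the line contains no '>' (blank or comment-only line).
    """
    body = line.split("#", 1)[0]  # drop the trailing comment, if any
    if ">" not in body:
        return None
    condition, replacement = body.split(">", 1)
    # Remove all whitespace so "-Z, CES" and "-Z,CES" parse identically.
    condition = "".join(condition.split())
    replacement = "".join(replacement.split())
    strip = ""
    if replacement.startswith("-"):
        strip, _, replacement = replacement[1:].partition(",")
    return condition, strip, replacement


if __name__ == "__main__":
    print(parse_suffix_rule("Z > -Z, CES # audaz audaces"))
```

The point is only that whitespace is discarded before the replacement is
split at the comma, so the text to append is never lost.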
        Another issue that I encountered is that mungeWord doesn't handle
accented words ('abaco -> ábaco). Is this normal, or must I fix the aff
file (or the source, for instance)?
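
For reference, the conversion I have in mind looks roughly like the sketch
below. It assumes the notation is an apostrophe followed by a vowel, as in
the 'abaco example, and that the result should be the precomposed accented
letter. It says nothing about how mungeWord works internally; apply_acute
is a hypothetical name.

```python
import re
import unicodedata


def apply_acute(word: str) -> str:
    """Replace each apostrophe + vowel pair with the acute-accented vowel."""
    def accent(match):
        # Vowel + combining acute accent (U+0301), composed into one character.
        return unicodedata.normalize("NFC", match.group(1) + "\u0301")

    return re.sub(r"'([aeiouAEIOU])", accent, word)


if __name__ == "__main__":
    print(apply_acute("'abaco"))
```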
        Thanks for reading.
                                                Alexey




This archive was generated by hypermail 2b28 : Fri Jun 09 2000 - 04:51:29 PDT