Re: SV: [htdig] Foreign chars (Swedish)


Subject: Re: SV: [htdig] Foreign chars (Swedish)
From: Philippe Ramkvist-Henry (phira600@student.liu.se)
Date: Tue Nov 30 1999 - 05:36:07 PST


On Mon, 29 Nov 1999, Gilles Detillieux wrote:

> Just a hunch, but you wouldn't happen to have a ä in valid_punctuation,
> would you? In any case, could you run htsearch -vvv twice, searching
> first for ANLÄNDE, and then for anlände? How do the initial debugging
> messages differ? What's happening to the ä - is it getting stripped
> out or changed to another character? Is the upper case Ä getting changed
> to a ä, or to another character? Are you using the exact same config
> file for htdig, htmerge and htsearch?

I use the default for "valid_punctuation". I even tried adding the ä
explicitly with "extra_word_characters: ä".

Here's the debugging info for the second (237th! :) try.
 
su10-2 <74> htsearch -vvv
Enter value for words: anlände
tempWords: 'anlände:0 '
Boolean: 'anlände:0 '
initial: ''
Add: anlände
searchWords: 'anlände:0 '
LogicalWords: anlände
Pattern:
Enter value for format:

su10-2 <75> htsearch -vvv
Enter value for words: ANLÄNDE
tempWords: 'anlände:0 '
Boolean: 'anlände:0 '
initial: ''
Fuzzy on: anlände
   (null) anlände
   (null) word=anlände prefix_suffix=* prefix_suffix_length=1
minimum_prefix_length=1

   endings anlända anländandet anländandets anländande anländ- anländer
anlänt anländs anländes anlänts anländes
   synonyms
searchWords: '(:0 anlände:0 |:0 anlända:0 |:0 anländandet:0 |:0
anländandets:0 |:0 anländande:0 |:0 anländ-:0 |:0 anländer:0 |:0 anlänt:0
|:0 anländs:0 |:0 anländes:0 |:0 anlänts:0 |:0 anländes:0 ):0 '
LogicalWords: (anlände or anlända or anländandet or anländandets or
anländande or anländ- or anländer or anlänt or anländs or anländes or
anlänts or anländes)
Pattern: anlände
Enter value for format:

Looks OK to me... what do you say?

> Not that I know of, but you could put a originalWords.uppercase(); right
> after the originalWords.chop(" \t\r\n"); in htsearch/htsearch.cc. If the
> htsearch -vvv above doesn't get to the root of the problem, it might be
> interesting to see if this hack has any effect.

I'll try this too, if the above looks OK.

I got a mail from another Swedish subscriber to this list, and according to
him everything worked well using sv_SE (which I don't have) and indexing
using an English dictionary (which shouldn't change anything).

I'll try to get hold of that locale and try it...
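
A minimal sketch of why the locale might matter here. It assumes the
words are ISO-8859-1 bytes and that lower-casing goes through the C
library's tolower(); that is an assumption about ht://Dig's internals,
not something confirmed in this thread. Both function names are
hypothetical and are not part of ht://Dig.

```cpp
#include <cctype>
#include <string>

// Lower-case with the C library's tolower(). In the default "C" locale
// only A-Z are mapped, so bytes above 0x7F (such as 0xC4, the Latin-1
// code for Ä) pass through unchanged.
std::string fold_c_locale(std::string s) {
    for (char &c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}

// Hypothetical helper: lower-case assuming the bytes are ISO-8859-1.
// Upper-case letters 0xC0-0xDE (except 0xD7, the multiplication sign)
// sit exactly 0x20 below their lower-case forms.
std::string fold_latin1(std::string s) {
    for (char &c : s) {
        unsigned char u = static_cast<unsigned char>(c);
        if (u >= 'A' && u <= 'Z') {
            u = static_cast<unsigned char>(u + 0x20);
        } else if (u >= 0xC0 && u <= 0xDE && u != 0xD7) {
            u = static_cast<unsigned char>(u + 0x20);
        }
        c = static_cast<char>(u);
    }
    return s;
}
```

With no locale set, fold_c_locale("ANL\xC4NDE") leaves the 0xC4 byte
alone, while fold_latin1 turns it into 0xE4 (ä). If indexing and
searching fold case differently, an indexed word and a query could
end up disagreeing on exactly that byte.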

/Philippe
