RE: [htdig] 2 questions: &nbsp; and bad_words

Subject: RE: [htdig] 2 questions: &nbsp; and bad_words
From: NEPOTE Charles (Neuilly Gestion)
Date: Mon May 15 2000 - 05:06:49 PDT

According to Gilles Detillieux:

> According to "NEPOTE Charles (Neuilly Gestion)":
> > I have the same problem using a french locale (fr_FR), on a Linux
> > Mandrake 7.0 box.
> > As a newbie I won't hack the code... I am interested in Gilles's
> > solution. Is it possible to simply remap ascii char 160 to ascii
> > char 20? What are the files to modify? How?
> >
> > Would it be a problem to change the next ht://Dig version so that
> > the parser converts &nbsp to a space?
> > Would that be long and/or difficult?
> My solution was to set the locale, but apparently that didn't do the
> trick on your system. I'm really not sure why. Geoff's solution
> was to patch the source. It's a trivial fix: just change the 160 on
> htdig/ line 34 to a 32 (20 is the hexadecimal value of
> a space, not decimal), and recompile, reinstall htdig, and reindex.

(I'll tell you a secret: I installed from an RPM file ;-)
OK. Maybe I will try: it will be my first time changing source code (I'm
a bit afraid)...
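
To check that I understand the numbers involved, here is a small sketch
of the remapping in Python. It is only an illustration of the idea and
not the ht://Dig source; the function name remap_nbsp and the sample
text are made up for the example. It assumes the page is in ISO-8859-1,
where the non-breaking space is the single byte 160.

```python
# Illustration only: this is NOT ht://Dig code, just the arithmetic behind the fix.
NBSP = 160   # ISO-8859-1 code of the non-breaking space (hex A0)
SPACE = 32   # decimal code of an ordinary space (hex 20)

# "20" is the hexadecimal spelling of the space character, not the decimal one.
assert 0x20 == SPACE
assert 0xA0 == NBSP


def remap_nbsp(data: bytes) -> bytes:
    """Return data with every byte 160 replaced by byte 32 (hypothetical helper)."""
    return data.replace(bytes([NBSP]), bytes([SPACE]))


# Made-up sample text containing two non-breaking spaces.
sample = "prix\xa0: 100\xa0F".encode("latin-1")
print(remap_nbsp(sample).decode("latin-1"))
```

So the patch amounts to treating character 160 as if it were character
32 before the words are extracted.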

> The change is a bit different in version 3.2, as the SGML decoding has
> changed, but it should be simple there too. I don't think we want to
> make this a permanent change in the distributed source, though, because
> it may have some undesirable consequences for some users. Of course,
> it's open for discussion.

Let's open the discussion.
Questions:
-- What sort of undesirable consequences could we have?
-- Is there a case where the &nbsp has a lexicographic sense?
-- Could remapping &nbsp be made optional, for example through a new
attribute in htdig.conf (yes, I know, another one...)?
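
To make the last two questions concrete, here is a rough sketch in
Python of what an optional remapping could look like. Everything in it
is an assumption for the sake of discussion: the attribute name
remap_nbsp does not exist in ht://Dig, the "name: value" parsing is
simplified, and the word splitting is far cruder than the real parser.

```python
NBSP = "\xa0"  # the character that &nbsp; decodes to in ISO-8859-1


def parse_config(text):
    """Parse simplified 'name: value' lines, skipping blanks and '#' comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(":")
        options[name.strip()] = value.strip()
    return options


def extract_words(text, options):
    """Split text on ordinary spaces, remapping NBSP first only if enabled."""
    # 'remap_nbsp' is a hypothetical attribute name, not a real ht://Dig option.
    if options.get("remap_nbsp", "false").lower() in ("true", "yes", "1"):
        text = text.replace(NBSP, " ")
    # Split on the plain space only, so an untouched NBSP stays inside a word.
    return [word for word in text.split(" ") if word]


page = "Neuilly\xa0Gestion 10\xa0000"
print(extract_words(page, parse_config("remap_nbsp: false")))
print(extract_words(page, parse_config("remap_nbsp: true")))
```

With the option off, "10&nbsp;000" stays one word; with it on, it
becomes "10" and "000". Whether that kind of split matters to anyone is
exactly what I am asking.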

Many thanks,
Charles Népote
Paris, France.

> --
> Gilles R. Detillieux
> Spinal Cord Research Centre
> Dept. Physiology, U. of Manitoba      Phone: (204)789-3766
> Winnipeg, MB R3E 3J7 (Canada)         Fax: (204)789-3930
