Re: [htdig] 2 questions: &nbsp; and bad_words


Subject: Re: [htdig] 2 questions: &nbsp; and bad_words
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Mon May 15 2000 - 13:08:53 PDT


According to "NEPOTE Charles (Neuilly Gestion)":
> According to Gilles Detillieux:
>
> > According to "NEPOTE Charles (Neuilly Gestion)":
> > > I have the same problem using a french locale (fr_FR), on a Linux
> > > Mandrake 7.0 box.
> > > As a newbie I won't hack the code... I am interested in Gilles's
> > > solution. Is it possible to simply remap ASCII char 160 to ASCII char
> > > 20? Which files need to be modified, and how?
> > >
> > > Would it be a problem to change the next ht://Dig version so that the
> > > parser converts &nbsp; to a space?
> > > Would it be long and/or difficult?
> >
> > My solution was to set the locale, but apparently that didn't do the
> > trick on your system. I'm really not sure why. Geoff's solution
> > was to patch the source. It's a trivial fix: just change the 160 on
> > htdig/SGMLEntities.cc line 34 to a 32 (20 is the hexadecimal value of
> > a space, not decimal), and recompile, reinstall htdig, and reindex.
>
>
> (I'll tell you a secret: I installed via an RPM file ;-)

That may be your problem right there! If you installed htdig from
htdig-3.1.5-0.i386.rpm, it was built on a Red Hat 4.2 system with libc5,
which doesn't properly support locales. Please provide more details
about your system (distribution name and version, cpu type) and which
RPM you installed. Your other message seemed to indicate that locale
support was working, so I'm puzzled by the apparent discrepancy.
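
One way to narrow down whether locale support is the issue is to ask the
C library directly. The sketch below is not part of ht://Dig;
locale_available and is_space_in_current_locale are hypothetical helper
names. It relies only on the standard behaviour that setlocale() returns
a null pointer when it cannot install the requested locale.

```cpp
#include <cctype>
#include <clocale>

// Returns true if the C library accepts the named locale for character
// classification. setlocale() returns a null pointer when the request
// cannot be honoured, in which case the previous locale stays in effect.
bool locale_available(const char* name) {
    return std::setlocale(LC_CTYPE, name) != nullptr;
}

// Reports whether the currently installed locale classifies the given
// byte value as whitespace. The argument is an unsigned char so that
// values above 127 (such as 160) are passed to isspace() safely.
bool is_space_in_current_locale(unsigned char c) {
    return std::isspace(c) != 0;
}
```

Calling locale_available("fr_FR") on the indexing machine and then
is_space_in_current_locale(160) would show how that C library treats the
non-breaking space. The result depends on the installed libc and locale
data, so none is assumed here.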

> OK. Maybe I will try: it will be my first time changing source code (I'm
> a bit afraid)...

You may find it easier to install the src.rpm, and use the rpm command
to build the source. That way, it's easier to replace one package
with another. Of course, you will have to develop a patch file for the
change to htdig/SGMLEntities.cc, and add it to the spec file.
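
For anyone unsure what the patch amounts to, here is a minimal sketch of
the idea. It is not the actual contents of htdig/SGMLEntities.cc; the
Entity struct, the kEntities table and lookup_entity are hypothetical
names used only for illustration. The point is that the nbsp entry maps
to decimal 32 (hexadecimal 20, an ordinary space) instead of decimal 160.

```cpp
#include <string>

// Hypothetical entity table entry. The real table in
// htdig/SGMLEntities.cc may be laid out differently.
struct Entity {
    const char* name;
    unsigned char code;
};

static const Entity kEntities[] = {
    {"nbsp", 32},     // patched value: was 160 (0xA0); 32 is 0x20, a space
    {"eacute", 233},  // ISO-8859-1 e-acute, left unchanged
    {"amp", 38},      // ampersand, left unchanged
};

// Returns the character code for a named entity, or 0 if the name is
// not in the table.
unsigned char lookup_entity(const std::string& name) {
    for (const Entity& e : kEntities) {
        if (name == e.name) return e.code;
    }
    return 0;
}
```

With the table above, lookup_entity("nbsp") yields an ordinary space, so
the parser would see a word separator where it previously saw
character 160.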

> > The change is a bit different in version 3.2, as the SGML decoding has
> > changed, but it should be simple there too. I don't think we want to
> > make this a permanent change in the distributed source, though, because
> > it may have some undesirable consequences for some users. Of course,
> > it's open for discussion.
>
>
> Let's open the discussion.
> Questions:
> -- What sort of undesirable consequences could there be?

I don't know, but offhand the only thing I can think of is some users
might prefer non-breaking spaces to remain non-breaking in the excerpts
displayed in search results. It's probably not that big a deal, but we
have been burned before when a seemingly innocuous change caused a lot
of people to complain.

> -- Is there a case where &nbsp; has a lexicographic sense?
> -- Is it possible to make the remapping of &nbsp; optional (for example,
> with a new attribute in htdig.conf (yes, I know, another one...))?

It's certainly possible. The real question is whether this is desirable.
The package is already suffering somewhat from feature bloat - the whole
range of configuration attributes is very confusing to new users - so the
decision to add another option must take that "cost" into consideration.
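
To make the trade-off concrete, here is a rough sketch of what such an
option could look like. Nothing here exists in ht://Dig: ParserOptions,
nbsp_as_space and apply_nbsp_option are hypothetical names, and the text
is assumed to be ISO-8859-1, where the non-breaking space is byte 0xA0.
The default leaves the existing behaviour alone, so only users who ask
for the remapping would get it.

```cpp
#include <string>

// Hypothetical option set; ht://Dig has no attribute by this name.
struct ParserOptions {
    bool nbsp_as_space = false;  // default keeps the existing behaviour
};

// When the option is on, replaces each ISO-8859-1 non-breaking space
// (byte 0xA0) with an ordinary space. When it is off, the text is
// returned unchanged.
std::string apply_nbsp_option(std::string text, const ParserOptions& opts) {
    if (!opts.nbsp_as_space) return text;
    for (char& c : text) {
        if (static_cast<unsigned char>(c) == 0xA0) c = ' ';
    }
    return text;
}
```

Even in this small form, the option adds one more setting to document
and explain, which is the "cost" mentioned above.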

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



