RE: [htdig] 2 questions: &nbsp; and bad_words


Subject: RE: [htdig] 2 questions: &nbsp; and bad_words
From: NEPOTE Charles (Neuilly Gestion) (charles.nepote@cetelem.fr)
Date: Thu May 11 2000 - 06:32:59 PDT


I have the same problem using a French locale (fr_FR) on a Linux Mandrake
7.0 box.
As a newbie I won't hack the code, but I am interested in Gilles's solution.
Is it possible to simply remap character 160 to character 32 (hex 20, the
ordinary space)? Which files would need to be modified, and how?
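
For illustration only, the remapping asked about here (byte 160 to an ordinary space) could be sketched as below. This is not ht://Dig code: the function name nbsp_to_space is made up, and the sketch assumes single-byte ISO-8859-1 text. It would be wrong for UTF-8, where byte 160 can appear inside a multi-byte character.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of ht://Dig.
 * Replaces every byte 160 (the ISO-8859-1 non-breaking space) in buf
 * with byte 32 (the ordinary space) and returns how many were replaced.
 * Assumes buf holds single-byte ISO-8859-1 text.
 */
size_t nbsp_to_space(unsigned char *buf, size_t len)
{
    size_t i;
    size_t replaced = 0;

    for (i = 0; i < len; i++) {
        if (buf[i] == 160) {
            buf[i] = 32;
            replaced++;
        }
    }
    return replaced;
}
```

Such a pass would have to run on the text before it is split into words, so the database would need to be rebuilt afterwards.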

Would there be any problem with changing the next ht://Dig version so that
the parser converts &nbsp; to a space?
Would that be a long or difficult change?

Charles Népote
Paris, France.

> -----Original Message-----
> From: Gilles Detillieux [mailto:grdetil@scrc.umanitoba.ca]
> Sent: Wednesday, March 1, 2000 16:30
> To: ghutchis@wso.williams.edu
> Cc: peterk@wfw.wtb.tue.nl; htdig@htdig.org
> Subject: Re: [htdig] 2 questions: &nbsp; and bad_words
>
>
> According to Geoff Hutchison:
> > At 12:07 PM +0100 3/1/00, Peter Kruijt wrote:
> > >initials are on one line and the last name on the next, I put
> > >&nbsp;'s in between. ht://Dig does not interpret these as spaces.
> > >Is there a way to make ht://Dig interpret hard spaces as ordinary
> > >spaces?
> >
> > Yes, it won't treat these as spaces because they're not really
> > spaces. But, of course, they are white-space! To make the change
> > you want, edit htdig/SGMLEntities.cc and change the nbsp line to
> > have a space on the right-hand side. The downside is that since
> > this involves how the indexer creates the database, you'll need to
> > reindex after making the change.
>
> Hacking the code may not be necessary. Right now, &nbsp; is mapped
> to 160 - the ISO-8859-1 non-breaking space character. A properly
> defined locale should treat this as space, so if you set your locale
> to something that uses that character set, rather than just US-ASCII,
> and locales work properly on your system, then the HTML parser should
> treat &nbsp; as space. On the other hand, in the default C locale,
> 160 would be a non-space control character, which I believe the parser
> would treat as punctuation.
>
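
Whether a given locale really classifies character 160 as whitespace can be checked directly with the standard C functions setlocale() and isspace(). The following is a minimal sketch, not taken from ht://Dig; the function name nbsp_is_space_in is made up for illustration, and which locale names are installed (for example "fr_FR") depends on the system.

```c
#include <ctype.h>
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper, not part of ht://Dig.
 * Returns 1 if character 160 is classified as whitespace in the named
 * locale, 0 if it is not, and -1 if the locale could not be selected
 * (for example because it is not installed).
 * The previous LC_CTYPE setting is restored before returning.
 */
int nbsp_is_space_in(const char *locale_name)
{
    char saved[256];
    const char *current = setlocale(LC_CTYPE, NULL);
    int result;

    if (current == NULL || strlen(current) >= sizeof saved)
        return -1;
    strcpy(saved, current);   /* later setlocale() calls may overwrite it */

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    result = isspace(160) ? 1 : 0;
    setlocale(LC_CTYPE, saved);
    return result;
}
```

Calling nbsp_is_space_in("C") should give 0, since the C locale only counts the standard whitespace characters. The answer for nbsp_is_space_in("fr_FR") depends on how that locale is defined on the machine, so a 0 there would mean the locale approach does not help on that system.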

[...]

> --
> Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
> Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
> Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
> Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


