Re: [htdig] Problem with german umlauts


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Wed, 2 Jun 1999 08:40:17 -0500 (CDT)


According to Norbert Hartl:
> Yesterday I discovered a strange problem. I am indexing German pages
> with htdig. After setting the locale to de_DE.ISO-8859-1 in
> htdig.conf and using a German endings db, everything works fine.
> In the search form I can use all of the German umlauts, and htdig
> finds the matching documents.
> This works for the search form but not for the $(PAGELIST). When I
> type an umlaut into the form, it is converted to %E4 (for ä) so that
> it can be passed in the URL.
> In the PAGELIST there are URLs containing an unconverted umlaut. This
> causes misbehaviour in Netscape on the Mac: with the unconverted umlaut
> in the URL, this browser gets no search results, and the umlaut shows
> up scrambled in the following search form.
> Netscape on Linux and Windows handles these URLs correctly (at least
> the versions I have for testing).
> Is this being misinterpreted by the Macintosh? Any ideas?
> Is there a workaround for converting these characters in the URLs in
> the PAGELIST?

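We uncovered a bug back on May 20 in the encodeURL() function. This
function should encode all non-ASCII characters, but right now it doesn't:
when a locale such as de_DE.ISO-8859-1 is set, isalpha() treats the
umlauts as letters, so they are copied into the URL unencoded.
Here's the fix: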

--- htlib/URLTrans.cc.orig Tue Feb 16 23:03:56 1999
+++ htlib/URLTrans.cc Wed Jun 2 08:29:05 1999
@@ -75,7 +75,7 @@ void encodeURL(String &str, char *valid)
 
     for (p = str; p && *p; p++)
     {
-        if (isdigit(*p) || isalpha(*p) || strchr(valid, *p))
+        if (isascii(*p) && (isdigit(*p) || isalpha(*p) || strchr(valid, *p)))
             temp << *p;
         else
         {

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



This archive was generated by hypermail 2.0b3 on Wed Jun 02 1999 - 06:04:25 PDT