Re: [htdig] SORT and locale

Subject: Re: [htdig] SORT and locale
From: Gilles Detillieux
Date: Wed May 17 2000 - 14:46:22 PDT

According to "NEPOTE Charles (Neuilly Gestion)":
> I understand that ht://Dig is based on the Gnu sort command.

Not necessarily just GNU's sort. Any sort program that treats every character
as distinct and orders lines by the characters' binary encoding should do the
job. This is the default behaviour of most UNIX sort commands.
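To illustrate the difference, here is a minimal Python sketch. It assumes ISO-8859-1 (Latin-1) encoded text and uses hypothetical "word<TAB>ID" records. The real format htmerge sorts may differ. The `fold_accents` helper is a crude, hypothetical stand-in for locale-aware accent folding (it strips diacritics via Unicode decomposition). It is not how textutils actually collates. It only shows why a folding sort can split up records that a binary-order sort keeps together.

```python
import unicodedata

# Hypothetical "word<TAB>ID" records in ISO-8859-1; the real record
# format used by htmerge may differ.
records = [r.encode("latin-1") for r in ["cafe\t3", "caf\u00e9\t2", "cafe\t1"]]

def fold_accents(raw: bytes) -> str:
    """Crude accent folding: decode, decompose (NFD), drop combining marks.
    A rough stand-in for locale collation, not textutils' actual algorithm."""
    text = unicodedata.normalize("NFD", raw.decode("latin-1"))
    return "".join(c for c in text if not unicodedata.combining(c))

# Binary order compares raw byte values, so 0xE9 ("é") sorts after every
# ASCII letter and both "cafe" records stay next to each other.
binary = sorted(records)

# Folded order treats "café" like "cafe" until the ID is reached, so the
# "café" record lands between the two "cafe" records (ties broken on bytes).
folded = sorted(records, key=lambda r: (fold_accents(r), r))

print([r.decode("latin-1") for r in binary])
print([r.decode("latin-1") for r in folded])
```

Any program that expects all lines for one word to be contiguous would behave correctly with the first ordering and could fail with the second.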

> Is it possible to change the "sort" call of htmerge to get the result as
> above ?

It would be possible to modify htmerge/ to add the -k option
to the sort command, but that may break things if the ID order does
in fact matter. It would also break things on systems whose sort program
doesn't support the -k option. Also, as this isn't an issue in 3.2,
I think we should stick to the simplest and most effective workaround
for the time being. If setting the LC_ALL environment variable (for
example, to C) does the trick, that's probably the best solution.
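The LC_ALL workaround relies on the fact that the C (POSIX) locale collates in plain byte order, and that child processes inherit their parent's environment. The following is a minimal Python sketch of that idea. The `sort_bytewise` helper is hypothetical and not part of ht://Dig. It assumes a `sort` binary on PATH and runs it with LC_ALL=C so no accent folding takes place.

```python
import os
import subprocess

def sort_bytewise(data: bytes) -> bytes:
    """Run the system `sort` with LC_ALL=C so it compares raw bytes.

    Hypothetical helper; assumes a `sort` program is available on PATH.
    LC_ALL takes precedence over LANG and LC_COLLATE.
    """
    env = dict(os.environ, LC_ALL="C")  # copy the environment, force C locale
    result = subprocess.run(["sort"], input=data, stdout=subprocess.PIPE,
                            env=env, check=True)
    return result.stdout

if __name__ == "__main__":
    # Hypothetical Latin-1 "word<TAB>ID" lines, as in the earlier example.
    sample = "cafe\t3\ncaf\u00e9\t2\ncafe\t1\n".encode("latin-1")
    print(sort_bytewise(sample).decode("latin-1"), end="")
```

For ht://Dig itself, the equivalent would be to set LC_ALL in the environment that htmerge runs in, so the sort it spawns inherits the setting.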

> A question remain. As I understand, not only me, but all users of ht://Dig
> who index accented characters should have this problem...
> Can someone confirm ?

What version of textutils does Mandrake 7 use? So far, I haven't seen
this behaviour in any sort program other than the one in textutils-2.0,
but it would be interesting to hear from others. Is this new accent-folding
behaviour mandated by POSIX as the sort program's default, or was it a
design decision made solely by the textutils maintainers?
If it's the former, I'd expect this problem to affect more and more
ht://Dig 3.1.x users. If it's the latter, it will be interesting to
see whether they revisit that decision in the next release. I imagine this
could break a lot of programs and scripts that assume sort
works as it always has by default.

Gilles R. Detillieux              E-mail: <>
Spinal Cord Research Centre       WWW:
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

