Discussion: Unicode locale for build environments
Benno Fünfstück
2017-06-25 15:57:38 UTC
Hello list,

right now, the stdenv does not appear to set any locale. I think this means
that the locale defaults to C, whose character encoding is ASCII. Python,
for example, then uses ASCII as its preferred encoding, so any script that
opens a file containing non-ASCII characters without naming an encoding
will fail:

$ nix-shell --pure -p python36 --command 'python -c "import locale; print(locale.getpreferredencoding())"'
ANSI_X3.4-1968
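Python's preferred encoding comes from the codeset of the active locale, which is what the command above prints. The sketch below checks the same thing from Python by launching a child interpreter with LC_ALL set explicitly. It assumes a Unix system with Python 3.7 or later: `-X utf8=0` and PYTHONCOERCECLOCALE=0 switch off the newer interpreters' UTF-8 Mode and C-locale coercion, so the child behaves like the Python 3.6 used here. The helper name `preferred_encoding_under` is made up for illustration.

```python
import os
import subprocess
import sys


def preferred_encoding_under(lc_all: str) -> str:
    """Return the preferred encoding a child Python reports with LC_ALL=lc_all.

    Hypothetical helper. UTF-8 Mode and C-locale coercion (Python 3.7+) are
    disabled so the result reflects the locale's own codeset.
    """
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)           # do not let the caller force UTF-8 Mode
    env["PYTHONCOERCECLOCALE"] = "0"      # do not coerce C to a UTF-8 locale
    env["LC_ALL"] = lc_all                # LC_ALL overrides LANG and all LC_* vars
    out = subprocess.run(
        [sys.executable, "-X", "utf8=0", "-c",
         "import locale; print(locale.getpreferredencoding(False))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


if __name__ == "__main__":
    # On glibc this typically reports ANSI_X3.4-1968, i.e. ASCII.
    print("C       ->", preferred_encoding_under("C"))
    # Reports UTF-8 only if a C.UTF-8 locale is installed; otherwise the
    # child falls back to the C locale.
    print("C.UTF-8 ->", preferred_encoding_under("C.UTF-8"))
```

The second call is also a quick way to see whether a given UTF-8 locale is actually available in an environment.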

Just recently, I hit a build that failed because of this:

Traceback (most recent call last):
  File "nix_run_setup.py", line 8, in <module>
    exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\\r\\n', '\\n'), __file__, 'exec'))
  File "setup.py", line 20, in <module>
    long_description=open('README.rst').read(),
  File "/nix/store/i5ixvcy4i6jqzlzy9aajdhf3wliixvh1-python3-3.6.1/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 542: ordinal not in range(128)
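The failing line in setup.py calls open() without an encoding, so Python falls back to the locale's preferred encoding, ASCII here. A minimal sketch that reproduces the error deterministically, passing encoding="ascii" explicitly to stand in for the C locale, and that shows the per-call workaround of naming the encoding. The file name and its contents are made up for illustration.

```python
import os
import tempfile

# Illustrative README text with one non-ASCII character. "ü" encodes to
# b"\xc3\xbc" in UTF-8; 0xc3 is the byte named in the traceback above.
text = "Author: Benno Fünfstück\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "README.rst")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Simulate open() under a locale whose preferred encoding is ASCII.
    try:
        with open(path, encoding="ascii") as f:
            f.read()
    except UnicodeDecodeError as exc:
        failed_byte = exc.object[exc.start]  # the first undecodable byte
    else:
        failed_byte = None

    # Naming the encoding makes the read independent of the locale.
    with open(path, encoding="utf-8") as f:
        long_description = f.read()

print(hex(failed_byte))            # 0xc3
print(long_description == text)    # True
```

Patching every setup.py this way works, but it is a per-package fix. A UTF-8 locale in the build environment avoids the failure without touching upstream code.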

As UTF-8 is almost universally used nowadays (I have yet to see a source
archive that does not use it), I propose that we make the stdenv use a
UTF-8 locale by default. Would this be feasible? (Whether to use C.UTF-8
or another UTF-8 locale such as en_US.UTF-8 is still to be decided.)

Regards,
Benno
Freddy Rietdijk
2017-06-25 16:04:38 UTC
There was an earlier discussion about glibcLocales and C.UTF-8 on the issue tracker:
https://github.com/NixOS/nixpkgs/issues/20192

For Python 3.x, I think we could add a minimal glibcLocales that provides
en_US.UTF-8, and set LC_ALL in `buildPythonPackage`. This would apply only
at build time, not at run time.
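What setting LC_ALL for a build step amounts to can be sketched in Python as a hypothetical wrapper (`run_build_step` is not part of nixpkgs; `buildPythonPackage` itself is a Nix function). Because LC_ALL takes precedence over LANG and every LC_* variable, one variable is enough to pin the locale. On glibc the named locale must actually be installed, which is why a glibcLocales providing en_US.UTF-8 is needed alongside it; otherwise programs fall back to the C locale.

```python
import os
import subprocess
import sys
from typing import Mapping, Optional, Sequence


def run_build_step(cmd: Sequence[str],
                   locale_name: str = "en_US.UTF-8",
                   base_env: Optional[Mapping[str, str]] = None,
                   ) -> subprocess.CompletedProcess:
    """Run cmd with LC_ALL forced to a UTF-8 locale (hypothetical helper).

    The override only affects the child process, i.e. the build step,
    not the environment the built program later runs in.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LC_ALL"] = locale_name  # overrides LANG and all LC_* variables
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # The child just echoes the locale it was given.
    result = run_build_step(
        [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"])
    print(result.stdout.strip())  # en_US.UTF-8
```

Scoping the variable to the build step's environment matches the build-time-only intent: nothing in the resulting package depends on it.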

_______________________________________________
nix-dev mailing list
https://mailman.science.uu.nl/mailman/listinfo/nix-dev