utils: decode requirement files according to their BOM if present #3485

xavfernandez · 2016-02-12T22:43:47Z

Work In Progrees, should fix #2865

pfmoore · 2016-02-12T23:14:27Z

LGTM. Remarkably close to what I have in my local checkout at the moment. But cleaner :-)

xavfernandez · 2016-02-13T14:31:39Z

Sorry to see we worked on the same thing :-/
But for some reason Python 2.6 does not like my solution...
Plus my encoding with utf8 and decoding with locale.getpreferredencoding(False) does not seem like a good idea.

xavfernandez · 2016-02-13T19:01:59Z

https://docs.python.org/2/library/shlex.html

Prior to Python 2.7.3, this module did not support Unicode input. 😢

pfmoore · 2016-02-13T21:26:11Z

No problem.

I pray for the day when we can drop 2.6 support :-(
How about, if we're on 2.6 then we encode the requirements file content as ASCII (on the basis that if there's no Unicode support, ASCII is all that is valid) and report any encoding errors as "Unicode requirements files are only supported in Python 2.7 onwards"?

xavfernandez · 2016-02-14T00:28:44Z

Well I tried being more forgiving and encoded to utf8 before shlex.split: tests seem happy.

I'm trying to add some code cleaning (always make get_file_content return text), hoping #1441 won't come back...

Ivoz · 2016-02-14T02:38:26Z

tests/unit/test_utils.py

+        assert auto_decode(data) == "Django==1.4.2"
+
+    def test_auto_decode_no_bom(self):
+        data = b"foobar"


decoding text produces text, not bytes

dstufft · 2016-03-03T20:25:54Z

Is this still a WIP? Or ready to go?

xavfernandez · 2016-03-03T22:42:45Z

Ok, from a simple mypkg example containing only:

from setuptools import setup

setup(
    name='mypkg',
    version=0.1,
    install_requires=['httpretty==0.7.1'],
)

I could reproduce the issue of #1441 with pip 1.5.0.
With this PR, mypkg could be installed without any error so no obvious regression on this issue.

locale.getpreferredencoding(False) is not always utf8

and hope pypa#1441 is behind us

utils: decode requirement files according to their BOM if present

aodag · 2016-03-07T03:15:14Z

Hi, this change causes error during reading requirements.txt encoded utf-8 without BOM.
On python2, locale.getpreferredencoding(False) returns ASCII encoding.
Why not use utf-8 as default encoding?

aodag@debian:~$ uname -a
Linux debian 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u3 (2016-01-17) x86_64 GNU/Linux
aodag@debian:~$ python
Python 2.7.9 (default, Mar  1 2015, 12:57:24)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding(False)
'ANSI_X3.4-1968'
>>>
aodag@debian:~$ python3
Python 3.4.2 (default, Oct  8 2014, 10:45:20)
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding(False)
'UTF-8'

xavfernandez · 2016-03-07T09:32:15Z

Why not use utf-8 as default encoding?
Because, it might well be encoded in something else than utf8 :) Previously it worked because in python2 we were operating on str (bytes) types.

It looks like pip should be calling locale.setlocale(locale.LC_ALL, '') before calling locale.getpreferredencoding(False).

Or maybe we could accept this kind of header: # -*- coding: utf-8 -*- in requirements.txt ?

Any thought @pfmoore @dstufft ?

xavfernandez · 2016-03-07T10:10:45Z

I've implemented belt and suspenders in #3547

pfmoore · 2016-03-07T10:19:13Z

For the simple cases, I think pip should treat the requirements file as a text file (i.e. open it in the system default encoding), with BOM detection being a useful convenience for Windows users whose tools have a tendency to use things like UTF-16 with BOM in spite of the default encoding. I'm -0.5 on defaulting to UTF-8, as Windows tools need extra effort to specify UTF-8, so we'll likely just end up with Windows users complaining that pip isn't handling their requirements files properly.

I don't know what incantations are needed with setlocale I'll defer to your expertise there.

For cross-platform requirements files, an encoding header seems like a plausible approach.

(And I see that while I've been writing this response, you've created a PR implementing this. Looks like a good solution to me, go for it (although there's a text/bytes problem I've commented on in the PR).

xavfernandez force-pushed the bom_detection branch from 3b1dde2 to 667593c Compare February 14, 2016 00:30

Ivoz reviewed Feb 14, 2016
View reviewed changes

xavfernandez force-pushed the bom_detection branch 2 times, most recently from 6648558 to 6a87e0c Compare February 18, 2016 10:09

xavfernandez added this to the 8.1 milestone Feb 24, 2016

xavfernandez force-pushed the bom_detection branch from 6a87e0c to dafe020 Compare February 26, 2016 09:02

xavfernandez force-pushed the bom_detection branch from dafe020 to 0b01167 Compare March 3, 2016 22:09

xavfernandez added 5 commits March 4, 2016 10:50

utils: decode requirement files according to their BOM if present

e288926

Fix shlex call for old python versions (<2.7.3)

a4d1ec2

Simplify test

b2ee4f8

locale.getpreferredencoding(False) is not always utf8

Let's try always returning text

9aa5e8f

and hope pypa#1441 is behind us

Add changelog

6cc6f7b

xavfernandez force-pushed the bom_detection branch from 0b01167 to 6cc6f7b Compare March 4, 2016 09:51

xavfernandez added a commit that referenced this pull request Mar 4, 2016

Merge pull request #3485 from xavfernandez/bom_detection

cab10e4

utils: decode requirement files according to their BOM if present

xavfernandez merged commit cab10e4 into pypa:develop Mar 4, 2016

xavfernandez deleted the bom_detection branch March 4, 2016 10:42

This was referenced Mar 4, 2016

Install from requirements file (into a virtualenv) fails on Windows #3479

Closed

Unicode encoding in requirements file #733

Closed

cjerdonek mentioned this pull request Mar 1, 2019

LookupError: unknown encoding: utf16-le #6054

Closed

lock bot added the auto-locked Outdated issues that have been locked by automation label Jun 3, 2019

lock bot locked as resolved and limited conversation to collaborators Jun 3, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

utils: decode requirement files according to their BOM if present #3485

utils: decode requirement files according to their BOM if present #3485

xavfernandez commented Feb 12, 2016

pfmoore commented Feb 12, 2016

xavfernandez commented Feb 13, 2016

xavfernandez commented Feb 13, 2016

pfmoore commented Feb 13, 2016

xavfernandez commented Feb 14, 2016

Ivoz Feb 14, 2016

dstufft commented Mar 3, 2016

xavfernandez commented Mar 3, 2016

aodag commented Mar 7, 2016

xavfernandez commented Mar 7, 2016

xavfernandez commented Mar 7, 2016

pfmoore commented Mar 7, 2016

utils: decode requirement files according to their BOM if present #3485

utils: decode requirement files according to their BOM if present #3485

Conversation

xavfernandez commented Feb 12, 2016

pfmoore commented Feb 12, 2016

xavfernandez commented Feb 13, 2016

xavfernandez commented Feb 13, 2016

pfmoore commented Feb 13, 2016

xavfernandez commented Feb 14, 2016

Ivoz Feb 14, 2016

Choose a reason for hiding this comment

dstufft commented Mar 3, 2016

xavfernandez commented Mar 3, 2016

aodag commented Mar 7, 2016

xavfernandez commented Mar 7, 2016

xavfernandez commented Mar 7, 2016

pfmoore commented Mar 7, 2016