Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected --target behavior with/without --upgrade #8799

Open
di opened this issue Aug 24, 2020 · 14 comments
Open

Unexpected --target behavior with/without --upgrade #8799

di opened this issue Aug 24, 2020 · 14 comments
Labels
C: target pip install's --target option's behaviour handling state: awaiting PR Feature discussed, PR is needed type: bug A confirmed bug or unintended behavior

Comments

@di
Copy link
Member

di commented Aug 24, 2020

Environment

  • pip version:
$ docker run -it python pip --version
pip 20.2.2 from /usr/local/lib/python3.8/site-packages/pip (python 3.8)
  • Python version:
$ docker run -it python python --version
Python 3.8.5
  • OS:
$ docker run -it python uname -a
Linux 098fbbd81e20 4.19.76-linuxkit #1 SMP Tue May 26 11:42:35 UTC 2020 x86_64 GNU/Linux

Description
In a fresh Docker container, I'm seeing a difference in the files installed between pip install X --target /tmp and pip install X --target /tmp --upgrade. My understanding of --target is that the --upgrade flag will only produce different behavior if there's already an existing installation present.

I can't determine if this due to a badly formed distribution of the project in question, a bug in pip, wheel, or a combination of them all.

(You can ignore --no-deps below, I just included it to keep the output smaller).

Behavior without --upgrade

$ docker run -it python bash
root@cdd4f75b6a82:/# ls /tmp

root@b899188462d1:/# pip install -t /tmp --no-deps teradatasql
Collecting teradatasql
  Downloading teradatasql-17.0.0.4-py3-none-any.whl (16.4 MB)
     |████████████████████████████████| 16.4 MB 5.2 MB/s
Installing collected packages: teradatasql
Successfully installed teradatasql-17.0.0.4
WARNING: Target directory /tmp/teradatasql already exists. Specify --upgrade to force replacement.

root@b899188462d1:/# tree /tmp
/tmp
├── teradatasql
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-38.pyc
│   │   └── vernumber.cpython-38.pyc
│   ├── teradatasql.dll
│   ├── teradatasql.dylib
│   ├── teradatasql.so
│   └── vernumber.py
└── teradatasql-17.0.0.4.dist-info
    ├── INSTALLER
    ├── METADATA
    ├── RECORD
    ├── REQUESTED
    ├── WHEEL
    └── top_level.txt

3 directories, 13 files

Behavior with --upgrade

$ docker run -it python bash
root@cdd4f75b6a82:/# ls /tmp

root@6c6a7d9f84a7:/# pip install -t /tmp --no-deps --upgrade teradatasql
Collecting teradatasql
  Downloading teradatasql-17.0.0.4-py3-none-any.whl (16.4 MB)
     |████████████████████████████████| 16.4 MB 7.3 kB/s
Installing collected packages: teradatasql
Successfully installed teradatasql-17.0.0.4

root@6c6a7d9f84a7:/# tree /tmp
/tmp
├── teradatasql
│   ├── LICENSE
│   ├── README.md
│   ├── THIRDPARTYLICENSE
│   └── samples
│       ├── BatchInsPerf.py
│       ├── BatchInsert.py
│       ├── CharPadding.py
│       ├── CommitRollback.py
│       ├── DriverDatabaseVersion.py
│       ├── ElicitFile.py
│       ├── FakeResultSetCon.py
│       ├── FakeResultSetEsc.py
│       ├── FastLoadBatch.py
│       ├── HelpSession.py
│       ├── IgnoreErrors.py
│       ├── InsertXML.py
│       ├── LoadCSVFile.py
│       ├── MetadataFromPrepare.py
│       ├── ParamDataTypes.py
│       ├── StoredProc.py
│       ├── TJEncryptPassword.py
│       ├── __pycache__
│       │   ├── BatchInsPerf.cpython-38.pyc
│       │   ├── BatchInsert.cpython-38.pyc
│       │   ├── CharPadding.cpython-38.pyc
│       │   ├── CommitRollback.cpython-38.pyc
│       │   ├── DriverDatabaseVersion.cpython-38.pyc
│       │   ├── ElicitFile.cpython-38.pyc
│       │   ├── FakeResultSetCon.cpython-38.pyc
│       │   ├── FakeResultSetEsc.cpython-38.pyc
│       │   ├── FastLoadBatch.cpython-38.pyc
│       │   ├── HelpSession.cpython-38.pyc
│       │   ├── IgnoreErrors.cpython-38.pyc
│       │   ├── InsertXML.cpython-38.pyc
│       │   ├── LoadCSVFile.cpython-38.pyc
│       │   ├── MetadataFromPrepare.cpython-38.pyc
│       │   ├── ParamDataTypes.cpython-38.pyc
│       │   ├── StoredProc.cpython-38.pyc
│       │   └── TJEncryptPassword.cpython-38.pyc
│       ├── airports.csv
│       └── udfinc.c
└── teradatasql-17.0.0.4.dist-info
    ├── INSTALLER
    ├── METADATA
    ├── RECORD
    ├── REQUESTED
    ├── WHEEL
    └── top_level.txt

4 directories, 45 files

Expected behavior
The same files are installed regardless of whether --upgrade is specified.

@uranusjr
Copy link
Member

uranusjr commented Aug 25, 2020

The wheel I got from PyPI contains three directories:

  • teradatasql-17.0.0.4.dist-info: it seems that this one is handled correctly.
  • teradatasql: the actual package as in the non-upgrade result.
  • teradatasql-17.0.0.4.data: contains the teradatasql directory that’s included in the --upgrade result.

So it seems like neither of the commands is doing the right thing (the right result should be a combination of the two, IIUC), but they err differently and cuase the discrepency.

@uranusjr uranusjr added C: target pip install's --target option's behaviour handling type: bug A confirmed bug or unintended behavior labels Aug 25, 2020
@di
Copy link
Member Author

di commented Aug 25, 2020

Compare with sampleproject:

$ pip download sampleproject
Collecting sampleproject
  Using cached sampleproject-2.0.0-py3-none-any.whl (4.2 kB)
  Saved ./sampleproject-2.0.0-py3-none-any.whl
Collecting peppercorn
  Using cached peppercorn-0.6-py3-none-any.whl (4.8 kB)
  Saved ./peppercorn-0.6-py3-none-any.whl
Successfully downloaded sampleproject peppercorn

$ unzip sampleproject-2.0.0-py3-none-any.whl
Archive:  sampleproject-2.0.0-py3-none-any.whl
  inflating: sample/__init__.py
  inflating: sample/package_data.dat
  inflating: sample/simple.py
  inflating: sampleproject-2.0.0.data/data/my_data/data_file
  inflating: sampleproject-2.0.0.dist-info/LICENSE.txt
  inflating: sampleproject-2.0.0.dist-info/METADATA
  inflating: sampleproject-2.0.0.dist-info/WHEEL
  inflating: sampleproject-2.0.0.dist-info/entry_points.txt
  inflating: sampleproject-2.0.0.dist-info/top_level.txt
  inflating: sampleproject-2.0.0.dist-info/RECORD
root@148983654c9c:/# pip install -t /tmp --no-deps sampleproject
Collecting sampleproject
  Downloading sampleproject-2.0.0-py3-none-any.whl (4.2 kB)
Installing collected packages: sampleproject
Successfully installed sampleproject-2.0.0

root@148983654c9c:/# tree /tmp/
/tmp/
├── bin
│   └── sample
├── my_data
│   └── data_file
├── sample
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-38.pyc
│   │   └── simple.cpython-38.pyc
│   ├── package_data.dat
│   └── simple.py
└── sampleproject-2.0.0.dist-info
    ├── INSTALLER
    ├── LICENSE.txt
    ├── METADATA
    ├── RECORD
    ├── REQUESTED
    ├── WHEEL
    ├── entry_points.txt
    └── top_level.txt

5 directories, 15 files
root@056b41cc6407:/# pip install -t /tmp --no-deps --upgrade sampleproject
Collecting sampleproject
  Downloading sampleproject-2.0.0-py3-none-any.whl (4.2 kB)
Installing collected packages: sampleproject
Successfully installed sampleproject-2.0.0

root@056b41cc6407:/# tree /tmp/
/tmp/
├── bin
│   └── sample
├── my_data
│   └── data_file
├── sample
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-38.pyc
│   │   └── simple.cpython-38.pyc
│   ├── package_data.dat
│   └── simple.py
└── sampleproject-2.0.0.dist-info
    ├── INSTALLER
    ├── LICENSE.txt
    ├── METADATA
    ├── RECORD
    ├── REQUESTED
    ├── WHEEL
    ├── entry_points.txt
    └── top_level.txt

5 directories, 15 files

@uranusjr
Copy link
Member

My guess is that the file-copying logic assumes .data does not write to the package directory (because… why not just include those files in the package directory in the first place?) and falters because the teradatasql wheel wants to write to the same site-packages/teradatasql for both installation phases.

@di
Copy link
Member Author

di commented Aug 25, 2020

That makes sense, and explains why it produces the following warning even on a fresh install:

WARNING: Target directory /tmp/teradatasql already exists. Specify --upgrade to force replacement.

So is this a bug in wheel, pip or elsewhere? Given that there are probably more than one package like this in the wild, what could pip do to handle them gracefully?

@uranusjr
Copy link
Member

Probably in pip, since pip implements wheel installation internally. I think instead of letting the directories overwrite each other, pip should merge them and emit warnings when there are duplicate files.

/cc @dholth

copybara-service bot pushed a commit to GoogleCloudPlatform/buildpacks that referenced this issue Aug 25, 2020
…buildpack

Some dependencies are incompatible with the --upgrade and --target flag combination, e.g. pypa/pip#8799.

PiperOrigin-RevId: 328384678
Change-Id: Iea6962774a6bd82f02435ba785ba92445e762c8d
@hexagonrecursion
Copy link
Contributor

$ unzip -l ~/teradatasql-17.0.0.1-py3-none-any.whl 
Archive:  /home/user/teradatasql-17.0.0.1-py3-none-any.whl
  Length      Date    Time    Name
---------  ---------- -----   ----
    39892  03-19-2020 21:45   teradatasql/__init__.py
 10961582  06-04-2020 21:26   teradatasql/teradatasql.dll
 14400924  06-04-2020 21:27   teradatasql/teradatasql.dylib
 14347454  06-04-2020 21:27   teradatasql/teradatasql.so
       95  06-04-2020 21:21   teradatasql/vernumber.py
    13929  07-03-2019 17:09   teradatasql-17.0.0.1.data/data/teradatasql/LICENSE
    96404  06-04-2020 21:20   teradatasql-17.0.0.1.data/data/teradatasql/README.md
     8574  02-26-2019 00:51   teradatasql-17.0.0.1.data/data/teradatasql/THIRDPARTYLICENSE
     1972  11-26-2019 17:39   teradatasql-17.0.0.1.data/data/teradatasql/samples/BatchInsPerf.py
      631  06-07-2019 16:42   teradatasql-17.0.0.1.data/data/teradatasql/samples/BatchInsert.py
     1906  02-28-2019 18:32   teradatasql-17.0.0.1.data/data/teradatasql/samples/CharPadding.py
     1229  07-29-2019 20:17   teradatasql-17.0.0.1.data/data/teradatasql/samples/CommitRollback.py
      531  02-26-2019 00:51   teradatasql-17.0.0.1.data/data/teradatasql/samples/DriverDatabaseVersion.py
      935  02-26-2019 00:51   teradatasql-17.0.0.1.data/data/teradatasql/samples/ElicitFile.py
     2601  10-18-2019 20:43   teradatasql-17.0.0.1.data/data/teradatasql/samples/FakeResultSetCon.py
     2633  10-18-2019 20:43   teradatasql-17.0.0.1.data/data/teradatasql/samples/FakeResultSetEsc.py
     4637  11-26-2019 17:35   teradatasql-17.0.0.1.data/data/teradatasql/samples/FastLoadBatch.py
      481  02-26-2019 00:51   teradatasql-17.0.0.1.data/data/teradatasql/samples/HelpSession.py
     4786  12-11-2019 17:41   teradatasql-17.0.0.1.data/data/teradatasql/samples/IgnoreErrors.py
     1225  03-27-2020 19:51   teradatasql-17.0.0.1.data/data/teradatasql/samples/InsertXML.py
      941  02-26-2019 00:51   teradatasql-17.0.0.1.data/data/teradatasql/samples/LoadCSVFile.py
     1238  02-26-2019 00:51   teradatasql-17.0.0.1.data/data/teradatasql/samples/MetadataFromPrepare.py
     3946  03-27-2020 16:14   teradatasql-17.0.0.1.data/data/teradatasql/samples/ParamDataTypes.py
     5623  02-26-2019 00:51   teradatasql-17.0.0.1.data/data/teradatasql/samples/StoredProc.py
    28958  11-25-2019 18:57   teradatasql-17.0.0.1.data/data/teradatasql/samples/TJEncryptPassword.py
      229  02-26-2019 00:51   teradatasql-17.0.0.1.data/data/teradatasql/samples/airports.csv
      155  02-26-2019 00:51   teradatasql-17.0.0.1.data/data/teradatasql/samples/udfinc.c
       12  06-04-2020 21:27   teradatasql-17.0.0.1.dist-info/top_level.txt
       97  06-04-2020 21:27   teradatasql-17.0.0.1.dist-info/WHEEL
    95969  06-04-2020 21:27   teradatasql-17.0.0.1.dist-info/METADATA
     3462  06-04-2020 21:27   teradatasql-17.0.0.1.dist-info/RECORD
---------                     -------
 40033051                     31 files

teradatasql/ is copied to purelib because Root-Is-Purelib: true:

$ fgrep Root teradatasql-17.0.0.1.dist-info/WHEEL
Root-Is-Purelib: true

teradatasql-17.0.0.1.data/data/teradatasql/ is copied to data per https://www.python.org/dev/peps/pep-0427/

It just so happens that purelb and data are the same directory in this case.

@hexagonrecursion
Copy link
Contributor

hexagonrecursion commented Feb 5, 2021

Data files are a surprisingly confusing aspect of building wheels. Doubly so if you don't want them to be installed in the same directory as your python files. After way too much trial and error I have managed to produce source code that creates a wheel with a layout that triggers the issue:

pyproject.toml

[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"

setup.cfg

[metadata]
name = mypackage
version = 0.0.1

[options]
packages = mypackage

[options.data_files]
mypackage = data-lives-here/example.txt

data-lives-here/example.txt

Hello

mypackage/__init__.py - empty

Proof

$  unzip -l dist/mypackage-0.0.1-py3-none-any.whl 
Archive:  dist/mypackage-0.0.1-py3-none-any.whl
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  02-05-2021 13:43   mypackage/__init__.py
        6  02-05-2021 13:43   mypackage-0.0.1.data/data/mypackage/example.txt
      173  02-05-2021 13:57   mypackage-0.0.1.dist-info/METADATA
       92  02-05-2021 13:57   mypackage-0.0.1.dist-info/WHEEL
       10  02-05-2021 13:57   mypackage-0.0.1.dist-info/top_level.txt
      481  02-05-2021 13:57   mypackage-0.0.1.dist-info/RECORD
---------                     -------
      762                     6 files

$ pip install --target=$HOME/foo dist/mypackage-0.0.1-py3-none-any.whl
Processing ./dist/mypackage-0.0.1-py3-none-any.whl
Installing collected packages: mypackage
Successfully installed mypackage-0.0.1
WARNING: Target directory /home/user/foo/mypackage already exists. Specify --upgrade to force replacement.
$ (cd ~; tree foo)
foo
├── mypackage
│   ├── __init__.py
│   └── __pycache__
│       └── __init__.cpython-38.pyc
└── mypackage-0.0.1.dist-info
    ├── INSTALLER
    ├── METADATA
    ├── RECORD
    ├── WHEEL
    └── top_level.txt

3 directories, 7 files
$ pip install --target=$HOME/bar --upgrade dist/mypackage-0.0.1-py3-none-any.whl
Processing ./dist/mypackage-0.0.1-py3-none-any.whl
Installing collected packages: mypackage
Successfully installed mypackage-0.0.1
$ (cd ~; tree bar)
bar
├── mypackage
│   └── example.txt
└── mypackage-0.0.1.dist-info
    ├── INSTALLER
    ├── METADATA
    ├── RECORD
    ├── WHEEL
    └── top_level.txt

2 directories, 6 files
$ cat -v ~/foo/mypackage-0.0.1.dist-info/RECORD 
../../mypackage/example.txt,sha256=ZqBFtFIQLFnYQOwJfVnZRn4To_NPZJTlOf_TLBuzXxg,6^M
mypackage-0.0.1.dist-info/INSTALLER,sha256=zuuue4knoyJ-UwPPXg8fezS7VCrXJQrAP7zeNuwvFQg,4^M
mypackage-0.0.1.dist-info/METADATA,sha256=rYbPAdwCQljoeH8303eP07O1vJ3sYqrJxzIECEcGiBk,173^M
mypackage-0.0.1.dist-info/RECORD,,^M
mypackage-0.0.1.dist-info/WHEEL,sha256=OqRkF0eY5GHssMorFjlbTIq072vpHpF60fIQA6lS9xA,92^M
mypackage-0.0.1.dist-info/top_level.txt,sha256=ax2Dg5ulHxjedaMk72kLxMHSvJB80I9iLPYgNI4IQeA,10^M
mypackage/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0^M
mypackage/__pycache__/__init__.cpython-38.pyc,,^M
$ cat -v ~/bar/mypackage-0.0.1.dist-info/RECORD 
../../mypackage/example.txt,sha256=ZqBFtFIQLFnYQOwJfVnZRn4To_NPZJTlOf_TLBuzXxg,6^M
mypackage-0.0.1.dist-info/INSTALLER,sha256=zuuue4knoyJ-UwPPXg8fezS7VCrXJQrAP7zeNuwvFQg,4^M
mypackage-0.0.1.dist-info/METADATA,sha256=rYbPAdwCQljoeH8303eP07O1vJ3sYqrJxzIECEcGiBk,173^M
mypackage-0.0.1.dist-info/RECORD,,^M
mypackage-0.0.1.dist-info/WHEEL,sha256=OqRkF0eY5GHssMorFjlbTIq072vpHpF60fIQA6lS9xA,92^M
mypackage-0.0.1.dist-info/top_level.txt,sha256=ax2Dg5ulHxjedaMk72kLxMHSvJB80I9iLPYgNI4IQeA,10^M
mypackage/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0^M
mypackage/__pycache__/__init__.cpython-38.pyc,,^M

@hexagonrecursion
Copy link
Contributor

Is it intentional that purelib and data are the same directory when using pip install --target?

@pfmoore
Copy link
Member

pfmoore commented Feb 5, 2021

I don't think it's so much "intentional" as that there's no obvious "better" option. Where would you expect data to go in an install where the only thing you've specified is (in effect) the location of the site-packages directory?

--target installs are an odd case precisely because they don't specify a complete "installation scheme" of directories, so we have to make arbitrary guesses for the additional locations.

@hexagonrecursion
Copy link
Contributor

The culprit:

for lib_dir in lib_dir_list:
for item in os.listdir(lib_dir):
if lib_dir == data_dir:
ddir = os.path.join(data_dir, item)
if any(s.startswith(ddir) for s in lib_dir_list[:-1]):
continue
target_item_dir = os.path.join(target_dir, item)
if os.path.exists(target_item_dir):
if not upgrade:
logger.warning(
'Target directory %s already exists. Specify '
'--upgrade to force replacement.',
target_item_dir
)
continue
if os.path.islink(target_item_dir):
logger.warning(
'Target directory %s already exists and is '
'a link. pip will not automatically replace '
'links, please remove if replacement is '
'desired.',
target_item_dir
)
continue
if os.path.isdir(target_item_dir):
shutil.rmtree(target_item_dir)
else:
os.remove(target_item_dir)
shutil.move(
os.path.join(lib_dir, item),
target_item_dir
)

@hexagonrecursion
Copy link
Contributor

hexagonrecursion commented Feb 5, 2021

btw:

cat -v ~/bar/mypackage-0.0.1.dist-info/RECORD 
../../mypackage/example.txt,sha256=ZqBFtFIQLFnYQOwJfVnZRn4To_NPZJTlOf_TLBuzXxg,6^M

This is wrong. It should be:

cat -v ~/bar/mypackage-0.0.1.dist-info/RECORD 
mypackage/example.txt,sha256=ZqBFtFIQLFnYQOwJfVnZRn4To_NPZJTlOf_TLBuzXxg,6^M

I'll open a new issue for that. Found it #7658

@hexagonrecursion
Copy link
Contributor

Possibly duplicate: #7548

@uranusjr
Copy link
Member

uranusjr commented Feb 6, 2021

Yup, that issue seems to be the same as this. Since this issue contains much more detailed information and discussion on how this should be fixed, I’ll close #7548 in favour of this one.

@uranusjr
Copy link
Member

uranusjr commented Feb 6, 2021

Also adding awaiting PR since we’ve more or less concluded how this should be resolved, and “only” need someone to have the time to actually implement it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: target pip install's --target option's behaviour handling state: awaiting PR Feature discussed, PR is needed type: bug A confirmed bug or unintended behavior
Projects
None yet
Development

No branches or pull requests

4 participants