Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

warn before overwriting files owned by previously installed wheels #8119

Closed
wants to merge 3 commits into from

Conversation

dhellmann
Copy link

Before installing a wheel, look for other distributions that have also
written files into the same top level directory. Warn if any of the
same files will be modified by the new wheel.

Addresses #4625

@dhellmann
Copy link
Author

The Windows tests are failing because they're seeing a deprecation warning for python 2.7 instead of the new warning about the conflicting file. Should that test be marked python 3 only?

The Travis job seems to have failed with an error for PyPI.org. I can push a trivial change to rerun those, but maybe if someone has time to give meaningful feedback there's some other real change to be made so I'll hold off.

@uranusjr
Copy link
Member

The Windows tests are failing because they're seeing a deprecation warning for python 2.7 instead of the new warning about the conflicting file.

They should see both the deprecation warning and the conflicting file. This likely indicates a bug in the implementation.

@dhellmann dhellmann force-pushed the warn-on-file-owner-confict branch from 5d66c75 to bf7b22b Compare April 24, 2020 12:57
@dhellmann
Copy link
Author

The Windows tests are failing because they're seeing a deprecation warning for python 2.7 instead of the new warning about the conflicting file.

They should see both the deprecation warning and the conflicting file. This likely indicates a bug in the implementation.

Yes, you’re right. That’s fixed now.

@pradyunsg pradyunsg added state: needs discussion This needs some more discussion state: needs eyes Needs a maintainer/triager to take a closer look type: enhancement Improvements to functionality labels Apr 25, 2020
@BrownTruck
Copy link
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label May 20, 2020
@dhellmann dhellmann force-pushed the warn-on-file-owner-confict branch from bf7b22b to 7a2abf4 Compare May 20, 2020 16:29
@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label May 20, 2020
@dhellmann
Copy link
Author

rebased

Copy link
Member

@pradyunsg pradyunsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor comments, more about communication around the change rather than the change itself.

src/pip/_internal/operations/install/wheel.py Outdated Show resolved Hide resolved
news/4625.bugfix Outdated Show resolved Hide resolved
@BrownTruck
Copy link
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label May 22, 2020
@dhellmann dhellmann force-pushed the warn-on-file-owner-confict branch from 7a2abf4 to f45a974 Compare May 23, 2020 10:57
@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label May 23, 2020
@dhellmann
Copy link
Author

Rebased and comments addressed

@dhellmann dhellmann force-pushed the warn-on-file-owner-confict branch from f45a974 to c8d819f Compare May 23, 2020 11:07
pradyunsg
pradyunsg previously approved these changes May 23, 2020
@pradyunsg pradyunsg removed state: needs discussion This needs some more discussion state: needs eyes Needs a maintainer/triager to take a closer look labels May 23, 2020
@BrownTruck
Copy link
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label Jul 1, 2020
@McSinyx
Copy link
Contributor

McSinyx commented Jul 1, 2020

Hi there, I've just noticed the nice oxford_comma_join, what do you think about making it taking values as variadic arguments, @dhellmann?

@dhellmann
Copy link
Author

Hi there, I've just noticed the nice oxford_comma_join, what do you think about making it taking values as variadic arguments, @dhellmann?

I used a sequence argument because I expected the input to come from splitting inputs or filtering some other sequence and have an unknown number of values. The passing a single such value to a function, rather than exploding it to use a variadic calling pattern, seemed like the right fit.

Did you have a different sort of use case in mind?

Copy link
Member

@pradyunsg pradyunsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks basically good to go -- could you rebase this + (optionally) make a small change to the tests?

tests/unit/test_messages.py Outdated Show resolved Hide resolved
@McSinyx
Copy link
Contributor

McSinyx commented Jul 1, 2020

I used a sequence argument because I expected the input to come from splitting inputs or filtering some other sequence and have an unknown number of values.

That's right, I was thinking about manual call but it doesn't make much sense for me anymore after reading this 😄

@dhellmann dhellmann force-pushed the warn-on-file-owner-confict branch from c8d819f to 2dae22b Compare July 1, 2020 18:32
@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label Jul 1, 2020
@dhellmann dhellmann force-pushed the warn-on-file-owner-confict branch 3 times, most recently from 3396da0 to ace9b21 Compare July 1, 2020 19:52
@dhellmann
Copy link
Author

I get completely different results when I run the linter locally (lots more errors, in unrelated files). Is there some trick to running it beyond using tox -e lint?

@deveshks
Copy link
Contributor

deveshks commented Jul 1, 2020

I get completely different results when I run the linter locally (lots more errors, in unrelated files). Is there some trick to running it beyond using tox -e lint?

I just run that, but my tox runs inside a virtualenv. Could you share what output do you see? I never saw local linter results being different then the linter result on CI.

is reported and that the second wheel does overwrite
the first.
"""
from tests.lib import TestFailure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a few other tests in this file importing this inside the test. Maybe it's a good time to move this import along with the others at the top?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just following the local coding patterns in this PR. Changing all of that seems like a task for another PR.

@dhellmann
Copy link
Author

I get completely different results when I run the linter locally (lots more errors, in unrelated files). Is there some trick to running it beyond using tox -e lint?

I just run that, but my tox runs inside a virtualenv. Could you share what output do you see? I never saw local linter results being different then the linter result on CI.

I have tox in a virtualenv, too.

$ tox --version
3.14.6 imported from /Users/dhellmann/Envs/pip/lib/python3.8/site-packages/tox/__init__.py

https://gist.github.com/dhellmann/4bba77827a56e6890f5e3633153f5153

Copy link
Member

@chrahunt chrahunt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we take a slightly different approach, like:

  1. pre-compute a list of the paths that will be installed for the current wheel, from the files in the source directory and the anticipated generated script files (this would be easier with some refactoring, which could be done in separate PRs ahead-of-time)
  2. if any file already exists, then determine what package owns it using RECORD or installed_files.txt for the installed packages in the environment
  3. report the conflict as a warning, after filtering out __init__.py associated with namespace packages, as mentioned in the original issue

This has several benefits:

  1. 1 will be mostly compatible with installation directly from wheel (see Extract wheels directly to install location #6030)
  2. 1 also covers generated script files
  3. 2 should be compatible with more distributions and have fewer false negatives than looking at top_level.txt (which I believe is non-standard)
  4. doing 2 lazily (when we see a file which will be overwritten) means in the majority of cases we will not need to process every installed package

with open(installing_top_level_path, 'r') as fd:
installing_top_level = fd.read().strip()
files_from_other_owners = _get_file_owners(
lib_dir, installing_top_level, name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it should be necessary to ignore the installing package, since we should have uninstalled the existing distribution before trying to install a new version.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually a file conflict will be an error that prevents installation. We want to report the error before removing the existing version and breaking the environment.

Copy link
Member

@chrahunt chrahunt Jul 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We currently have a rollback mechanism that reverts an upgrading package on any installation failure. I think this should have us covered, if I understand the concern you're raising.

dhellmann added 3 commits July 2, 2020 12:04
Before installing a wheel, look for other distributions that have also
written files into the same top level directory. Warn if any of the
same files will be modified by the new wheel.

Addresses pypa#4625

Signed-off-by: Doug Hellmann <[email protected]>
Rather than emitting a separate warning for each owner, produce one
warning with all of the other owner names in the message.

Signed-off-by: Doug Hellmann <[email protected]>
@dhellmann dhellmann force-pushed the warn-on-file-owner-confict branch from 7bb5beb to cca8cfb Compare July 2, 2020 16:04
@dhellmann
Copy link
Author

Can we take a slightly different approach, like:

  1. pre-compute a list of the paths that will be installed for the current wheel, from the files in the source directory and the anticipated generated script files (this would be easier with some refactoring, which could be done in separate CRs ahead-of-time)
  2. if any file already exists, then determine what package owns it using RECORD or installed_files.txt for the installed packages in the environment
  3. report the conflict as a warning, after filtering out __init__.py associated with namespace packages, as mentioned in the original issue

This has several benefits:

  1. 1 will be mostly compatible with installation directly from wheel (see Extract wheels directly to install location #6030)

Can you describe how the approach that has already been implemented is incompatible with installing directly?

  1. 1 also covers generated script files

Yes, that's a shortcoming of this implementation. It's still an incremental improvement.

  1. 2 should be compatible with more distributions and have fewer false negatives than looking at top_level.txt (which I believe is non-standard)

Is it? It's in all of the installed packages I see on my system and it's there in the test suite when the test wheels are installed. What's creating it?

  1. doing 2 lazily (when we see a file which will be overwritten) means in the majority of cases we will not need to process every installed package

If we're going to turn this into a blocking error, we need to examine the state of the system before making changes. To handle the upgrade case, we need to look for conflicts while we may have an old version of the package installed, which would automatically trigger looking for the owner of (effectively) every file that would be installed. So I don't think we would buy much time by doing the ownership evaluation lazily in a lot of cases. The current implementation only looks at distributions that claim to have written files to the same top level directory as the one being installed, so it ignores most of them already.

@pfmoore
Copy link
Member

pfmoore commented Jul 2, 2020

top_level.txt is, I believe, a setuptools thing. It's certainly not a standardised metadata file (of which the only ones that I can recall are as per PEP 376 - RECORD, REQUESTED and INSTALLER)

@dhellmann
Copy link
Author

So I guess to be accurate this would need to read the RECORD file for every installed distribution. That seems like something we want to only do one time, if we can, but wheels are processed individually. Would it be safe to use a module-level global to hold a cache?

@chrahunt
Copy link
Member

chrahunt commented Jul 2, 2020

So I guess to be accurate this would need to read the RECORD file for every installed distribution.

Yes. Since pip itself generates installed_files.txt I think it is also reasonable to support, but that could be done in a separate PR.

Would it be safe to use a module-level global to hold a cache?

Yes, that should be OK! A few things that may help:

  1. see the approach taken here for how we normally control those kinds of globals
  2. we could scope it with a with statement around this block
  3. where we are expecting it in the code we would want to assert _global_file_cache is not None to satisfy the type checker and because there is no point testing the case when it is not set (since we expect it to always be set for an installation)
  4. to provide this in unit tests automatically one approach could be to have an autoused pytest fixture, similar to what we do here. Ideally we would want to keep the number of impacted tests to a minimum, but I don't think it's a big deal to have it global like this one if it impacts a lot of tests.

@BrownTruck
Copy link
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label Jul 11, 2020
@pradyunsg pradyunsg dismissed their stale review July 15, 2020 18:03

There have been other (valid) concerns raised that I agree with. :)

@dhellmann dhellmann closed this Nov 19, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 6, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
needs rebase or merge PR has conflicts with current master type: enhancement Improvements to functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants