Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement preferring closest conflicting causes for backtracking resolution #12318

Closed
1 task done
notatallshaw opened this issue Oct 5, 2023 · 15 comments
Closed
1 task done
Labels
C: dependency resolution About choosing which dependencies to install resolution: deferred till PR Further discussion will happen when a PR is made type: feature request Request for a new feature

Comments

@notatallshaw
Copy link
Member

notatallshaw commented Oct 5, 2023

What's the problem this feature will solve?

When backtracking Pip attempts to optimize picking which package to backtrack on by first choosing packages which are the "cause" of backtracking, i.e. they have requirements which generate conflicts with other package's requirements.

However the problem is that resolvelib doesn't actually have a very granular understanding of "causes", if two packages conflict on their numpy requirement, then all packages which require numpy, even if they have a very libreral requirement, will be considered a "cause".

This can create an an exponetial choice in package backtracking choice, and a ResolutionTooDeep error, e.g. #12305

Describe the solution you'd like

The "Requirement" API that resolvelib provides does not appear to be enough to apply any filtering logic on resolvelib side.

However Pip understands it's own requirement objects and there could implement a "narrow_causes" "filter unsatisfied names" method from the Pip resolve provider that resolvelib can call, and Pip can choose to closest causes that conflict.

Alternative Solutions

Maybe I am misunderstanding the resolvelib/Pip architecture and there is a much simpler way to implement this. Please provide feedback if you think

Additional context

I have created this branch as a rough test for this approach, there are probably more places in resolvelib to apply this method to speed things or give better quality messages:

On Python 3.8 this command produces a ResolutionTooDeep error on Pip/main but solves quickly on my branch:

python -m pip install --disable-pip-version-check --dry-run --only-binary ":all:" "numpy==1.21.6" "cython==0.29.28" "scipy>=1.4.0" "torch>=1.7" "torchaudio" "soundfile" "librosa==0.10.0.*" "numba==0.55.1" "inflect==5.6.0" "tqdm" "anyascii" "pyyaml" "fsspec>=2021.04.0" "aiohttp" "packaging" "flask" "pysbd" "pandas" "matplotlib" "trainer==0.0.20" "coqpit>=0.0.16" "pypinyin" "mecab-python3==1.0.5" "jamo" "bangla==0.0.2" "k_diffusion" "einops" "transformers"

As this requires an addition to the resolvelib API I am looking for buy in for this approach from Pip maintainers before I start creating PRs across both projects.

Code of Conduct

@uranusjr
Copy link
Member

uranusjr commented Oct 5, 2023

However Pip understands it's own requirement objects and there could implement a "narrow_causes" method from the Pip resolve provider that resolvelib can call

An abstraction would be needed in the provider instead so resolvelib does not depend on an interface on the client side (this is why the provider interface exists). Otherwise the general idea seems reasonable.

@notatallshaw
Copy link
Member Author

An abstraction would be needed in the provider instead so resolvelib does not depend on an interface on the client side (this is why the provider interface exists). Otherwise the general idea seems reasonable.

You mean the resolvelib AbstractProvider should have a narrow_causes method? If so yes, I agree, I just hadn't got round to adding that in my experimental branch.

@ofek
Copy link
Contributor

ofek commented Oct 20, 2023

Has there been any update on this?

@pradyunsg
Copy link
Member

pradyunsg commented Oct 20, 2023

No, and there's no need to ask about that here. If there's an update, a comment will be posted here or a cross link will be added by Github.

@ofek
Copy link
Contributor

ofek commented Oct 20, 2023

Sorry for the confusion but I wasn't asking maintainers, rather specifically @notatallshaw about their progress on the experiment.

@notatallshaw
Copy link
Member Author

I haven't had time to clean up my approach and submit the PR to resolvelib. If there's a specific issue you are facing other than the already mentioned issue I would suggest creating a new issue so we can actually verify it's the same problem.

@pradyunsg pradyunsg added C: dependency resolution About choosing which dependencies to install resolution: deferred till PR Further discussion will happen when a PR is made and removed S: needs triage Issues/PRs that need to be triaged labels Oct 24, 2023
@notatallshaw
Copy link
Member Author

notatallshaw commented Oct 28, 2023

This is a note to myself more than anything, this narrowing causes approach can expose bugs in the resolution algorithm in an unexpected way, i.e. the resolution finishes and there are no causes to present, without exception handeling it emits:

ERROR: Exception:
Traceback (most recent call last):
  File "pip/_internal/resolution/resolvelib/resolver.py", line 95, in resolve
    result = self._result = resolver.resolve(
                            ^^^^^^^^^^^^^^^^^
  File "pip/_vendor/resolvelib/resolvers.py", line 546, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pip/_vendor/resolvelib/resolvers.py", line 439, in resolve
    raise ResolutionImpossible(self.state.backtrack_causes)
pip._vendor.resolvelib.resolvers.ResolutionImpossible: []

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pip/_internal/cli/base_command.py", line 180, in exc_logging_wrapper
    status = run_func(*args)
             ^^^^^^^^^^^^^^^
  File "pip/_internal/cli/req_command.py", line 245, in wrapper
    return func(self, options, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pip/_internal/commands/install.py", line 377, in run
    requirement_set = resolver.resolve(
                      ^^^^^^^^^^^^^^^^^
  File "pip/_internal/resolution/resolvelib/resolver.py", line 100, in resolve
    error = self.factory.get_installation_error(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pip/_internal/resolution/resolvelib/factory.py", line 696, in get_installation_error
    assert e.causes, "Installation error reported with no cause"
AssertionError: Installation error reported with no cause

When this is eventually added to Pip this logic should instead emit an error to the user, probably informing them to post an issue to Pip github and maybe make a premade issue so new issues aren't constantly created if it turns out there's a common case of Pip incorrectly reporting resolution impossible.

@notatallshaw
Copy link
Member Author

After working on this for a little bit I realized it needs to be changed from "Cause Narrowing" to "Prefer Conflicts", it's largely the same idea, the provider understands what causes conflict with each other and resolvelib doesn't and the causes that conflict should be preffered.

I will update this once I have a PR ready and explain how it works.

@pradyunsg
Copy link
Member

@notatallshaw FYI: https://github.com/pradyunsg/pip-resolver-benchmarks is a thing that I've prototyped last weekend. It might be useful for throwing iterations of pip's resolver at specific resolution graphs.

@notatallshaw
Copy link
Member Author

@notatallshaw FYI: https://github.com/pradyunsg/pip-resolver-benchmarks is a thing that I've prototyped last weekend. It might be useful for throwing iterations of pip's resolver at specific resolution graphs.

This looks amazing, will definitely give it a try and post some feedback over there.

@notatallshaw
Copy link
Member Author

Small update, I've created the initial PR for resolvelib sarugaku/resolvelib#145 (although it appears I need to do some work with tests).

I've created an experimental Pip branch that implements prefering conflict causes here: https://github.com/notatallshaw/pip/tree/prefer-conflicting-causes

I would like to try this pip resolver benchmark tool, but I don't fully understand how it works yet, getting error with example benchmark: pradyunsg/pip-resolver-benchmarks#6

@notatallshaw
Copy link
Member Author

@pradyunsg thanks for fixing the benchmark, what it's shown so far is this doesn't help out for most situations. But thanks to being able to make other optimizations once sarugaku/resolvelib#145 lands there's still a minor performance benefit.

@notatallshaw
Copy link
Member Author

I have created a draft PR on Pip side to help push things along: #12459

There's work to do still, but hopefully this can spur things along.

@notatallshaw notatallshaw changed the title Implement a Causes Narrowing method for Resolution Implement preferring closest conflicting causes for backtracking resolution Jan 5, 2024
@notatallshaw
Copy link
Member Author

notatallshaw commented Jan 5, 2024

I've updated the issue description. It occured to me that a better way to think about this is preferring the closest conflicting causes. I.e.:

  1. Prefer causes first where their specifiers conflict with each other
  2. Then prefer causes that conflict directly with the parents of other causes
  3. Then prefer causes that conflict directly with the grandparents of other causes
  4. etc.

This fits perfectly with how someone might intuitively backtrack through a DFS algorithm if doing it by hand.

The PR I've raised (#12459) only implements 1 and 2 at the moment, I need to have a think if there's a way to implement 3 and 4. But in general implementing any number of the steps is going to have a massive improvement in backtracking when hitting situations where there are more than two causes and otherwises chooses something else than steps 1 or 2 when those steps could have been followed.

@notatallshaw
Copy link
Member Author

Broken this into two clean seperate issues: #12497 and #12498

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
C: dependency resolution About choosing which dependencies to install resolution: deferred till PR Further discussion will happen when a PR is made type: feature request Request for a new feature
Projects
None yet
Development

No branches or pull requests

4 participants