Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolves should automatically include transitive targets, and not restrict 1st party source code #21194

Open
jasondamour opened this issue Jul 23, 2024 · 19 comments
Labels
backend: Python Python backend-related issues enhancement

Comments

@jasondamour
Copy link

jasondamour commented Jul 23, 2024

Problem

Migrating a monorepo with libraries and services with conflicting version constraints to pants requires so much effort compared to the intuitive pattern in poetry/pip.

For example, imagine the following repository:

flowchart LR
    X(3rd-party-lib)
    subgraph Libraries/
        libA
        libB
    end
    subgraph Services/
        service1
        service2
    end
    libA -.-> libB
    libB -.->|>=1| X
    service1 --> libA
    service2 --> libB
    service2 -.->|==2| X

Loading

Sample repo: https://github.com/jasondamour/pants-vs-poetry

This repository is valid in poetry. We can run poetry lock for each service:

  • service1: has all dependencies from libA, libB, and 3rd-party-lib==3.0 (latest version of 3rd-party-lib)
  • service2: has all dependencies from libB and 3rd-party-lib==2.0

All the same concepts should apply to python_requirements from requirements.txt and pep-621 compliant pyproject.toml files, I'm just using poetry as an example

For the equivalent in Pants, I need to:

  1. In pants.toml, define a resolve per service and library
  2. Parametrize targets to add ALL libraries to ALL resolves
    a. Pants struggles to infer dependencies when resolves are enabled, so it also requires a lot of manual dependencies

Solution

Resolves should (optionally) behave more like poetry per-module Virtual Environments.

  • A "resolve" should just represent version constraints for 3rd party dependencies (including transitive)
  • Don't impose restrictions on what 1st party code can run with a resolve
  • Don't require a lockfile for every resolve

For example:

  1. Define resolves:
# pants.toml
[python.resolves]
service1 = "Services/service1/pants.lock"
service2 = "Services/service2/pants.lock"
  1. Set Services/service1/BUILD: poetry_requirements(resolve="service1") to add direct targets of this directory to @resolve=service1 (and repeat for service2)
  2. Declare a dependency (same as vanilla poetry) services/service1/pyproject.toml: libA = {path = "../../Libraries/libA", develop = true}
  3. pants generate-lockfile --resolve=service1 should create a lockfile which pins libA, libB, and 3rd-party-lib
  4. pants test Libraries/libA:: should run without a resolve or lockfile

Describe alternatives you've considered

We may just go back to directly using poetry for all dependencies, and exporting lockfiles for services from poetry for pants to consume in tests. I think that will give us exactly what I'm describing.

@jasondamour jasondamour changed the title Resolves should automatically include transitive targets, like poetry/pip Resolves should automatically include transitive targets, and not restrict 1st party source code Jul 24, 2024
@huonw huonw added the backend: Python Python backend-related issues label Jul 24, 2024
@tdyas
Copy link
Contributor

tdyas commented Jul 24, 2024

How does Poetry behave if you use poetry lock to generate a poetry.lock lockfile and then try to use two different major versions of a dependency?

@jasondamour
Copy link
Author

How does Poetry behave if you use poetry lock to generate a poetry.lock lockfile and then try to use two different major versions of a dependency?

Can you clarify the scenario you're asking about?

@jasondamour
Copy link
Author

I've created a sample repo here: https://github.com/jasondamour/pants-vs-poetry

@benjyw
Copy link
Contributor

benjyw commented Jul 25, 2024

So in the Poetry equivalent, which lockfile is used when you run tests on, say, LibA?

@benjyw
Copy link
Contributor

benjyw commented Jul 25, 2024

When, say, packaging the services it's clear to me what you expect. I'm less sure about what the expected behavior is when you act on the libraries.

@benjyw
Copy link
Contributor

benjyw commented Jul 25, 2024

For context, this is very helpful for #20897

@jasondamour
Copy link
Author

jasondamour commented Jul 25, 2024

So in the Poetry equivalent, which lockfile is used when you run tests on, say, LibA?

It depends what project/directory you're acting from:

➜  poetry run --directory Libraries/libB python Libraries/libB/libb/main.py
flask version: 3.0.3

➜  poetry run --directory Services/service2/ python Libraries/libB/libb/main.py
flask version: 2.3.3

In pants-speak, I guess I would describe this as: using the resolve from the directly invoked target (top of the stacktrace), rather than the resolve of the transitive target?

@benjyw
Copy link
Contributor

benjyw commented Jul 25, 2024

I see. And if there are multiple directly invoked targets?

$ pants test Libraries/::

for example.

@benjyw
Copy link
Contributor

benjyw commented Jul 25, 2024

In the new python backend I plan to "partition by config", so if there were separate resolves it would be obvious which one pertains.

@jasondamour
Copy link
Author

IMO the above statement should expand to something like this:

$ pants test :: # this "expands" to:
pants test Libraries/libA/liba/main_test.py # this test (and all transitive deps) run with @resolve=libA
pants test Libraries/libB/libb/main_test.py # this test (and all transitive deps) run with @resolve=libB
pants test Services/service1/service1/main_test.py # this test (and all transitive deps) run with @resolve=service1
pants test Services/service2/service2/main_test.py # this test (and all transitive deps) run with @resolve=service2

but that would also require all python_sources to be added to all resolves, or resolves shouldn't include sources

In poetry's case specifically, it also feels like a resolve should be 1:1 with source roots. Where all python_requirements within a given source root are added to the same resolve

@benjyw
Copy link
Contributor

benjyw commented Jul 25, 2024

Yeah, that makes sense and is roughly along the lines I was thinking of for the new backend. I need to think about whether there is some way to emulate this in the existing backend, I'll play around in your repo a bit.

@jasondamour
Copy link
Author

Sounds good, I really appreciate it.

@benjyw
Copy link
Contributor

benjyw commented Jul 26, 2024

Hmm, I haven't come up with much. Maybe set enable_resolves = false when running tests, and letting Pants yolo it without a lockfile, and then use the service lockfiles when packaging? It's pretty lame, but you might get some mileage.

TBH this is not a use case that the original backend considered. It was focused more on keeping requirements coherent across a large "JVM-style" tightly-bound monorepo, rather than a collection of standalone projects that are loosely bound (e.g., consume each other as requirements at runtime) and happen to be in the same repo.

I am working on the new backend precisely to address this looser case, to make Pants less nannyish and let you do what you want rather than trying to keep you away from sharp edges.

Sorry I don't have better news except "wait for the new backend"... :(

Again, you may be able to do something by having multiple config files and switching between them depending on what you're doing. Lame, but might tide you over.

@benjyw
Copy link
Contributor

benjyw commented Jul 26, 2024

Meanwhile your feedback/participation in #20897 would be really useful. Linking to this issue and the repo from there would provide a real-world use case to kick the tires on.

@jasondamour
Copy link
Author

A new python backend (or pants v3) would be awesome, but is there any option achievable with the current backend? If there was an option where resolves would not be applied to source targets (any source could be used by any target), then I think that would do the job for us. Then "lockfiles" would actually work how they do in every other dependency management tool.

I'm willing to look into contributing this, if you could give some pointers

@jasondamour
Copy link
Author

@benjyw Based on this comment, it seems like there might be viable path forward. Pants could completely support/document the concept of None resolve, which is a special resolve (named None or something else like *), which does not generate a lockfile and can be used by any other resolve

I ran pants from source with the default resolve set to None directly in setup.py (to workaround the issues of declaring None in toml), and pants still treated None like a string in a lot of places. I'll keep playing with this

@jasondamour
Copy link
Author

Hm, "nullable resolves" alone wouldn't address the issue of inferring a different import owner based on which target is directly invoked.

So to get something working in the current python backend, one approach could be:

  1. A new [python-infer].ambiguity_resolution called by_resolve
    a. This looks for a dependency defined in the same resolve as the top-level target, then falls back onto by_source_root
    b. This might have very poor memoization performance, as the graph could be different for each top-level target
  2. [python].default_resolve defaults to None
    a. [python].enable_resolves defaults to true (could even be deprecated+removed, since the new behavior of defaulting targets to the 'None' resolve is equivalent to running without resolves)

Here's what this would look like in my sample repo:

# pants.toml
[python-infer]
ambiguity_resolution = "by_resolve"

[python.resolves]
service1 = "Services/service1/pants.lock"
service2 = "Services/service2/pants.lock"

Then all targets in Libraries/libA and Libraries/libB default to resolve=None. In each Services/service1/BUILD and Services/service2/BUILD, set the respective resolve, i.e.:

_defaults__(all=dict(resolve="service1))

Then, for pants test Services:: we should get the resulting dependency graphs:

  1. First, tests in Service1 where Flask is not defined in the service1 resolve
Services/service1/service1/main_test.py@resolve=service1 # This is the "top-level resolve" for this graph
-> Services/service1/service1/main.py@resolve=service1  
-> Libraries/libB/libb/main.py[@resolve=None] 
-> Libraries/libb:poetry#Flask
  1. Then, tests in Service2 where Flask is defined directly in the service2 resolve
Services/service2/service2/main_test.py@resolve=service2 # this is the "top-level resolve"
-> Services/service2/service2/main.py@resolve=service2
-> Libraries/libB/libb/main.py[@resolve=None]
-> Services/service2:poetry#Flask@resolve=service2 # since flask exists in the top-level resolve

@benjyw
Copy link
Contributor

benjyw commented Jul 27, 2024

A new ambiguity resolution method sounds pretty sensible for this, and if you can get that to work we'd welcome the PR. Let us know how it goes as you continue hacking on this.

@jasondamour
Copy link
Author

jasondamour commented Jul 29, 2024

I threw my POC into this PR, but I ran into some questions (like you mentioned) that I'm not able to answer:

  • If there are multiple directly invoked targets?
  • If a directly invokd target belongs to multiple resolves?

IMO this would require a new abstraction, like a "Pants Subproject" , where:

  • Files in a given directory always run with a certain resolve
  • Source Root patterns can default to [ '$PANTS_SUBPROJECT_NAME/', 'src/', 'test/' ] which is the poetry default (that would entirely remove the need for marker files for us)
  • Requirements directly in a subproject override transitive dependencies
    • this may be dependent on pants fully parsing transitive dependencies
  • Pants.toml configuration can be overridden per-subproject
    • ik theres discussion about enabling pants.toml configuration overrides at ANY directory level, but IMO thats potentially "too" powerful/flexible, and configuration per-subproject might offer a good compromise

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend: Python Python backend-related issues enhancement
Projects
None yet
Development

No branches or pull requests

4 participants