Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out how to handle VCS requirements with Python lockfiles and --hash #13965

Closed
Eric-Arellano opened this issue Dec 22, 2021 · 10 comments
Closed

Comments

@Eric-Arellano
Copy link
Contributor

See pypa/pip#6469. Pip is all-or-nothing with --hash. If one entry in the requirements.txt has it, then everything must have it in the file.

ansicolors @ git+https://github.com/jonathaneunice/colors.git@c965f5b9103c5bd32a1572adb8024ebe83278fb0
astroid==2.8.6 \
    --hash=sha256:cd8326b424c971e7d87678609cf6275d22028afd37d6ac59c16d47f1245882f6 \
    --hash=sha256:5f6f75e45f15290e73b56f9dfde95b4bf96382284cde406ef4203e928335a495
pex --no-transitive -r reqs.txt
ERROR: Can't verify hashes for these requirements because we don't have a way to hash version control repositories:
    ansicolors@ git+https://github.com/jonathaneunice/colors.git@c965f5b9103c5bd32a1572adb8024ebe83278fb0 from git+https://github.com/jonathaneunice/colors.git@c965f5b9103c5bd32a1572adb8024ebe83278fb0 (from -r reqs.txt (line 1))

The pip issue suggests switching VCS requirements to an Artifact URL like using the .zip file, but that doesn't always work and would require our users to have a confusing workaround.

We want to in general strongly encourage (require?) that you use --hash because it's important for supply chain attacks. So we can't give up on --hash entirely.

--

Instead, the only workaround I can think of is what the issue suggests: to use a distinct requirements.txt for VCS requirements, which doesn't use --hash, and then a normal one that uses --hash.

It's not clear how to model this: should generate-lockfile start generating 1+ lockfile per resolve? We could keep [python].resolves a mapping of resolve -> lockfile path, and rely on a standard suffix like default_lock.vcs_reqs.txt. That is, the extra file gets generated automatically and its name can't be changed. A major benefit of saving two lockfiles to disk is we get interop with the greater ecosystem, like you can pip install the files without Pants.

We could stick to using one lockfile, and then do some magical introspection of the lockfile to dynamically split it into two lockfiles during the rule. We don't have the interop benefit, though, and this is doing perhaps more work than actually necessary.

cc @stuhood @jsirois , thoughts?

@Eric-Arellano
Copy link
Contributor Author

Another alternative is to improve pip to support --hash=skip, as proposed at pypa/pip#6469 (comment). That would help several other projects, it seems, including Poetry and pip-compile.

@jsirois
Copy link
Contributor

jsirois commented Dec 22, 2021

Instead, the only workaround I can think of is what the issue suggests: to use a distinct requirements.txt for VCS requirements, which doesn't use --hash, and then a normal one that uses --hash.

How do you propose you perform a coherent resolve using two separate requirements lists? Afaict you can't without implementing your own resolver.

So, --hash checking mode is almost a non-feature. It seems like the easiest thing is to not pass Pip hashes at all and do the hash checking ourselves after pip download has run. Here us == Pex.

@Eric-Arellano
Copy link
Contributor Author

How do you propose you perform a coherent resolve using two separate requirements lists?

Agreed it does need to be a coherent resolve, only with the results split across two files. Perhaps pex3 lock export could do this splitting? Otherwise Pants could post-process the result of pex3 lock export and do the splitting.

So, --hash checking mode is almost a non-feature. It seems like the easiest thing is to not pass Pip hashes at all and do the hash checking ourselves after pip download has run. Here us == Pex.

Sure, that sounds good to me too. But how would we disable hashes for pip? There's only --require-hashes (implied by --hash in a -r file), nothing like --no-require-hashes. Would PEX pre-process the -r requirements.txt file to strip all hashes?

At that point, I wonder if implementing --hash=skip in pypa/pip#6469 (comment) would be less effort.

@jsirois
Copy link
Contributor

jsirois commented Dec 22, 2021

I don't understand your 1st resaponse, but I'll move on since I think it's moot.

Would PEX pre-process the -r requirements.txt file to strip all hashes?

Yes - it already does this. The pex/requirements.py module has implemented full Pip requirements file parsing for a good while now. Every Pex CLI run needs to parse requirements files, if passed, to get the full list of root level requirements for inclusion in PEX-INFO requirements.

@jsirois
Copy link
Contributor

jsirois commented Dec 22, 2021

What's confusing about that question though, is if you're using Pex lockfiles, you're not passing requirements.txt anyhow - Pex is reading its own lock file format and can clearly do whatever with that.

@Eric-Arellano
Copy link
Contributor Author

I don't understand your 1st response, but I'll move on since I think it's moot.

Oh, maybe this is the root of the confusion. I was thinking about the output of pex3 lock {create,export}, not the input. Turns out the input is an issue too: pex-tool/pex#1556

With the output, pex3 lock export normally returns a file with --hash for every entry. That wouldn't work if there was a VCS requirement because there is no sensible --hash to set. So, the produced requirements.txt-style lockfile would not be consumable by pex/pip because of incomplete hashes.

A workaround is for lock export to spit out two lockfiles: one with all hashed entries, another with all unhashed entries. Those two files must be used together to actually be valid.

I was speculating how that could be implemented, that lock create could output two files. Or if that's not possible, then Pants can post-process and split.

This output problem would still be relevant even if you move hash checking from pip to pex because the produced lockfiles would not be consumable outside of Pex, like if you run pip directly on the lockfile. It's valuable to Pantsbuild for it to work with greater ecosystem to facilitate incremental adoption. I think for the output problem, the best fix is to add --hash=skip to pip and have Pex's lockfile generation set that; next best is to spit out two lockfiles. Wdyt?

--

What's confusing about that question though, is if you're using Pex lockfiles, you're not passing requirements.txt anyhow - Pex is reading its own lock file format and can clearly do whatever with that.

That's presuming we use Pex's lockfile format, rather than running pex lock export. For now, #13962 uses pex export. But we can certainly decide we do want to save on disk Pex's format instead. Fwict, a major downside of that would be less interop with other ecosystems. A major upside would be richer metadata.

We could also only export with a new option generate-lockfiles --export, for example. Or hook into the new export goal Benjy added.

@jsirois
Copy link
Contributor

jsirois commented Dec 22, 2021

There is no need to run pex3 lock export. pex3 lock create creates a lock Pex inherently understands. Why middleman?

@Eric-Arellano
Copy link
Contributor Author

There is no need to run pex3 lock export. pex3 lock create creates a lock Pex inherently understands. Why middleman?

Let's have that convo at #13964 (comment) to keep this one focused on VCS requirements.

But that does sound good that if we go with PEX's format, then the output problem seems okay.

The input problem I think would only be that VCS requirements generally don't work yet with pex3 lock create: pex-tool/pex#1556. NB that Pants does not yet allow you to use --hash with a pants_requirement (#12090), so there's no issue with the input having some requirements that use --hash and some that don't. That's not possible.

@jsirois
Copy link
Contributor

jsirois commented Dec 23, 2021

So, I just went through the exercise of figuring out we don't handle this today - we map vcs requirements to version "*" for poetry to lock. That results in a lock, but of the latest publicly available version of the project and not of the requested vcs version. So, worse, its a silent failure to resolve what you asked for: #14020

I'm guessing you knew this though and so this is a new feature you're expecting out of a locker? The poetry locker, when given a proper git / rev, refuses to provide a hash via poetry export since the poetry.lock itself contains none (which I'm guessing is where your original example above comes from).

So, since the existing locker is broken for vcs requirements and the nascent Pex lock support is as well, there seem to be a host of options:
1.) Instead of silently not locking the requested requirement but instead ignoring the user and picking the latest publicly available version, like we do today, we could just refuse to lock vcs requirements: a loud failure is almost always better than a silent one.
2.) We could ask the new locker to externalize hash checking and actually record and check the hash of the cloned vcs repo which would work to make any form of vcs requirement reproducible-or-die; i.e.: They can write down a floating branch vcs url which will lock but then fail any resolves as soon as that branch gets a new commit - which is as things should be. The presumption here is that the hash of sdists and wheels built from the cloned repos will not generally be reproducible, just the contents of the repo from the clone will.
3.) We could ask the new locker to externalize hash checking and just skip ever recording or checking vcs requirement hashes.

@Eric-Arellano
Copy link
Contributor Author

Closing in favor of:

Repository owner moved this from Todo to Done in Python multiple user lockfiles Jan 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

No branches or pull requests

2 participants