Debundling, Technical Debt and Responsibilities #9677
❤️ ccing also @encukou @torsava @frenzymadness @stratakis from our team
We (Arch) consistently debundle both pip and setuptools, so the referenced issue is not a problem for us. Actually, we have seen that issue and consider it a setuptools issue...
I don't see any rationale for devendoring pip but not devendoring setuptools... is their packaging policy not consistent? Why is it the pip project's responsibility to add code specifically for the case where Debian politics prevents timely coordination across packages? I would prefer pip to work with distros doing devendoring, when it is technically correct for pip to carry the solution. Off the top of my head, I don't particularly remember any problems Arch has had in that regard, other than issues and updates to … I would prefer if Debian were to solve internal Debian bureaucracy by handling it via Debian bureaucratic channels (e.g. a bug report to the setuptools package to devendor setuptools, which is then a blocker before they can devendor pip), rather than trying to solve it by patching pip for a Debian-specific need, which does not result in elegant code and apparently increases the maintenance burden of pip. :(
@eli-schwartz the PR referenced is for an issue that occurs only on Arch (AFAICT): #9348. If this were specific to Debian, I'd have reached out to the Debian maintainers directly. :)
As far as I can tell, that too is because of inconsistent devendoring, which we don't do... But in that case, mixing a distro pip and a user-installed setuptools breaks. This is related to pypa/setuptools#1383 and, more generally, to the golden rule "do not overwrite/override the distro packages with your own, it will break other distro packages". But originally I only read the PR, not the issue. :)
If you could also elaborate on this, that'd be great!
@pradyunsg: Thanks for starting this discussion.
That's not my expectation. I assume you don't test it. And I see it as Debian's responsibility to help keep this mechanism working, as long as we're taking advantage of it. I am prepared to spend several hours scratching my head and debugging those problems, every now and then. I expect that there may be push-back on implementation details of those patches, but not to have to re-litigate the existence of the debundling support.
In the case of #9686, you're not wrong with that description. Finding the cause of the debundling issues that lead to it took a few days (on and off). I see @kitterma was previously aware of that issue years ago, but if I was, I'd forgotten all about it... However, these are fairly minor bugs, once understood. The Jenga tower stands fairly well...
Debian Policy: https://www.debian.org/doc/debian-policy/ch-source.html#embedded-code-copies

What are the rationales behind it? Part of it is the definition of a distribution. We collect software together and try to fashion it into a cohesive system. The fewer copies of each thing there are in it, the easier our job is, and the smaller and simpler the system is. That's our security team's view too. They want to minimize the amount of work needed in response to issues. If we need to patch something in a library, it's really annoying to have to do it in multiple copies of that library, especially if different people are responsible for maintaining each one.

Hopefully that's not necessary, of course. When a project is targeting a specific version of a library that is out of date and we want to replace it, we'll (often) help them port their software to the new version. A big distro is full of many dead and semi-dead upstreams, so there is an endless amount of this work to do. (That can be a good hint that a project has died and its package should be removed.)

Of course, every policy has exceptions. Here's the documented list of known embedded copies in Debian; it's far from complete. Basically, we debundle where we can. When we can't, we may disable the relevant features, or just carry the embedded copy (distastefully). For web browsers, for example, in practice we have to carry embedded copies of several dependencies to support our stable releases. That's obviously the right trade-off to make in that situation.

Maybe we should be doing the same thing with pip, now that the Python distribution space is evolving faster. Pip is pretty much a user-facing leaf package (not a library being used by other packages). So we could be shipping updated pip to stable Debian releases, and using bundled dependencies does make that very easy. This is certainly a conversation we can have.

Stable distributions are trying to offer stability by changing as little as possible. That usually means not shipping updated versions of software, because updates bring unexpected change. When a project has stabilised enough, and has the testing and engineering resources to reliably produce stable versions that work across a range of platforms without causing too many behavioural regressions, then shipping updates becomes feasible. It's all a matter of weighing risks to our users.
I don't know how much space there is there. Distributors can carry the cost of writing patches, but they'll need review. And there may be knock-on technical debt. I'm not aware of much of that in pip related to debundling, but maybe you can educate me on that. Supporting debundled use (where the versions may not be exactly what you expect) encourages you to keep libraries at more of an arm's length from each other, rather than tightly integrated. I see that as good library design.
I think the situation is pretty good. pip supports a common use-case for distributor modification directly upstream. We could take that one step further with upstream CI.
💯
From my PoV, my PR makes pip more robust when devendored. It's not pip's responsibility to take it, but if pip wants it, I'm offering it. I like to be able to carry the smallest patch-set possible in Debian; it makes our life easier, and means less chance for users to experience something different to what the upstream expects. As distributors we are trying to serve both users and upstreams.
IIRC, we had CI for debundling at some point in the past and we removed it after some discussion. The basic line of reasoning there was (1) our CI setup has wayyyy too many long-ish jobs already, and (2) that CI can only cover one aspect of the situation; it cannot ensure all the potential combinations of patched/unpatched pip/pip's dependencies work. (I appreciate the responses here, but I can't respond to those yet because I have to make and eat breakfast.)
Coming purely from the technical side, I wonder if it's possible to configure Mypy to check for this. Maybe a conditional import to tell …
It's not …
Yes, and I'm thinking maybe it's possible to "fake" that mismatch for the mypy check.
For mypy, they both come from the exact same place: …
The mismatch should be capable of being recreated by using one copy of the … since that is how I originally detected the mismatch.
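To make the mismatch concrete, here is a small illustrative sketch (not pip code, and the partially-debundled scenario is contrived): the vendored copy and the system copy of the same library define distinct classes, so objects passed between the two copies fail isinstance checks and type comparisons, which is also what confuses static type checking.

```python
# Illustrative only: demonstrates why mixing pip's vendored copy of a library
# with a separately installed system copy produces "mismatched type" bugs.
from pip._vendor.packaging.version import Version as VendoredVersion

try:
    # The system-wide copy, if one happens to be installed alongside pip.
    from packaging.version import Version as SystemVersion
except ImportError:
    SystemVersion = None

v = VendoredVersion("1.0")
if SystemVersion is not None:
    # Same source code, but two distinct classes: a partially debundled pip
    # that hands objects from one copy to code expecting the other will see
    # these checks fail.
    print(VendoredVersion is SystemVersion)   # False
    print(isinstance(v, SystemVersion))       # False
```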
I will say that while I generally lean on the side of wishing downstream would not debundle us, I do think it's not an entirely negative thing either. By having at least some downstream users that debundle, it sort of forces us to "stay honest". By that I mean, I've seen time and time again that projects that bundle some library often end up patching that library locally for one reason or another. Often they start out assuming no local patches, but over time they get added, and eventually the bundled copy effectively ends up being a fork of upstream. Having downstreams that debundle forced us to come up with a bundling solution that goes out of its way to try to avoid that from happening, and it provides a constant sort of pressure to help ensure that we don't regress in that aspect. Maybe we don't need that pressure and it would be fine without it, but I do want to recognize that it does have at least some positives for upstream here.

That being said, I do think there are two general types of bugs that can flow from our debundling support. Broadly, that's bugs that are inherent to it (for example, if we miss a vendoring alias) and bugs that are specific to something a downstream distro is doing (like only partially debundling, triggering a mismatched-type problem).

The first of those types of bugs are obviously things we should land patches for in pip itself. It would be silly to expect every downstream to carry the same patch to fix some inherent problem in our debundling. Since we don't actually test our debundling and we rely on downstream to do that (which is effectively the trade-off we made here: we'll develop and carry this system of debundling, but we're pushing the costs of testing onto the downstream), we're often only going to see those issues as reports (and hopefully patches) from downstream.

The second of those types of bugs I mentally view as similar to things like patches that fix pip on obscure operating systems. We're unlikely to do the work to fix or debug those problems, and we're not going to add CI to ensure that they stay fixed, but if the patch itself isn't going to cause some serious regression and it's already written, then there is little reason to avoid pressing the merge button from our POV. Downstream might prefer to carry that patch themselves, since it's going to be more durable in that case (as was pointed out earlier, it's pretty easy for random workarounds to code paths not tested in CI to break), but they also might prefer to just land it in pip to avoid having to rebase their patch regularly. I think either option is fine.

In general though, I don't think the pip maintainers need to worry too much about fixing issues caused by specific downstream decisions that are not inherent to our debundling support. One of the reasons I made that support require downstreams to explicitly patch pip was somewhat as a signal that, by doing this, there's a chance you might have to carry patches to make it fully work.
I agree with pretty much everything @dstufft said, with the minor exception of a "human nature" qualification on one point:
The reservation I have here is that if we don't push back on PRs that patch over obscure debundling problems that are outside what we'd consider the "norm", then that sets an expectation that we are willing to co-ordinate and manage the set of fixes that ends up in pip. And worse still, it leaves us open to the possibility that distribution A offers a PR that fixes their use case but breaks distribution B. Who catches that problem? In reality, we don't actually have that difficulty, but it's hard to know for sure to what extent that's because we're relatively conservative in what we accept. I guess as long as no-one wants us to start being more open to accepting fixes for debundling issues than we already are¹, then there's no problem.

¹ Note for context that the fix in #9467 has been merged.
On Fedora and RHEL we don't debundle pip so far, so that issue wouldn't affect us much, though possibly only because we have never looked into debundling it. We lean towards debundling as a distribution wherever possible, despite the fact that sometimes it can be a bit of a hassle (as in the case of pipenv). The reasons are nicely explained here: https://fedoraproject.org/wiki/Bundled_Libraries?rd=Packaging:Bundled_Libraries However, not debundling pip has caused us problems in the past, especially with libraries like urllib3, which bundle other packages as well, e.g. when we have to backport a CVE fix.
Let me answer a bit more generally than what you asked for. Debundling helps the things that distros do:
These are much more visible in the more "enterprise"/LTS distros, and with packages that aren't maintained as enthusiastically as pip.
Yes. Just say no; you set the rules. And as Donald says about "staying honest": I recommend never giving in to the temptation to fork/patch that bundled code. Otherwise you become maintainers of a fork, which is a whole new level of technical debt. All in all, I really hope that as distros, we're helping the project. Just in different ways than developers do.
Thank you. This is useful input.
One frustration here is that pip, having so few maintainers, really isn't able to address the sort of "enterprise" concerns that distros do. That's fine as long as the distros cover this, but when these concerns spill over onto pip, it can get difficult. Particularly as the distros get the enterprise license fees and funding, and we don't...
Thank you for that. It means a lot to get that support.
Mostly, yes you are. Policy and priority clashes can be frustrating, and I won't lie, it's hard to be sympathetic when we get a bunch of users saying "pip is broken" but the reality is that what's broken is something the distro did. Some distros trigger more of these than others, and from an outsider's POV it can be hard to understand why that is, or why distros can be so different in this regard. But in general it's a net positive, yes.
Yes.
Don't be ashamed to reassign the issue to the distros (however that might work). I definitely want to know about all the problems Fedora/RHEL is causing. |
We do, but only by saying "you need to talk to your distro" (which is all we know), and often it feels like the user has no clue how to do that, which is frustrating to us because the user reached out to us and we weren't able to help.

Hmm, one thought about how we could point people in the right direction more easily: maybe we could get a list of the correct support URLs for all distros that debundle pip, and add them to the pip docs. We could also require that distros that debundle add a note to the pip version string saying "patched by XXX" (it would even be easy enough to add a check to pip so that we fail if we're debundled but the note is missing). Then it would be obvious from …

One downside is that it may result in distros getting a whole load of pip issues that aren't related to debundling, but honestly I doubt that; I don't think many users do that much analysis before raising a bug (I know I don't 🙂).
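For concreteness, the sort of check I have in mind might look roughly like this. This is purely a hypothetical sketch: nothing like it exists in pip today, and the `DISTRO_PATCH_NOTE` name and `version_banner` helper are invented for illustration (the `DEBUNDLED` flag, on the other hand, is the real one downstreams already flip in `pip/_vendor/__init__.py`).

```python
# Hypothetical sketch only: DISTRO_PATCH_NOTE and version_banner() are made up
# for illustration; they are not part of pip.
from pip import __version__
from pip._vendor import DEBUNDLED  # real flag set to True by debundling patches

DISTRO_PATCH_NOTE = ""  # a distro patch would set e.g. "patched by ExampleOS"


def version_banner() -> str:
    """Build a version string that refuses to hide a debundled build."""
    if DEBUNDLED and not DISTRO_PATCH_NOTE:
        raise RuntimeError(
            "pip has been debundled but no distro note was provided; "
            "please report this to your distribution, not to pip."
        )
    suffix = f" ({DISTRO_PATCH_NOTE})" if DISTRO_PATCH_NOTE else ""
    return f"pip {__version__}{suffix}"
```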
Speaking for myself, it is generally helpful if you ping people directly. Maybe pip could maintain a list of distro maintainers, maybe even other volunteers, willing to provide support directly in the bugtracker? I really think closing the communication gap between pip maintainers and distros is the best path forward. |
If the issue isn't with pip, I'd really rather it got moved onto a distro tracker, and not have the discussion stay on the pip tracker (I have enough trouble already with too many pip notifications). I'm completely in agreement that better communication between distros and pip maintainers is good, but I also think that helping users understand who is best placed to help them is good - and "@FFY00 on the pip tracker" looks to a user like a pip specialist, not a distro specialist, which IMO makes it harder to educate users.
I think I'd prefer to have a blob of text that I can copy-paste for saying "this seems to be due to XYZ distro's changes; here's what you need to do to reach out to them". Right now, we're missing next-steps guidance for the user, because we don't know what they need to do to reach your communication channels. If there's a place we should send them to that's appropriate for you, it'd be great if you could just provide it here. I'll add that into the maintainer documentation when I come around to finishing our documentation rewrite. :)
Sadly Debian's bug-reporting process is not very beginner-friendly (unless your beard is sufficiently grey and wispy, and you appreciate being able to file bugs with properly formatted text emails). Ubuntu's is less arcane and more web-based. But here are Debian's bug reporting instructions: https://www.debian.org/Bugs/Reporting How about this for Debian:
Ubuntu:
Fedora, RHEL, CentOS (and probably other derivatives – Rocky, Scientific, CloudLinux etc. – if they don't tell you something more specific):
Thanks for the discussion everyone. I think the next steps here are for someone to file a PR aggregating the comments into a dev-docs page to copy from. This will likely need to be one of the pip maintainers, since we’d want to word things carefully there. |
FTR Stefano Rivera sent this message about stopping the de-bundling in Debian: https://lists.debian.org/debian-python/2021/09/msg00031.html |
@eli-schwartz @FFY00 Do you want to share an equivalent blurb for the distros you're involved with, as was shared above for Fedora+"friends" and Debian+"friends"? |
We will almost certainly keep debundling in Arch. There are some issues that we need to be careful about, mainly making sure we don't update dependencies to incompatible versions and things like that. Perhaps @felixonmars could elaborate a bit more, since he is the one who currently maintains the pip package.
That's not what I'm asking. See #9677 (comment) for what I'm asking for here. |
Ah, sorry! Just point the users to the following URL to create a new issue in our bug tracker.
Ok, I think Arch is the last remaining holdout on debundling. And your current approach is causing a significantly degraded/fragile experience when using the Arch-provided pip: https://twitter.com/jpetazzo/status/1556594507952553984
And... @dvzrv removed Arch's patch to debundle pip in the latest release of python-pip (22.2.2-2, love this version number). More context in #11411. Closing this out, since... uhm... it looks like every distro that we've seen substantial reports from in the past has stopped debundling pip. If you do end up going down the road of debundling in the future, please stick to the description in the policy (specifically, the bit about making sure stuff doesn't break). Also, please feel welcome to reach out to us (directly over email, via an issue here, over IRC, the PyPA Discord, or on discuss.python.org's Packaging category) if you see something that you'd like our input on.
Although we have stopped debundling, we still have issues even with that: e.g. with the bundled certifi, which ships a specific certificate bundle that we usually point at our system-wide certificate setup (one place to configure things is great).
pip also patches certifi, so it'd probably be fairly reliable to patch pip's bundled certifi.
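For illustration, the kind of downstream patch being described might look something like the following. This is a minimal sketch, assuming the system CA bundle lives at `/etc/ssl/certs/ca-certificates.crt` (the path varies by distro); it is not taken from any actual Fedora or pip patch.

```python
# Sketch of a distro-style replacement for the bundled certifi's where():
# prefer the system-wide CA bundle, fall back to the file certifi ships.
# The system path below is an assumption and is distro-specific.
import os

_SYSTEM_CA_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def where() -> str:
    """Return the path to a CA bundle, preferring the system-wide one."""
    if os.path.exists(_SYSTEM_CA_BUNDLE):
        return _SYSTEM_CA_BUNDLE
    return os.path.join(os.path.dirname(__file__), "cacert.pem")
```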
This is an offshoot from #9467. Go ahead and read the most recent few comments for context.
/cc @FFY00 @eli-schwartz @stefanor @doko42 @kitterma @hroncok to put this on your radar. Please feel free to add in other redistributors of pip, who might be interested in this discussion.
/cc @pypa/pip-committers to call me out if I say something wrong here.
First off, I know folks probably don't say this enough, but: hey, thanks for doing what you do. I genuinely do appreciate the work you do, and it's quite a thankless task to keep OSS things functioning. So, hey, thank you!
I've seen it stated in multiple places, by multiple different people now, that because pip has a mechanism to simplify debundling, it should work when debundled as is.
Please don't take the fact that the debundling script exists in this repository, to mean that it is somehow pip maintainers' responsibility to ensure that it'll give you something that just works. If that's the expectation being set, I'd like to remove that script from this repo, and make it clearer that it's not something that we want to deal with.
As far as I can tell, the debundling script exists because... how do I phrase this diplomatically... it was very easy for redistributors to debundle pip, get it kinda-sorta working and ship a broken pip. It is there to make a redistributor's life easier when they're debundling pip, but that cannot come at the cost of making pip's maintainers' lives harder.
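For anyone who hasn't looked at it, the mechanism is roughly the following pattern. This is a simplified sketch, not the actual file; `pip/_vendor/__init__.py` is the authoritative version and does more (for example, it can also load the dependencies from wheels shipped on disk).

```python
# Simplified sketch of pip's debundling hook (details approximated).
# A redistributor's patch flips DEBUNDLED to True, after which the vendored
# import names are aliased to the system-wide copies of those libraries.
import importlib
import sys

DEBUNDLED = False  # a downstream debundling patch sets this to True


def vendored(modulename: str) -> None:
    """Make pip._vendor.<modulename> resolve to the top-level system module."""
    system_module = importlib.import_module(modulename)
    sys.modules[f"pip._vendor.{modulename}"] = system_module


if DEBUNDLED:
    for name in ("certifi", "packaging", "requests", "urllib3"):
        vendored(name)
```

The aliasing only works if every vendored name is covered and the system copies are compatible, which is exactly where the breakage reports above tend to come from.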
Quoting from our vendoring policy's debundling-script-is-here section:
The "bit of extra work" is making sure that the debundled pip doesn't fail in any of the ways that our users use it. (I'm saying "our users" because these are users of both pip and $thing)
For years now, we've effectively been saying "don't do that please, because it'll break things. And if you do it anyway, please make sure it works OK because we don't have the resources to test all the ways it can break for you."
And then, redistributors said "yea, so... it broke because we didn't account for $thing, but if you do this one thing in this specific weird way, our Jenga tower stays upright". And we've been accepting (even authoring) such patches because it makes our users' lives easier.
It'd be very easy for pip's maintainers to point to our vendoring policy, and start saying no to patches intended to make things work when pip is debundled (and to revert the ones we've merged over the years already). We're not doing that, but I'd like to get to a point where pip's maintainers don't have to worry about the issues caused by the debundling of pip.
Honestly, pip's maintainers can't be fixing / dealing with these issues - that's literally the point of that policy document. pip getting even more changes and weird type conversions to accommodate the various breakages due to debundling is NOT the solution here. It's a fragile workaround that is easily broken by refactoring, and it makes this code more difficult to maintain in general. It's technical debt for pip, in exchange for avoiding additional work for downstream redistributors. We've been taking this on for a while, but I'd really like to stop doing that.
Pip's¹ been taking on technical debt for something that we've explicitly documented that we don't want to be taking technical debt for. We are going to have to start saying no to these patches at some point, and I'm sure everyone would prefer it wasn't a thing we started doing magically on a random day. :)
¹ I hope you're smiling, Paul.
So... I do have a few questions now.
For the redistributors who are debundling pip, could you share with us the line of reasoning that leads you to decide that you have to be debundling pip? Feel free to point to specific sections in a bunch of documents. (I promise not to get into the discussion of the merits of those choices; I just wanna know what they are to better understand your situation.)
For everyone:
(PS: typed with typo-happy thumbs, on a phone)
(PPS: this is definitely a "if I had more time, I'd have written a shorter letter" situation)