-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How to handle dependency versions? #186
Comments
(Disclosure: I discussed this with Dan prior to this write-up.) Thanks for doing the research on this! Seems pretty reasonable to me. poetry.lock has certainly been annoying me and it does sound like poetry may not be the right thing to use for a library. I'm keen to hear what @valedan or @neelnanda-io think as well. |
Thanks a lot for the really thorough investigation! I don't have the slack to engage with this properly right now, but your arguments seem reasonable, and I'd be very happy to take a pull request for whatever either of you think is reasonable re version control. I mostly deferred to @alan-cooney on using Poetry, who may have takes (I really don't understand Python libraries and version/dependency control very well) |
@jbloomAus I do agree that Poetry may not be the best choice for a library, but I don't have a strong opinion on what should be used instead because the packaging ecosystem is so fragmented. I'd lean towards either Hatch, since it's the "modern" all-in-one tool adopted by the PyPA, or perhaps just sticking with pip and setuptools since that way a new TransformerLens contributor won't need to install anything extra to get the project set up locally. I think it's also fine to stay with Poetry if it's generally working well, as long as we fix the issues I identified above. I'd definitely like to hear @alan-cooney 's thoughts on this - I don't have a ton of Python experience so it's very possible I'm missing something important! |
Cool link: https://github.com/pypa |
I just got hit by this! Sharing as a datapoint in favour of @valedan's arguments. scenarioI tried to install the steps taken❯ poetry add wandb
Using version ^0.14.0 for wandb
Updating dependencies
Resolving dependencies... (0.5s)
[BUG] transformer-lens (0.0.0) @ git+https://github.com/neelnanda-io/TransformerLens.git@HEAD is not satisfied. investigationTransformerLens pyproject.toml specifies ❯ ~/code/TransformerLens
❯ grep wandb pyproject.toml
wandb = "^0.13.5" The ❯ curl https://files.pythonhosted.org/packages/ab/0f/f1a45388f68ca1f5e72c82feb60d77066bb3ac19b0b2d8b13e494cacf0bf/transformer_lens-1.1.1.tar.gz -o transformer_lens-1.1.1.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 77818 100 77818 0 0 307k 0 --:--:-- --:--:-- --:--:-- 315k
❯ tar -xf transformer_lens-1.1.1.tar.gz
❯ grep wandb transformer_lens-1.1.1/setup.py
'wandb>=0.13.5,<0.14.0'] |
Hi @valedan - thanks for the detailed issue. Also apologies for the delay here - I've been away for a week. Caret versionsGenerally agreed that defaulting to Python dependency management is more generally a bit messy (as you touch on, it doesn't use nested/tree structures which solve all these issues). I understand this is a relic of the past (from when compute was cheaper than storage), but I'm not aware of any plans to change it. This is also why semver doesn't fully work with Python, whereas it does work really well with most other languages. More generally it may be worth switching away from Poetry, as most people are only familiar with Setuptools (but note the CD will need to be updated as well). The big plus compared to Setuptools is that it checks compatibility across all package versions within the range specified (not just the ones you happen to have installed), which prevents a whole different class of common problems in libraries. For me the biggest downside is it's significantly slower during development as a result (and this could become prohibitive). No strong views here, and happy to hear of any better alternatives. LockfilesLockfiles have pros and cons. They are there to enable consistent development environment setups (which makes collaboration easy), and to prevent fragile tests/CD (e.g. if a test passes locally, it should also pass on the CD). They also allow efficient caching of modules in the CD, which significantly speeds up the tests. The downside is we're not testing against the latest versions of third party libraries, but the flip side of this is that if it's changed, we're then not testing against older versions that we're trying to support. On balance I think they're worth it. Lower boundsI've already tried to do this. I suggest we just add (and test) a lower bound if it becomes needed. |
I've decided to summarise this discussion so we can work out what actions to take. Thankyou GPT4!
I'm going to have a go at replacinging the Caret requirements with >= but will stop short of changing the fact that we use poetry or currently store the lockfile. I'll see what happens with the >= changes on the current CI. If future package versions cause problems, we might have problems but hopefully this is a worthwhile risk. |
Closing issue, will reopen if this comes up again. |
Thanks for weighing in @alan-cooney. And thanks for taking action on this @jbloomAus! I planned to do this myself a while back but I've been busier than expected with maze-transformer in the past couple of weeks. |
@valedan this python packaging repo has been trending recently: https://github.com/mitsuhiko/rye |
Having spoken to others - a common approach here is actually to use Dependabot. The way it works is:
This has all the advantages of supporting new versions where possible, without the disadvantage of potentially showing dependency versions that aren't supported. Rye looks interesting, but its unclear as to what it adds over Poetry. The underlying issues with python (especially no tree/nested structure of dependencies) and also PyPi (metadata not required) are unsolved. |
Background - poetry has bad default behaviour for libraries
Some of Poetry's default behaviour is good for applications, but terrible for libraries. The big issue is that any time you run
poetry add foo
, it will addfoo
topyproject.toml
with a caret requirement. This is equivalent to setting a lower bound of whatever version of the dependency is newest at the time of addition, and an upper bound of the next major version.The idea is that if everyone is following semver, upgrading minor versions should be safe but upgrading major versions might break things. This is a bit questionable even for applications, as it relies on everyone perfectly following semver and never introducing bugs in minor version bumps, but it's not a big deal.
However this behaviour is terrible for libraries. The important difference between an application and a library is that as a library author, you have no control over what other packages a user will install alongside your library. This makes it important to be compatible with as wide a range of other libraries as possible. Setting version upper bounds breaks this - if TransformerLens depends on
foo ^1.2.3
and some other cool interpretability library depends onfoo ^2.0.1
, a user cannot install both libraries in a project because they have incompatible dependencies.Python packaging expert Brett Cannon says:
Current dependency versions
Right now TransformerLens is using carat requirements for ALL dependencies in
pyproject.toml
. I'm not sure if this has broken things for any users yet, but it's only a matter of time before it does.Lockfiles
A related issue is with lockfiles. For the same reasons it's bad for libraries to set upper bounds for dependency versions, it's also bad for them to use lockfiles because a lockfile will pin exact versions of everything. Poetry avoids this problem by not using the lockfile when the project is installed as a library. This means that the
poetry.lock
in this repo will have no effect on what dependency versions get installed by users of TransformerLens, but it will ensure that all TransformerLens developers are on the same dependency versions when working on it locally. Additionally, the CI/CD workflows for TransformerLens use the lock file to install dependencies.This lock file usage means that if a new version of one of the TransformerLens dependencies has a bug that breaks things, the maintainers of TransformerLens will be the last ones to find out. They're using lock files when developing locally or testing in CI, while users will be getting the latest dependency versions whenever they install TransformerLens for a new project.
The options here are to either remove the lockfiles from version control, or to be very diligent about frequently updating them to the latest dependency versions. Removing the lock file means TransformerLens maintainers will occasionally be annoyed because CI breaks due to a change in a new dependency version - but if this happens it means that users of TransformerLens will have the same problem, so it's good for maintainers to find out and fix it ASAP.
Proposed changes
datasets = "^2.7.1"
todatasets = ">=2.7.1"
. Then test TransformerLens with everything on the latest version. This will solve most of the potential compatibility issues. After this fix, everyone needs to be careful with future dependency additions. After usingpoetry add
make sure to change the^
to a>=
inpyproject.toml
. It may help to add a comment explaining this inpyproject.toml
.poetry.lock
from version control so that dev and CI environments are in line with user environments. The alternative would be updating the lockfile soon after any dependency is updated, but this is a time consuming chore that would be hard to do consistently, and I'm unsure of the benefit.I can do the first 2 items myself in the next few days. I just want to get feedback from the maintainers before proceeding.
Misc
What about NPM?
Anyone who has spent a lot of time working in the javascript ecosystem (like myself) may be confused by this issue, because NPM also does carat requirements by default for everything. This actually works fine because javascript supports nested transitive dependencies, so if 2 libraries each require incompatible versions of
foo
, then NPM will just install 2 different versions offoo
in node_modules and everything works fine. But Python has a flat dependency structure and can only ever have 1 version of each package installed.Should TransformerLens continue using Poetry?
I've spent the past few days trying to wrap my head around the Python packaging ecosystem. In short - it's a huge mess. I was expecting to find a standard recommended set of tooling for library developers, but there are dozens of different tools with overlapping functionality and very little agreement in the community (or the official docs) about what should be used. Poetry has a few issues - overall it is geared more towards application development than library development, which leads to the issues discussed above. It also is not compliant with several PEP specifications, eg it uses a non-standard format in
pyproject.toml
. But overall it's a pretty nice library that streamlines a lot of things, and just based on the issues I've identified above I don't think there's a strong need to switch right now. If we keep running into more problems with Poetry in the future it might be worth considering - in that case Hatch seems like a good option.Further reading
Some good blog posts discussing the issues with library version constraints in Python:
Discussion of appropriate lockfile strategy for libraries on the Poetry repo - python-poetry/poetry#7488
The text was updated successfully, but these errors were encountered: