Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make most dependencies optional #1327

Closed
yifanmai opened this issue Jan 27, 2023 · 5 comments
Closed

Make most dependencies optional #1327

yifanmai opened this issue Jan 27, 2023 · 5 comments
Assignees
Labels
cleanup enhancement New feature or request p1 Priority 1 (Required for release) packaging proxy

Comments

@yifanmai
Copy link
Collaborator

yifanmai commented Jan 27, 2023

Problem

Some users of crfm-helm are downloading the library only to use the CRFM proxy client. However, crfm-helm has too many dependencies because it contains all of HELM, not just the CRFM proxy client. This means that installing it takes a very long time due to downloading the dependencies. This also increases the chance of dependency conflicts (e.g. user's code uses a newer or older version of PyTorch than what HELM uses). Anecdotal feedback from a proxy client user:

crfm-helm seems to request some older versions of libraries (like torch, transformers, pydantic). I’ve manually reverted these to the newer versions that I was using before installing crfm-helm...

Proposal

Package Splitting

We should split up HELM into three PyPI packages:

  • crfm-proxy (or crfm-helm-proxy or crfm-proxy-client?): This would contain almost everything under helm.common and helm.proxy. It should should be a minimal number of dependencies, mostly for JSON handling and HTTP requests. CRFM proxy API users would only need to install this package.
  • crfm-proxy-server (or crfm-helm-models or or crfm-helm-providers?): This would contain helm.services.server_service and helm.proxy.clients (which is confusingly named; this is where the models are implemented, not where the CRFM proxy client API lives). This would contain both ML dependencies (e.g. Hugging Face, PyTorch) and HTTP server dependencies (e.g. Bottle). This would depend on crfm-proxy.
  • crfm-helm: This would contain everything under helm.benchmark. This would contain only ML dependencies (e.g. Hugging Face, PyTorch). This would depend on crfm-proxy and possibly crfm-proxy-server.

Note that Python allows different packages under helm to live in different PyPI packages thanks to the namespace package functionality.

Fortunately for us, the current dependency tree already allows this splitting to happen quite easily (e.g. helm.common mostly only depends on itself, helm.proxy only depends helm.common, helm.benchmark depends on helm.proxy and helm.common) with a few exceptions:

Package Renaming

Optionally, we could rename helm.common and helm.proxy to crfm.common and crfm.proxy in the process, if we want to reserve the HELM brand for just the benchmarking project itself.

(Not) Repository Splitting

We should keep all packages in the same repository for now. The API is still experimental (i.e. semantic version v.0.x.x) and we making breaking changes fairly often. PRs will frequently need to modify more than one package. If the packages were in separate repositories, we would have to deal with version incompatibilities. If they're all in the same repository, we could simply pip install all the packages in edit mode, so we only need to ensure that the packages are compatible at the same git commit.

(Not) Install Extras

An alternative solution is to use install extras. This is more problematic than the above solution for a few reasons:

  • It would create a combinatorial number of different configurations, as each extra can be installed or not installed, and this creates more CI testing scenarios for us (e.g. we would need to test that the client works without any extras installed, and that the server works with extras installed).
  • We would need to detect and disable features when certain packages are not installed via some mechanism.
  • It would require adding lots of conditional important logic (i.e. check that a package is installed before doing something). Additionally, mypy type checking doesn't work well with conditional important logic (it usually considers a condition import to be a type error).
@percyliang
Copy link
Contributor

I think we should keep helm as part of the naming, since CRFM is broader (crfm.common claims too much).

From my perspective, I think the thing that mostly matters is to have a small number of things to install to get going with various use cases:

  1. Just want to run the proxy server (shouldn't have to load all the scenarios)
  2. Just want to implement a scenario (shouldn't have to install Pytorch and all the dependencies of other scenarios, etc.)

The thing I'm actually most worried about that would make HELM unwieldy is as the number of scenarios and metrics grows (especially with VHELM), that doing anything with a single scenario requires all the dependencies of all the scenarios (which can grow with the number of scenarios).

@nelson-liu
Copy link
Contributor

It would be really great to have this. I'm finding it very difficult to sanely include the helm package in existing projects where I want it mostly as a replacement for the OpenAI python client (so I can use CRFM token quota)---there are just way too many dependency conflicts with what's already in my environment (e.g., setup.cfg specifies torch~=1.12.1 when I want to use torch~=2.0, etc)

@yifanmai
Copy link
Collaborator Author

I'm planning on fixing it this month. Are there other conflicts or is it mainly torch?

@nelson-liu
Copy link
Contributor

torch and transformers were the main ones for me.

@yifanmai
Copy link
Collaborator Author

yifanmai commented Jun 20, 2023

I've back-paddled on the PyPI splitting idea.

I now think that that install extras / optional dependencies are the way to go, because:

  • I wasn't able to find any major Python open source projects that put multiple packages in the same repository.
  • The ObjectSpec system means that we can only load modules when needed, which in turn can load optional dependecies when they are imported.
  • mypy type checking can be pleased by putting an import in a try block.
  • Most of the optional dependencies are only used in one or two places, which makes it very easy for a user to tell which optional dependencies need to be installed.
  • Imports of optional dependencies in try... raise blocks can be used to display useful error messages and instructions.

@yifanmai yifanmai changed the title Split HELM into separate PyPI packages Make most dependencies optional Jun 20, 2023
@yifanmai yifanmai added p1 Priority 1 (Required for release) and removed p2 Priority 2 (Good to have for release) labels Jun 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup enhancement New feature or request p1 Priority 1 (Required for release) packaging proxy
Projects
None yet
Development

No branches or pull requests

3 participants