Make most dependencies optional #1327

yifanmai · 2023-01-27T22:29:30Z

Problem

Some users of crfm-helm are downloading the library only to use the CRFM proxy client. However, crfm-helm has too many dependencies because it contains all of HELM, not just the CRFM proxy client. This means that installing it takes a very long time due to downloading the dependencies. This also increases the chance of dependency conflicts (e.g. user's code uses a newer or older version of PyTorch than what HELM uses). Anecdotal feedback from a proxy client user:

crfm-helm seems to request some older versions of libraries (like torch, transformers, pydantic). I’ve manually reverted these to the newer versions that I was using before installing crfm-helm...

Proposal

Package Splitting

We should split up HELM into three PyPI packages:

crfm-proxy (or crfm-helm-proxy or crfm-proxy-client?): This would contain almost everything under helm.common and helm.proxy. It should should be a minimal number of dependencies, mostly for JSON handling and HTTP requests. CRFM proxy API users would only need to install this package.
crfm-proxy-server (or crfm-helm-models or or crfm-helm-providers?): This would contain helm.services.server_service and helm.proxy.clients (which is confusingly named; this is where the models are implemented, not where the CRFM proxy client API lives). This would contain both ML dependencies (e.g. Hugging Face, PyTorch) and HTTP server dependencies (e.g. Bottle). This would depend on crfm-proxy.
crfm-helm: This would contain everything under helm.benchmark. This would contain only ML dependencies (e.g. Hugging Face, PyTorch). This would depend on crfm-proxy and possibly crfm-proxy-server.

Note that Python allows different packages under helm to live in different PyPI packages thanks to the namespace package functionality.

Fortunately for us, the current dependency tree already allows this splitting to happen quite easily (e.g. helm.common mostly only depends on itself, helm.proxy only depends helm.common, helm.benchmark depends on helm.proxy and helm.common) with a few exceptions:

We need to separate ServerService (which will live in crfm-proxy-server) from its interface Service (which will live in crfm-proxy).
Replace dacite with cattrs for JSON codec in run_display #1307 introduces a wrong-direction dependency (it makes the JSON converter aware of a class that requires special treatment). We need to reverse this (i.e. the JSON converter needs to expose an API that the class that requires special treament should call into) using the solution described in Subclass registration for Perturbation #1251.

Package Renaming

Optionally, we could rename helm.common and helm.proxy to crfm.common and crfm.proxy in the process, if we want to reserve the HELM brand for just the benchmarking project itself.

(Not) Repository Splitting

We should keep all packages in the same repository for now. The API is still experimental (i.e. semantic version v.0.x.x) and we making breaking changes fairly often. PRs will frequently need to modify more than one package. If the packages were in separate repositories, we would have to deal with version incompatibilities. If they're all in the same repository, we could simply pip install all the packages in edit mode, so we only need to ensure that the packages are compatible at the same git commit.

(Not) Install Extras

An alternative solution is to use install extras. This is more problematic than the above solution for a few reasons:

It would create a combinatorial number of different configurations, as each extra can be installed or not installed, and this creates more CI testing scenarios for us (e.g. we would need to test that the client works without any extras installed, and that the server works with extras installed).
We would need to detect and disable features when certain packages are not installed via some mechanism.
It would require adding lots of conditional important logic (i.e. check that a package is installed before doing something). Additionally, mypy type checking doesn't work well with conditional important logic (it usually considers a condition import to be a type error).

The text was updated successfully, but these errors were encountered:

percyliang · 2023-02-05T22:42:32Z

I think we should keep helm as part of the naming, since CRFM is broader (crfm.common claims too much).

From my perspective, I think the thing that mostly matters is to have a small number of things to install to get going with various use cases:

Just want to run the proxy server (shouldn't have to load all the scenarios)
Just want to implement a scenario (shouldn't have to install Pytorch and all the dependencies of other scenarios, etc.)

The thing I'm actually most worried about that would make HELM unwieldy is as the number of scenarios and metrics grows (especially with VHELM), that doing anything with a single scenario requires all the dependencies of all the scenarios (which can grow with the number of scenarios).

nelson-liu · 2023-06-13T06:08:44Z

It would be really great to have this. I'm finding it very difficult to sanely include the helm package in existing projects where I want it mostly as a replacement for the OpenAI python client (so I can use CRFM token quota)---there are just way too many dependency conflicts with what's already in my environment (e.g., setup.cfg specifies torch~=1.12.1 when I want to use torch~=2.0, etc)

yifanmai · 2023-06-13T16:25:07Z

I'm planning on fixing it this month. Are there other conflicts or is it mainly torch?

nelson-liu · 2023-06-14T00:11:15Z

torch and transformers were the main ones for me.

yifanmai · 2023-06-20T23:14:03Z

I've back-paddled on the PyPI splitting idea.

I now think that that install extras / optional dependencies are the way to go, because:

I wasn't able to find any major Python open source projects that put multiple packages in the same repository.
The ObjectSpec system means that we can only load modules when needed, which in turn can load optional dependecies when they are imported.
mypy type checking can be pleased by putting an import in a try block.
Most of the optional dependencies are only used in one or two places, which makes it very easy for a user to tell which optional dependencies need to be installed.
Imports of optional dependencies in try... raise blocks can be used to display useful error messages and instructions.

yifanmai added enhancement New feature or request p2 Priority 2 (Good to have for release) Code refactor labels Jan 27, 2023

yifanmai self-assigned this Jan 27, 2023

yifanmai added cleanup framework and removed Code refactor labels Jan 30, 2023

yifanmai mentioned this issue Feb 3, 2023

Split out CRFM API proxy client into its own library #1044

Closed

yifanmai added proxy packaging and removed framework labels Feb 3, 2023

yifanmai changed the title ~~Split HELM into separate PyPI packages~~ Make most dependencies optional Jun 20, 2023

This was referenced Jun 21, 2023

Lazily import client modules in AutoClient #1679

Merged

Use canonical fully qualified class names for metrics #1680

Merged

yifanmai added p1 Priority 1 (Required for release) and removed p2 Priority 2 (Good to have for release) labels Jun 29, 2023

yifanmai mentioned this issue Jun 29, 2023

Weird behavior when installing from source #1541

Closed

yifanmai mentioned this issue Aug 1, 2023

Architecture and data schema review #1753

Closed

This was referenced Sep 26, 2023

Optional dependencies #1798

Merged

Move model and misc dependencies to optional extras #1844

Merged

yifanmai closed this as completed Sep 26, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make most dependencies optional #1327

Make most dependencies optional #1327

yifanmai commented Jan 27, 2023 •

edited

Loading

percyliang commented Feb 5, 2023

nelson-liu commented Jun 13, 2023

yifanmai commented Jun 13, 2023

nelson-liu commented Jun 14, 2023

yifanmai commented Jun 20, 2023 •

edited

Loading

Make most dependencies optional #1327

Make most dependencies optional #1327

Comments

yifanmai commented Jan 27, 2023 • edited Loading

Problem

Proposal

Package Splitting

Package Renaming

(Not) Repository Splitting

(Not) Install Extras

percyliang commented Feb 5, 2023

nelson-liu commented Jun 13, 2023

yifanmai commented Jun 13, 2023

nelson-liu commented Jun 14, 2023

yifanmai commented Jun 20, 2023 • edited Loading

yifanmai commented Jan 27, 2023 •

edited

Loading

yifanmai commented Jun 20, 2023 •

edited

Loading