
feat!: Add extra for inference dependencies such as torch #5147

Merged (7 commits) on Jun 20, 2023

Conversation

@julian-risch (Member) commented Jun 14, 2023

Related Issues

Proposed Changes:

  • TLDR: pip install farm-haystack -> pip install farm-haystack[inference]. Users who ran the former in the past now need to run the latter to install the same set of dependencies.
  • This PR adds a new extra, for now called inference. Let's discuss the name of this extra.
  • The inference extra contains the torch and sentencepiece extras of transformers, so that versions compatible with transformers are installed. It also contains the huggingface-hub dependency and sentence-transformers.
  • The standard Haystack installation will no longer install torch, sentencepiece, sentence-transformers, and huggingface-hub; instead it installs plain transformers without those extras. As a result, users can, for example, run PromptNode with different API providers such as OpenAI or Hugging Face inference endpoints and API inference.
  • Nice side effect: most of the integration tests become several minutes faster because they can skip installing torch.

How did you test it?

  • Manually tested running a PromptNode with OpenAI after installing from this branch. More tests with other model providers pending.

Notes for the reviewer

  • Pillow shouldn't be part of the core dependencies but for now it's required. Otherwise the import fails. Should be addressed in a separate PR.
  • This is a big breaking change to how Haystack is installed, affecting all tutorials, the docs, and the README. Let's talk with @bilgeyucel and @TuanaCelik about how to roll this out before merging. Let's also double-check that an error message with simple instructions is shown if torch is required but not installed. That should already be the case thanks to feat: optional transformers #5101.
  • I still need to check whether it makes more sense to keep huggingface-hub as a core dependency:
    • huggingface_hub is imported in only a few places in the code base: https://github.com/search?q=repo%3Adeepset-ai%2Fhaystack%20_hub&type=code One use case is pushing a fine-tuned reader model to the hub; the other is checking whether a model to be downloaded from the model hub is of type "sentence_transformers". In both cases a torch import is required earlier, so there is no situation where a user reaches the code using huggingface_hub before the torch import fails. Therefore, huggingface-hub can be part of the inference extra.
  • Once we have agreed on the name of the extra, I will add a message to all the lazy torch_imports and torch_and_transformers_imports saying message="Run 'pip install farm-haystack[inference]'". Done.
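The lazy-import pattern this PR relies on can be sketched as follows. This is a simplified stand-in for haystack's LazyImport (the real class in haystack/lazy_imports.py may differ in its exact API), and the imported module name is deliberately fake to show the failure path:

```python
class LazyImport:
    """Context manager that swallows an ImportError at import time and
    re-raises it, with an install hint, only when the dependency is used."""

    def __init__(self, message="Run 'pip install farm-haystack[inference]'"):
        self.message = message
        self._error = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is not None and issubclass(exc_type, ImportError):
            self._error = exc_value  # remember the failure, suppress it for now
            return True
        return False

    def check(self):
        # Called at the point of use (e.g. first line of a node's __init__):
        # re-raise the deferred ImportError together with the install hint.
        if self._error is not None:
            raise ImportError(f"{self._error}. {self.message}")


with LazyImport() as torch_import:
    import a_module_that_is_not_installed  # stands in for torch

try:
    torch_import.check()
except ImportError as err:
    print(err)  # prints the original error plus the pip install hint
```

Importing the module succeeds silently even though the dependency is missing; only the later check() call surfaces the error, together with the instruction the bullet above describes.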

Alternatives

An alternative would be to also remove plain transformers (without extras) from Haystack's standard dependencies. I think we shouldn't do that. It would further reduce the installation to a minimum and thus save installation time and disk space, but transformers is required for many things users can do with Haystack; even a PromptNode with a transformers model requires it.

Checklist

@julian-risch julian-risch requested a review from a team as a code owner June 14, 2023 17:02
@julian-risch julian-risch requested review from ZanSara and removed request for a team June 14, 2023 17:02
@julian-risch julian-risch marked this pull request as draft June 16, 2023 13:30
@julian-risch julian-risch added this to the 1.18.0 milestone Jun 19, 2023
@coveralls (Collaborator) commented Jun 19, 2023

Pull Request Test Coverage Report for Build 5314673912

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 384 unchanged lines in 13 files lost coverage.
  • Overall coverage increased (+0.05%) to 42.314%

Files with Coverage Reduction               New Missed Lines    %
nodes/prompt/prompt_template.py             1                   89.55%
agents/types.py                             2                   93.55%
preview/pipeline.py                         3                   90.91%
agents/base.py                              7                   94.38%
utils/openai_utils.py                       7                   81.32%
nodes/answer_generator/openai.py            15                  84.62%
nodes/prompt/invocation_layer/chatgpt.py    16                  54.9%
nodes/prompt/invocation_layer/open_ai.py    20                  71.91%
nodes/prompt/prompt_node.py                 20                  45.2%
Totals Coverage Status
  • Change from base Build 5269594657: +0.05%
  • Covered Lines: 9563
  • Relevant Lines: 22600

💛 - Coveralls

@julian-risch julian-risch marked this pull request as ready for review June 19, 2023 13:16
@ZanSara (Contributor) left a comment

There's just one problem I'd like to clarify before approving this.

Comment on lines 116 to 117
torch_import.check()


Nitpick: I would always put these checks at the very first line of the __init__. If the deps are not there, there's no point doing anything else: running any super(), making any checks, etc. It should be the first issue users are warned about.
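The ordering the nitpick suggests, sketched with a hypothetical node class and a trivial stand-in for the lazy-import guard (both names are illustrative, not haystack's actual API):

```python
class _TorchImportGuard:
    """Trivial stand-in for haystack's lazy-import guard object."""

    def check(self):
        # In haystack this would raise ImportError with an install hint
        # if torch had failed to import; here it is a no-op.
        pass


torch_import = _TorchImportGuard()


class SomeTorchBackedNode:
    def __init__(self, model_name_or_path: str):
        # Dependency check on the very first line: if torch is missing,
        # fail here before running super().__init__() or any validation.
        torch_import.check()
        super().__init__()
        self.model_name_or_path = model_name_or_path


node = SomeTorchBackedNode("deepset/roberta-base-squad2")
```

With the check first, a user with a missing dependency gets the install hint immediately instead of a confusing failure from later initialization code.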

Comment on lines 14 to 20
from haystack.lazy_imports import LazyImport

with LazyImport() as torch_import:
    import torch
    from torch.utils.data.sampler import SequentialSampler
    from torch.utils.data import Dataset

from haystack.modeling.utils import initialize_device_settings, set_all_seeds  # pylint: disable=ungrouped-imports

This should not be needed: this check is done in haystack/modeling/__init__.py, which always runs before this module: https://github.com/deepset-ai/haystack/blob/main/haystack/modeling/__init__.py

If that check does not work, let's rather understand why and fix it at that level. Otherwise we would have to try-catch all imports across all modeling modules, which I'd rather avoid 😅

@@ -73,6 +77,7 @@ def __init__(
:return: An instance of the Inferencer.

"""
torch_import.check()

Same as above

@julian-risch (Member, Author) commented Jun 19, 2023

Thank you for your feedback, @ZanSara. I addressed the change requests.
Let's discuss the name of the extra. The working title is "inference", but I am leaning towards "torch" instead. Yes, we install not only torch but also sentencepiece, sentence-transformers, and huggingface-hub. However, I believe "torch" explains better than "inference" when users should install farm-haystack[torch] and what it does. transformers has a torch extra too, so some users will already be familiar with the pattern from pip install transformers[torch]. If you have no strong opinion otherwise, I would change the extra's name to "torch".

@julian-risch julian-risch requested a review from ZanSara June 19, 2023 14:57
@ZanSara (Contributor) left a comment

Great, thank you! I'll let you choose the name of the extra, I don't have strong opinions there 👍

@julian-risch (Member, Author) commented

@bilgeyucel and I agreed on the name inference. torch wouldn't capture the other dependencies we are also installing in the extra, and inference as a name for a bundle of dependencies fits well alongside another extra we already have, preprocessing.
I will add the warning to the LazyImport now, and then we can update the docs and merge the PR.
