
feat!: Add extra for inference dependencies such as torch #5147

Merged (7 commits) on Jun 20, 2023

Conversation

@julian-risch (Member) commented Jun 14, 2023

Related Issues

Proposed Changes:

  • TLDR: pip install farm-haystack -> pip install farm-haystack[inference]. Users who ran the former in the past now need to run the latter to install the same set of dependencies.
  • This PR adds a new extra, for now called inference. Let's discuss the name of this extra.
  • The inference extra contains the torch and sentencepiece extras of transformers, so that versions compatible with transformers are installed. It also contains the huggingface-hub dependency and sentence-transformers.
  • The standard Haystack installation will no longer install torch, sentencepiece, sentence-transformers, and huggingface-hub; instead it installs plain transformers without those extras. As a result, users can, for example, run PromptNode with different API providers such as OpenAI or Hugging Face inference endpoints and API inference.
  • Nice side effect: most of the integration tests become several minutes faster because they can skip installing torch.

How did you test it?

  • Manually tested running a PromptNode with OpenAI after installing from this branch. More tests with other model providers pending.

Notes for the reviewer

  • Pillow shouldn't be part of the core dependencies but for now it's required. Otherwise the import fails. Should be addressed in a separate PR.
  • This is a big breaking change to how Haystack is installed, affecting all tutorials, the docs, and the README. Let's talk with @bilgeyucel and @TuanaCelik about how to roll this out before merging. Let's also double-check that an error message with simple instructions is shown if torch is required but not installed. That should already be the case thanks to feat: optional transformers #5101.
  • I still need to check whether it makes more sense to keep huggingface-hub as a core dependency:
    • huggingface_hub is imported in only a few places in the code base: https://github.com/search?q=repo%3Adeepset-ai%2Fhaystack%20_hub&type=code One use case is pushing a fine-tuned reader model to the hub; the other is checking whether a model to be downloaded from the model hub is of type "sentence_transformers". In both cases a torch import is required earlier, so there is no situation where a user reaches the code using huggingface_hub before the torch import fails. Therefore, huggingface-hub can be part of the inference extra.
  • Once we have agreed on the name of the extra, I will add a message to all the lazy torch_imports and torch_and_transformers_imports saying message="Run 'pip install farm-haystack[inference]'". Done.
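The lazy-import pattern this PR relies on can be sketched as follows. This is a simplified stand-in for haystack's LazyImport (the real class in haystack/lazy_imports.py may differ in its exact API), and the imported module name is deliberately fake to show the failure path:

```python
class LazyImport:
    """Context manager that swallows an ImportError at import time and
    re-raises it, with an install hint, only when the dependency is used."""

    def __init__(self, message="Run 'pip install farm-haystack[inference]'"):
        self.message = message
        self._error = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is not None and issubclass(exc_type, ImportError):
            self._error = exc_value  # remember the failure, suppress it for now
            return True
        return False

    def check(self):
        # Called at the point of use (e.g. first line of a node's __init__):
        # re-raise the deferred ImportError together with the install hint.
        if self._error is not None:
            raise ImportError(f"{self._error}. {self.message}")


with LazyImport() as torch_import:
    import a_module_that_is_not_installed  # stands in for torch

try:
    torch_import.check()
except ImportError as err:
    print(err)  # prints the original error plus the pip install hint
```

Importing the module succeeds silently even though the dependency is missing; only the later check() call surfaces the error, together with the instruction the bullet above describes.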

Alternatives

An alternative would be to also remove plain transformers (without extras) from Haystack's standard dependencies. I think we shouldn't do that. It would further reduce the installation to a minimum and thus save installation time and disk space, but transformers is required for many things users can do with Haystack; even a PromptNode with a transformers model requires it.

Checklist

@julian-risch julian-risch requested a review from a team as a code owner June 14, 2023 17:02
@julian-risch julian-risch requested review from ZanSara and removed request for a team June 14, 2023 17:02
@julian-risch julian-risch marked this pull request as draft June 16, 2023 13:30
@julian-risch julian-risch added this to the 1.18.0 milestone Jun 19, 2023
@coveralls (Collaborator) commented Jun 19, 2023

Pull Request Test Coverage Report for Build 5314673912

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 384 unchanged lines in 13 files lost coverage.
  • Overall coverage increased (+0.05%) to 42.314%

Files with Coverage Reduction               New Missed Lines    %
nodes/prompt/prompt_template.py             1                   89.55%
agents/types.py                             2                   93.55%
preview/pipeline.py                         3                   90.91%
agents/base.py                              7                   94.38%
utils/openai_utils.py                       7                   81.32%
nodes/answer_generator/openai.py            15                  84.62%
nodes/prompt/invocation_layer/chatgpt.py    16                  54.9%
nodes/prompt/invocation_layer/open_ai.py    20                  71.91%
nodes/prompt/prompt_node.py                 20                  45.2%
Totals Coverage Status
  • Change from base Build 5269594657: +0.05%
  • Covered Lines: 9563
  • Relevant Lines: 22600

💛 - Coveralls

@julian-risch julian-risch marked this pull request as ready for review June 19, 2023 13:16
@ZanSara (Contributor) left a comment

There's just one problem I'd like to clarify before approving this.

Comment on lines 116 to 117
torch_import.check()


Nitpick: I would always put these checks at the very first line of the __init__. If the deps are not there, there's no point doing anything else: running any super(), making any checks, etc. It should be the first issue users are warned about.
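The ordering the nitpick suggests, sketched with a hypothetical node class and a trivial stand-in for the lazy-import guard (both names are illustrative, not haystack's actual API):

```python
class _TorchImportGuard:
    """Trivial stand-in for haystack's lazy-import guard object."""

    def check(self):
        # In haystack this would raise ImportError with an install hint
        # if torch had failed to import; here it is a no-op.
        pass


torch_import = _TorchImportGuard()


class SomeTorchBackedNode:
    def __init__(self, model_name_or_path: str):
        # Dependency check on the very first line: if torch is missing,
        # fail here before running super().__init__() or any validation.
        torch_import.check()
        super().__init__()
        self.model_name_or_path = model_name_or_path


node = SomeTorchBackedNode("deepset/roberta-base-squad2")
```

With the check first, a user with a missing dependency gets the install hint immediately instead of a confusing failure from later initialization code.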

Comment on lines 14 to 20
from haystack.lazy_imports import LazyImport

with LazyImport() as torch_import:
    import torch
    from torch.utils.data.sampler import SequentialSampler
    from torch.utils.data import Dataset

from haystack.modeling.utils import initialize_device_settings, set_all_seeds  # pylint: disable=ungrouped-imports

This should not be needed: this check is done in haystack/modeling/__init__.py, which always runs before this module: https://github.com/deepset-ai/haystack/blob/main/haystack/modeling/__init__.py

If that check does not work, let's rather understand why and fix it at that level. Otherwise we would have to try-catch all imports across all modeling modules, which I'd rather avoid 😅

@@ -73,6 +77,7 @@ def __init__(
:return: An instance of the Inferencer.

"""
torch_import.check()

Same as above

@julian-risch (Member, Author) commented Jun 19, 2023

Thank you for your feedback, @ZanSara. I addressed the change requests.
Let's discuss the name of the extra. The working title is "inference", but I am leaning towards "torch" instead. Yes, we install not only torch but also sentencepiece, sentence-transformers, and huggingface-hub. However, I believe "torch" explains better than "inference" when users should install farm-haystack[torch] and what it does. transformers has a torch extra too, so some users will already be familiar with the pattern from pip install transformers[torch]. If you have no strong opinion otherwise, I would change the extra's name to "torch".

@julian-risch julian-risch requested a review from ZanSara June 19, 2023 14:57
@ZanSara (Contributor) left a comment

Great, thank you! I'll let you choose the name of the extra, I don't have strong opinions there 👍

@julian-risch (Member, Author) commented

@bilgeyucel and I agreed on the name inference. torch wouldn't capture the other dependencies we are also installing in the extra, and inference as a name for a bundle of dependencies fits well alongside another extra we already have, preprocessing.
I will add the warning to the LazyImport now, and then we can update the docs and merge the PR.
