feat: pass model parameters to HFLocalInvocationLayer via model_kwargs, enabling direct model usage #4956
Conversation
assert isinstance(layer.pipe.tokenizer, T5TokenizerFast)
...
@pytest.mark.integration
This should be mocked and made a unit test.
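For context, a minimal sketch of what the mocked unit-test variant could look like; the patch target, the HFLocalInvocationLayer import path, and the constructor signature are assumptions based on this diff, not verbatim from the PR:

```python
import pytest
from unittest.mock import MagicMock, patch

from transformers import T5TokenizerFast

from haystack.nodes.prompt.invocation_layer import HFLocalInvocationLayer


@pytest.mark.unit
@patch("haystack.nodes.prompt.invocation_layer.hugging_face.pipeline")
def test_pipe_uses_t5_tokenizer(mock_pipeline):
    # A MagicMock built with spec=T5TokenizerFast passes isinstance checks,
    # so the original assertion still holds without downloading a model.
    mock_pipeline.return_value.tokenizer = MagicMock(spec=T5TokenizerFast)
    layer = HFLocalInvocationLayer(model_name_or_path="google/flan-t5-base")
    assert isinstance(layer.pipe.tokenizer, T5TokenizerFast)
```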
Took another pass
Thanks, learned a ton! LMK if there are any additional changes needed.
🚢
Related Issues
Problem:
As large language model (LLM) development continues to advance, there has been a notable shift towards custom torch architectures that are not yet supported in Hugging Face transformers. This trend poses a challenge for our existing setup, which relies heavily on the transformers pipeline approach. Take, for example, the open-source MPT models. Because the custom MPT architecture is not yet part of transformers, and because it includes many custom architectural approaches, ranging from FlashAttention and ALiBi to QK LayerNorm, one needs to create an MPT model "by hand". To complicate things, some models, MPT among them, use tokenizers borrowed from other existing models.
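For illustration, a minimal sketch of that hand-rolled setup (the model names, dtype, and trust_remote_code approach are illustrative assumptions, not part of this PR's diff):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# MPT's architecture is not in transformers, so the model code is loaded
# from the model repository itself via trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",  # illustrative; any custom-architecture checkpoint
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# MPT ships no tokenizer of its own; it reuses the GPT-NeoX tokenizer.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```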
This custom-architecture trend is only likely to accelerate, racing ahead of transformers.
Proposed Changes:
This PR proposes directly integrating Hugging Face models with PromptNode via the model parameter in model_kwargs, allowing for more flexibility and compatibility with rapidly evolving LLM architectures.
Assuming the above-mentioned model and tokenizer have already been created, our users would simply create a PromptNode like this:
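A sketch of the intended usage, reusing the model and tokenizer objects from the snippet above; the exact model_kwargs keys are my reading of this PR's diff, so treat them as assumptions:

```python
from haystack.nodes import PromptNode

# The pre-built model and tokenizer are forwarded through model_kwargs to
# HFLocalInvocationLayer, which passes them on to the transformers pipeline.
prompt_node = PromptNode(
    model_name_or_path="mosaicml/mpt-7b",
    model_kwargs={"model": model, "tokenizer": tokenizer},
)

print(prompt_node("What is a large language model?"))
```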
Key Benefits:
Enhanced flexibility: this minor change allows us to incorporate the latest LLMs into the Haystack ecosystem, including those based on custom torch architectures, increasing our ability to stay current with LLM advancements.
Future-proof: as LLM research outpaces transformers, this change ensures PromptNode is prepared to adapt quickly to new developments.
How did you test it?
New unit tests, manual tests, and a custom demo Colab
Notes for the reviewer
Checklist
fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.