feat: Add the non-threaded and threaded streaming support for pipeline #4889
Conversation
I didn't modify feedback.py, yet mypy reports: rest_api/rest_api/controller/feedback.py:116: error: Item "TableCell" of "Union[Span, TableCell]" has no attribute "start" [union-attr]
Force-pushed from f44ec6d to d46d854.
@vblagoje please also help review this together.
Hi @yuanwu2017, thank you for opening this PR. There is no issue linked, but I assume you have already had a conversation with somebody, probably @vblagoje? 🙂 @vblagoje could you please add some context? Thank you!
I have not discussed it with anybody. I created issue #4890 to clarify it.
@yuanwu2017 it would be even better to go with both non-threaded and threaded streaming handler options. We'll look into this one a bit closer and get back to you early next week with ideas on how to proceed.
@yuanwu2017 I had another quick look, and it seems that HF already has a threaded version of the TextStreamer interface (TextIteratorStreamer). For our users and us, having both non-threaded and threaded streaming would make the most sense. Can we work towards that solution? We can follow the currently established design patterns to accommodate both approaches without any major issues. Let me know your thoughts.
Thanks for your quick review. You are right. I will integrate TextIteratorStreamer into HFLocalInvocationLayer for threaded streaming: HFLocalInvocationLayer will start a new thread for the pipe() function, and users can get the TextIteratorStreamer generator from the output of pipeline.run or prompt_node.run. For non-threaded streaming, users need to run the prompt node or pipeline in a new thread themselves to avoid blocking when they develop applications. Am I right?
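For reference, the threaded-streaming idea boils down to the pattern below, sketched here with transformers directly; the model name and generation arguments are illustrative, not taken from this PR.

```python
from threading import Thread

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TextIteratorStreamer

model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# TextIteratorStreamer buffers decoded tokens in an internal queue and
# exposes them as a Python iterator.
streamer = TextIteratorStreamer(tokenizer, skip_special_tokens=True)
inputs = tokenizer("Translate to German: Hello, how are you?", return_tensors="pt")

# generate() blocks until completion, so it runs in a background thread;
# the caller consumes tokens from the streamer as they arrive.
thread = Thread(target=model.generate, kwargs={**inputs, "streamer": streamer, "max_new_tokens": 64})
thread.start()
for token_text in streamer:
    print(token_text, end="", flush=True)
thread.join()
```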
Force-pushed from a59d18b to 7f6034b.
@vblagoje I updated the PR, please help review it. transformers currently has a bug when using TextIteratorStreamer with the text-generation pipeline; I submitted a PR (huggingface/transformers#23641) for it. text2text-generation (google/flan-t5-base) works fine, though.
@yuanwu2017 this looks much better now! Note that I'll be on PTO this week. Let's wait for the HF fix to be accepted and then we'll move forward with this one. Heads up: we also have a pending PR for the Hugging Face layer, so rebase on top of it once it gets integrated!
@vblagoje Thanks.
Hi, @yuanwu2017! I am the technical writer for the team and I noticed some docstrings that could be updated in accordance with our guidelines. Would it be okay if I update those and commit directly? But I can also leave comments if you prefer.
It would be appreciated if you could update those and commit directly.
Thanks @yuanwu2017, done :)
@yuanwu2017 I'm back; do we need to wait for the HF fix? Anything we can do in the meantime?
The PR (huggingface/transformers#23641) has been merged. I will rebase this PR.
@yuanwu2017 congrats on the HF PR integration 🚀 We can't integrate this one fully until the HF transformers patch is released, but we can use the transformers main branch to test it out and play with the branch. We seem to have some merge conflicts; LMK if you need assistance.
* Add parameters to PromptNode.run() to support running the pipeline in streaming mode.
* PromptNode.run() returns a TextIteratorStreamer iterator when the prompt node runs with stream=True and return_iterator=True.
* Add the streaming interface for rest_api.

Signed-off-by: yuanwu <[email protected]>
Signed-off-by: yuanwu <[email protected]>
* When running the pipeline with default parameters, the prompt template is None and the prompt should be the query.
* Add a PromptNode streaming iterator test.
* Add the unit test for pipeline streaming mode.

Signed-off-by: yuanwu <[email protected]>
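Taken together, the commits aim at a usage pattern like the following; stream and return_iterator come from the commit messages above, while the output key and overall call shape are assumptions.

```python
from haystack.nodes import PromptNode

prompt_node = PromptNode(model_name_or_path="google/flan-t5-base")

# With stream=True and return_iterator=True, run() would hand back a
# TextIteratorStreamer-backed generator instead of the finished string.
output, _ = prompt_node.run(
    query="What is the capital of Germany?", stream=True, return_iterator=True
)
for token_text in output["results"]:  # the "results" key is an assumption
    print(token_text, end="", flush=True)
```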
@vblagoje I only added unit tests for the text2text-generation pipeline, so they should pass. The HF PR is for the text-generation pipeline. Do you think I need to refine the unit tests?
@vblagoje The get_task exception happened; the default model cannot work now. Maybe it is a new bug in HF? I will try to find out.
This usually happens when HF has a temporary outage. You can bypass it by providing the task name explicitly.
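Presumably something along these lines; the task_name key is an assumption about the HF invocation layer's model_kwargs handling, not code from this thread.

```python
from haystack.nodes import PromptNode

# Naming the task up front skips the get_task() lookup against the Hub,
# so a Hub outage no longer breaks node construction.
prompt_node = PromptNode(
    model_name_or_path="google/flan-t5-base",
    model_kwargs={"task_name": "text2text-generation"},  # hypothetical kwarg plumbing
)
```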
Cannot load the configuration from HF. Maybe we need to wait for HF to recover: E OSError: Can't load the configuration of 'google/flan-t5-base'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'google/flan-t5-base' is the correct path to a directory containing a config.json file
@vblagoje I found that it works if I run the test source code without pytest. It seems that pytest disconnects the network during the test. Are there any configurable parameters that allow pytest to pass these tests?
Hey @yuanwu2017, thanks for your updates. The tests fail to run because we don't allow unit tests to open any HTTP connections, and Hugging Face models check which task the model supports (we allow only text-generation and text2text-generation) during initialization, so that lookup has to be mocked. I'll now experiment with this branch and provide more detailed feedback.
@vblagoje Yes, I need your help. I have no idea how to test the threaded streaming pipeline with mocking.
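One possible shape for such a test, sketched as an assumption; the patch target path is illustrative and would need to match wherever the task lookup is imported in the invocation layer module.

```python
from unittest.mock import patch

import pytest


@pytest.mark.unit
def test_streaming_pipeline_without_network():
    # Hypothetical patch target: intercept the HTTP-backed task lookup so
    # the unit test never opens a connection to the Hub.
    with patch(
        "haystack.nodes.prompt.invocation_layer.hugging_face.get_task",
        return_value="text2text-generation",
    ):
        # Build the PromptNode/pipeline here and assert on the tokens the
        # mocked streamer yields.
        ...
```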
@yuanwu2017, we appreciate your input and recognize the importance of enabling this particular use case. We are actively investigating solutions to enable streaming for all LLMs via REST; this would apply not only to local Hugging Face models but to all others as well. We appreciate your patience and understanding :-)
Hey @yuanwu2017, a more straightforward approach might be the following crude example:
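The idea, reconstructed here as a hedged sketch rather than the exact snippet from the comment (class and route names are illustrative): a queue-backed TokenStreamingHandler collects tokens while the node runs in a background thread, and a FastAPI StreamingResponse drains the queue.

```python
from queue import Queue
from threading import Thread

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from haystack.nodes import PromptNode
from haystack.nodes.prompt.invocation_layer.handlers import TokenStreamingHandler

app = FastAPI()


class QueueTokenStreamingHandler(TokenStreamingHandler):
    """Pushes each generated token onto a queue that the HTTP response drains."""

    def __init__(self):
        self.queue: Queue = Queue()

    def __call__(self, token_received: str, **kwargs) -> str:
        self.queue.put(token_received)
        return token_received


@app.get("/stream")
def stream(query: str):
    handler = QueueTokenStreamingHandler()
    # Building the node per request keeps the example crude but simple.
    node = PromptNode("google/flan-t5-base", model_kwargs={"stream_handler": handler})

    # The blocking prompt invocation runs in a background thread; a None
    # sentinel on the queue signals completion.
    def produce():
        node(query)
        handler.queue.put(None)

    Thread(target=produce).start()

    def token_generator():
        while (token := handler.queue.get()) is not None:
            yield token

    return StreamingResponse(token_generator(), media_type="text/plain")
```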
Start uvicorn, and then hit the server with the following curl command, for example:
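Assuming the sketch above is saved as main.py, something like:

```bash
uvicorn main:app
curl -N "http://127.0.0.1:8000/stream?query=What%20is%20the%20capital%20of%20Germany"
```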
And this will work for all supported types of PromptNode rather than HF only. Please let us know your thoughts about this approach. Note, of course, that this is not a production example: you'll need to handle client disconnects before the generator is exhausted, exception handling can be improved as well, and so on, but the gist of the idea is here.
Yes, implementing a streaming handler in the REST API layer is a good choice. I also tried it before, but it hit a Python exception when integrating the prompt node with the streaming handler into a pipeline, because there is a deepcopy in the pipeline implementation. Hope this is a useful hint. If you already have an implemented patch, I can close this one; or I can continue to finish this implementation according to your suggestion. Please let me know. Thank you very much for your patient review and reply.
@yuanwu2017 I tried the above approach with several other models, including a local flan-t5. It all worked as expected. I think we should close this PR and the relevant issue at this time with the above-specified workaround. In the next iteration of Haystack, we'll include this use case as a built-in option. Let us know your thoughts.
Ok. I will close it.
Hello @yuanwu2017, I'm a user of Haystack, and I've been searching for a way to create a chatbot that streams tokens one by one. I'm curious to know if this feature is currently in development and when we can expect it to be released. Thank you!
@david-hung Please refer to the example above.
@yuanwu2017 I am still on an old version of Haystack, so I too faced this issue of the deepcopy in the pipeline implementation. You can get around it by adding a deepcopy override to the streaming handler, as sketched below.
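Presumably a deepcopy opt-out along these lines; this is a hedged reconstruction of the usual trick, and the handler class name is illustrative.

```python
import copy


class MyTokenStreamingHandler:  # illustrative stand-in for your handler class
    def __deepcopy__(self, memo):
        # Returning self exempts the handler from the pipeline's internal
        # copy.deepcopy, which otherwise fails on its queue/thread state.
        return self


handler = MyTokenStreamingHandler()
assert copy.deepcopy(handler) is handler  # the live handler is shared, not copied
```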
Add the prompt-dropping feature to HFTokenStreamingHandler
Return a streaming generator when a pipeline that includes a streaming-mode PromptNode runs.
Add the streaming interface for rest_api
Related Issues
- #4890
Proposed Changes:
Return a streaming generator when a pipeline that includes a streaming-mode PromptNode runs.
How did you test it?
pytest prompt
===================================================================== 138 passed, 48 skipped, 5 warnings in 103.81s (0:01:43) =====================================================================
Notes for the reviewer
Checklist
The PR title is prefixed with one of: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.