Flask streaming OpenAI #9482
Replies: 1 comment
-
🤖 Hello,

Based on your code and the context provided, it seems you're trying to stream tokens from the OpenAI model to the client in Flask. Here's a way to do it.

First, you need to set up a custom callback handler that will handle each new token. You've already done this with your `CustomStreamingCallbackHandler`.

Next, you need to set up a Flask route that generates a streaming response. Here's an example of how you can do this:

```python
from flask import Flask, Response
from langchain.callbacks.manager import CallbackManager
from langchain.llms.openai import OpenAI
from your_module import CustomStreamingCallbackHandler

app = Flask(__name__)

@app.route('/stream')
def stream():
    def generate():
        callback_handler = CustomStreamingCallbackHandler()
        callback_manager = CallbackManager([callback_handler])
        llm = OpenAI(streaming=True, callback_manager=callback_manager, temperature=0)
        message = "Hello, world!"
        # llm.stream() yields the completion piece by piece as it is generated
        for chunk in llm.stream(message):
            yield chunk

    return Response(generate(), mimetype='text/plain')
```

In this example, a new callback handler and a new `OpenAI` instance are created for each request, and each chunk produced by the model is yielded straight into the streaming `Response`. Depending on your LangChain version, `llm.stream()` may yield plain strings or chunk objects, so you may need to adjust how each chunk is turned into text.

Please note that this is a simplified example and may need to be adjusted based on your specific use case. For example, you may need to handle errors and retries, and you may want to customize the parameters passed to the `OpenAI` constructor.
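If it helps, here is a small sketch of how a client could consume that endpoint. It assumes the Flask app above is running locally on port 5000 (the URL is an assumption to adjust for your deployment) and uses the `requests` library in streaming mode:

```python
import requests

# The host/port below are assumptions; point this at your own deployment.
with requests.get("http://localhost:5000/stream", stream=True) as resp:
    resp.raise_for_status()
    # iter_content with decode_unicode=True yields text as soon as it arrives
    for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
        if chunk:
            print(chunk, end="", flush=True)
```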
I hope this helps! If you have any other questions, feel free to ask.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
-
Hi all, I searched a lot online but still haven't found a working solution.
I need to stream each token to the client, but I haven't found the right way to do it yet.
So far everything works fine: I override `on_llm_new_token`, so I can also see the streamed tokens in the console.
How can I now send them to the client? I know I can't use `yield` inside the callback. Is there an easier way than managing async calls?
Thanks a lot for the support.
```python
import sys
from typing import Any

from flask import request
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# flow_call_streaming_api_blueprint is defined elsewhere in the project.


class CustomStreamingCallbackHandler(StreamingStdOutCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        sys.stdout.write(token)  # Modify the behavior here
        sys.stdout.flush()


@flow_call_streaming_api_blueprint.route('/streaming/call')
def stream_chat_gpt():
    """
    This streams the response from ChatGPT
    """
    prompt = request.args.get('prompt')
    projectId = request.args.get('projectId')
    sender = request.args.get('ip', default='00000')

    def chat_gpt_helper(prompt, projectId, sender):
        try:
            custom_callback_handler = CustomStreamingCallbackHandler()  # Instantiate the callback handler
            ...  # snippet truncated in the original post
```
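One common way to get the tokens from the callback into the Flask response without managing async calls is to have the callback put each token on a thread-safe queue, run the LLM call in a background thread, and yield from the queue in the route. The sketch below follows that idea; the blueprint name, the `OpenAI(streaming=True, callbacks=[...])` setup, and the `None` sentinel that marks the end of the stream are assumptions to adapt to your own `chat_gpt_helper` logic.

```python
import queue
import threading
from typing import Any

from flask import Blueprint, Response, request
from langchain.callbacks.base import BaseCallbackHandler
from langchain.llms.openai import OpenAI

# Mirrors the blueprint used in the snippet above; the name is an assumption.
flow_call_streaming_api_blueprint = Blueprint('flow_call_streaming_api', __name__)


class QueueStreamingCallbackHandler(BaseCallbackHandler):
    """Forwards each new token to a thread-safe queue instead of stdout."""

    def __init__(self, token_queue: queue.Queue) -> None:
        self.token_queue = token_queue

    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        self.token_queue.put(token)


@flow_call_streaming_api_blueprint.route('/streaming/call')
def stream_chat_gpt():
    """Streams the model's tokens to the client as plain text."""
    prompt = request.args.get('prompt')

    token_queue: queue.Queue = queue.Queue()
    handler = QueueStreamingCallbackHandler(token_queue)
    llm = OpenAI(streaming=True, callbacks=[handler], temperature=0)

    def run_llm():
        try:
            llm(prompt)  # tokens arrive via on_llm_new_token while this runs
        finally:
            token_queue.put(None)  # sentinel: no more tokens

    # Run the blocking LLM call in the background so the route can stream.
    threading.Thread(target=run_llm, daemon=True).start()

    def generate():
        while True:
            token = token_queue.get()
            if token is None:
                break
            yield token

    return Response(generate(), mimetype='text/plain')
```

With the queue in between, the callback stays synchronous while the route streams each token to the client as soon as it is produced, so no `async` plumbing is needed.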