Streaming for Bots

NOTE: This feature is in the rollout phase and is available only to specific tenants. Our team is actively working on enabling this feature fully on Teams and across all languages in the SDK. Rest assured, we are diligently working to enable it for everyone. Updates will be posted on the Discussions page.

AI-powered bots tend to have slower response times, which can disengage users. Two factors contribute to a slow response. The first is the preprocessing steps, such as RAG or function calls, that take time and are often required before the LLM can produce a response. The second is the time the LLM takes to generate the full response.

A common solution is to stream the bot’s response to users while the LLM generates its full response. Through streaming, your bot can offer an experience that feels engaging, responsive, and on-par with leading AI products.

There are three parts to streaming:

  • Informative Updates: Provide users with insights into what your bot is doing before it has started generating its response.

  • Response Streaming: Provide users with chunks of the response as they are generated by the LLM. This feels like the bot is actively typing out its message.

  • Tools Streaming: Initiate tool (action) calls as part of the streaming response. Streaming can now be paired with the tools augmentation to enable action calling as part of the streaming experience.

Streaming Response Class

The StreamingResponse class is the helper class for streaming responses to the client. The class is used to send a series of updates to the client in a single response. If you are using your own custom model, you can directly instantiate and manage this class to stream responses.

The expected sequence of calls is:

  1. queueInformativeUpdate()
  2. queueTextChunk(), called once per chunk
  3. endStream()

Once endStream() is called, the stream is considered ended and no further updates can be sent.
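
For illustration, here is a minimal JS/TS sketch of driving the class directly with a custom model. The streamWithCustomModel function and the generateChunks iterator are hypothetical stand-ins for your own code, not part of the SDK:

import { TurnContext } from 'botbuilder';
import { StreamingResponse } from '@microsoft/teams-ai';

// Minimal sketch: drive StreamingResponse directly with a custom model.
// `generateChunks` is a placeholder for your own async model output.
async function streamWithCustomModel(context: TurnContext, generateChunks: () => AsyncIterable<string>) {
    const streamer = new StreamingResponse(context);

    // 1. Informative update before generation starts.
    streamer.queueInformativeUpdate('Scanning through documents...');

    // 2. Queue each text chunk as the model produces it.
    for await (const chunk of generateChunks()) {
        streamer.queueTextChunk(chunk);
    }

    // 3. End the stream; no further updates can be sent afterwards.
    await streamer.endStream();
}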

Configuration with Azure OpenAI / OpenAI

Current Limitations:

  • Streaming is only available in 1:1 chats.
  • SendActivity requests are restricted to 1 RPS, so the SDK buffers outgoing updates and flushes them roughly every 1.5 seconds.
  • For Powered by AI features, Citations, Sensitivity Label, Feedback Loop, and Generated by AI Label are supported in the final chunk.
    • Citations are set per text chunk as it is queued (see the sketch after this list).
  • Only rich text can be streamed.
  • Due to future GA protocol changes, the channelData metadata must be included in the entities object as well.
  • Only one informative message can be set. This is reused for each message.
    • Examples include:
      • “Scanning through documents”
      • “Summarizing content”
      • “Finding relevant work items”
  • The informative message is rendered only at the beginning of each message returned from the LLM.
  • Attachments can only be sent in the final streamed chunk.
  • Streaming does not work with OpenAI's o1 models.
  • Tools Streaming only works with the tools augmentation. The sequence and monologue augmentations do not currently support streaming.
  • Streaming without tools support works with the default augmentation.
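
As an example of per-chunk citations, the sketch below attaches a citation while queueing a chunk. The optional citations parameter on queueTextChunk and the citation shape shown here are assumptions that may vary by SDK version:

// Hypothetical sketch: attach a citation to an individual queued chunk.
streamer.queueTextChunk('Deployment steps are documented in the runbook [1].', [
    {
        title: 'Deployment Runbook',
        url: 'https://example.com/runbook',
        filepath: '',
        content: 'Step-by-step deployment instructions.'
    }
]);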

Setup Instructions:

You can configure streaming with your bot by following these steps:

  • Use the DefaultAugmentation class
  • Set stream: true in the OpenAIModel declaration

Optional additions:

  • Set the informative message in the ActionPlanner declaration via the StartStreamingMessage config.
  • As in the standard (non-streaming) setup, set the feedback loop toggle in the AIOptions object in the app declaration and specify a handler (see the sketch after the JS/TS example below).
    • For Python specifically, the toggle also needs to be set in the ActionPlannerOptions object.
  • Set attachments in the final chunk via the EndStreamHandler in the ActionPlanner declaration.

C#

    // Create OpenAI Model
    builder.Services.AddSingleton<OpenAIModel>(sp => new(
        new OpenAIModelOptions(config.OpenAI.ApiKey, "gpt-4o")
        {
            LogRequests = true,
            Stream = true,              // Set stream toggle
        },
        sp.GetService<ILoggerFactory>()
    ));

    ResponseReceivedHandler endStreamHandler = new((object sender, ResponseReceivedEventArgs args) =>
    {
        StreamingResponse? streamer = args.Streamer;

        if (streamer == null)
        {
            return;
        }

        AdaptiveCard adaptiveCard = new("1.6")
        {
            Body = [new AdaptiveTextBlock(streamer.Message) { Wrap = true }]
        };

        var adaptiveCardAttachment = new Attachment()
        {
            ContentType = "application/vnd.microsoft.card.adaptive",
            Content = adaptiveCard,
        };

        streamer.Attachments = [adaptiveCardAttachment];    // Set attachments
    });


    // Create ActionPlanner
    ActionPlanner<TurnState> planner = new(
        options: new(
            model: sp.GetService<OpenAIModel>()!,
            prompts: prompts,
            defaultPrompt: async (context, state, planner) =>
            {
                PromptTemplate template = prompts.GetPrompt("Chat");
                return await Task.FromResult(template);
            }
        )
        {
            LogRepairs = true,
            StartStreamingMessage = "Loading stream results...", // Set informative message
            EndStreamHandler = endStreamHandler // Set final chunk handler
        },
        loggerFactory: loggerFactory
    );
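
In this example, the handler reads the full streamed message from streamer.Message, wraps it in an Adaptive Card, and sets it on streamer.Attachments so the card ships with the final chunk.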

JS/TS

const model = new OpenAIModel({
    // ...Setup OpenAI or AzureOpenAI
    stream: true,                                         // Set stream toggle
});

const endStreamHandler: PromptCompletionModelResponseReceivedEvent = (ctx, memory, response, streamer) => {
    // ... Setup attachments
    streamer.setAttachments([...cards]);                      // Set attachments
};

const planner = new ActionPlanner({
    model,
    prompts,
    defaultPrompt: 'default',
    startStreamingMessage: 'Loading stream results...', // Set informative message
    endStreamHandler: endStreamHandler                  // Set final chunk handler
});
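
To complete the optional feedback loop wiring mentioned above, set the toggle in the AIOptions of the app declaration and register a handler. A sketch, assuming the enable_feedback_loop option and app.feedbackLoop handler names; verify them against your SDK version:

const app = new Application<TurnState>({
    ai: {
        planner,
        enable_feedback_loop: true                      // Set the feedback loop toggle
    }
});

// Invoked when a user submits thumbs up/down feedback on a response.
app.feedbackLoop(async (context, state, feedbackLoopData) => {
    // Persist or log the feedback here.
});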

Python

model = OpenAIModel(
    OpenAIModelOptions(api_key=config.OPENAI_KEY, default_model="gpt-4o", stream=True)
)

def end_stream_handler(
    context: TurnContext,
    state: MemoryBase,
    response: PromptResponse[str],
    streamer: StreamingResponse,
):
    if not streamer:
        return

    card = CardFactory.adaptive_card(
        {
            "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
            "version": "1.6",
            "type": "AdaptiveCard",
            "body": [{"type": "TextBlock", "wrap": True, "text": streamer.message}],
        }
    )

    streamer.set_attachments([card])

planner = ActionPlanner(
    ActionPlannerOptions(
        model=model,
        prompts=prompts,
        default_prompt="tools",
        enable_feedback_loop=True,                              # Enable the feedback loop
        start_streaming_message="Loading streaming results...", # Set the informative message
        end_stream_handler=end_stream_handler,                  # Set the final chunk handler
    )
)
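
Note that enable_feedback_loop is set here in ActionPlannerOptions in addition to the AIOptions passed to the app, per the Python-specific requirement noted above.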
