NOTE: This feature is in the rollout phase and is available only to specific tenants. Our team is actively working to enable it fully in Teams and across all languages in the SDK. Updates will be posted on the Discussions page.
AI-powered bots tend to have slower response times, which can disengage users. Two factors contribute to a slow response. The first is the preprocessing steps, such as RAG or function calls, that are often required before the LLM can produce a response. The second is the time the LLM takes to generate the full response.

A common solution is to stream the bot's response to users while the LLM generates it. Through streaming, your bot can offer an experience that feels engaging, responsive, and on par with leading AI products.
There are three parts to streaming:

- **Informative Updates**: Provide users with insights into what your bot is doing before it has started generating its response.
- **Response Streaming**: Provide users with chunks of the response as they are generated by the LLM. This feels like the bot is actively typing out its message.
- **Tools Streaming**: Initiate tool (action) calls as part of the streaming response. Streaming can now be paired with the `tools` augmentation to enable action calling as part of the streaming experience.
The `StreamingResponse` class is the helper class for streaming responses to the client. It is used to send a series of updates to the client in a single response. If you are using your own custom model, you can directly instantiate and manage this class to stream responses, as in the sketch below.

The expected sequence of calls is: `queueInformativeUpdate()`, `queueTextChunk()`, ..., `endStream()`. Once `endStream()` is called, the stream is considered ended and no further updates can be sent.
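For illustration, here is a minimal sketch of driving `StreamingResponse` directly with a custom model. The chunk source (`modelChunks`) is a hypothetical stand-in for your own model's streamed output:

```ts
import { TurnContext } from 'botbuilder';
import { StreamingResponse } from '@microsoft/teams-ai';

// Minimal sketch: `modelChunks` is a hypothetical async iterable of text
// chunks produced by a custom model.
async function streamReply(context: TurnContext, modelChunks: AsyncIterable<string>): Promise<void> {
    const streamer = new StreamingResponse(context);
    streamer.queueInformativeUpdate('Finding relevant work items'); // 1. informative update first
    for await (const chunk of modelChunks) {
        streamer.queueTextChunk(chunk); // 2. text chunks as the model generates them
    }
    await streamer.endStream(); // 3. end the stream; no further updates can be sent
}
```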
- Streaming is only available in 1:1 chats.
- `SendActivity` requests are restricted to 1 RPS. Our SDK buffers outgoing updates to one send every 1.5 seconds.
- For Powered by AI features, Citations, Sensitivity Label, Feedback Loop, and Generated by AI Label are supported in the final chunk (see the sketch after this list).
- Citations are set per each text chunk queued.
- Only rich text can be streamed.
- Due to future GA protocol changes, the `channelData` metadata must be included in the `entities` object as well.
- Only one informative message can be set. This is reused for each message. Examples include:
  - "Scanning through documents"
  - "Summarizing content"
  - "Finding relevant work items"
- The informative message is rendered only at the beginning of each message returned from the LLM.
- Attachments can only be sent in the final streamed chunk.
- Streaming does not work with OpenAI's `o1` models.
- Tools Streaming only works with the `tools` augmentation. The `sequence` and `monologue` augmentations do not currently support streaming.
- Streaming without tools support works with the `default` augmentation.
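As a sketch of how those Powered by AI features can be attached to the final chunk, assuming the setter names exposed by the JavaScript SDK's `StreamingResponse` class:

```ts
import { Attachment } from 'botbuilder';
import { StreamingResponse } from '@microsoft/teams-ai';

// Illustrative sketch only: decorate the final chunk before ending the stream.
function decorateFinalChunk(streamer: StreamingResponse, cards: Attachment[]): void {
    streamer.setFeedbackLoop(true);       // enable the Feedback Loop buttons
    streamer.setGeneratedByAILabel(true); // show the "Generated by AI" label
    streamer.setAttachments(cards);       // attachments ride on the final chunk
}
```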
You can configure streaming with your bot by following these steps:

- Use the `DefaultAugmentation` class.
- Set `stream: true` in the `OpenAIModel` declaration.
- Set the informative message in the `ActionPlanner` declaration via the `StartStreamingMessage` config.
- As before, set the feedback loop toggle in the `AIOptions` object in the `app` declaration and specify a handler (a sketch of this wiring follows the JavaScript example below).
  - For Python specifically, the toggle also needs to be set in the `ActionPlannerOptions` object.
- Set attachments in the final chunk via the `EndStreamHandler` in the `ActionPlanner` declaration.
**C#**

```cs
// Create OpenAI Model
builder.Services.AddSingleton<OpenAIModel>(sp => new(
    new OpenAIModelOptions(config.OpenAI.ApiKey, "gpt-4o")
    {
        LogRequests = true,
        Stream = true, // Set stream toggle
    },
    sp.GetService<ILoggerFactory>()
));

ResponseReceivedHandler endStreamHandler = new((object sender, ResponseReceivedEventArgs args) =>
{
    StreamingResponse? streamer = args.Streamer;
    if (streamer == null)
    {
        return;
    }

    AdaptiveCard adaptiveCard = new("1.6")
    {
        Body = [new AdaptiveTextBlock(streamer.Message) { Wrap = true }]
    };

    var adaptiveCardAttachment = new Attachment()
    {
        ContentType = "application/vnd.microsoft.card.adaptive",
        Content = adaptiveCard,
    };

    streamer.Attachments = [adaptiveCardAttachment]; // Set attachments
});

// Create ActionPlanner
ActionPlanner<TurnState> planner = new(
    options: new(
        model: sp.GetService<OpenAIModel>()!,
        prompts: prompts,
        defaultPrompt: async (context, state, planner) =>
        {
            PromptTemplate template = prompts.GetPrompt("Chat");
            return await Task.FromResult(template);
        }
    )
    {
        LogRepairs = true,
        StartStreamingMessage = "Loading stream results...", // Set informative message
        EndStreamHandler = endStreamHandler // Set final chunk handler
    },
    loggerFactory: loggerFactory
);
```
**JavaScript**

```ts
const model = new OpenAIModel({
    // ...Setup OpenAI or AzureOpenAI
    stream: true, // Set stream toggle
});

const endStreamHandler: PromptCompletionModelResponseReceivedEvent = (ctx, memory, response, streamer) => {
    // ... Setup attachments
    streamer.setAttachments([...cards]); // Set attachments
};

const planner = new ActionPlanner({
    model,
    prompts,
    defaultPrompt: 'default',
    startStreamingMessage: 'Loading stream results...', // Set informative message
    endStreamHandler: endStreamHandler // Set final chunk handler
});
```
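The feedback loop toggle from the configuration steps is set on the app rather than the planner. A minimal JavaScript sketch of that wiring, assuming the `enable_feedback_loop` option in `AIOptions` and the `app.feedbackLoop` handler registration:

```ts
import { MemoryStorage } from 'botbuilder';
import { Application, TurnState } from '@microsoft/teams-ai';

// `planner` comes from the declaration above.
const storage = new MemoryStorage();
const app = new Application<TurnState>({
    storage,
    ai: {
        planner,
        enable_feedback_loop: true, // Set feedback loop toggle
    },
});

// Handler invoked when a user submits thumbs up/down feedback
app.feedbackLoop(async (context, state, feedbackLoopData) => {
    // ... log or persist the feedback
});
```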
**Python**

```python
model = OpenAIModel(
    OpenAIModelOptions(api_key=config.OPENAI_KEY, default_model="gpt-4o", stream=True)  # Set stream toggle
)

def end_stream_handler(
    context: TurnContext,
    state: MemoryBase,
    response: PromptResponse[str],
    streamer: StreamingResponse,
):
    if not streamer:
        return

    card = CardFactory.adaptive_card(
        {
            "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
            "version": "1.6",
            "type": "AdaptiveCard",
            "body": [{"type": "TextBlock", "wrap": True, "text": streamer.message}],
        }
    )

    streamer.set_attachments([card])  # Set attachments

planner = ActionPlanner(
    ActionPlannerOptions(
        model=model,
        prompts=prompts,
        default_prompt="tools",
        enable_feedback_loop=True,  # Set the feedback loop toggle
        start_streaming_message="Loading streaming results...",  # Set the informative message
        end_stream_handler=end_stream_handler,  # Set the final chunk handler
    )
)
```