Python: SK Multiagents - working for ChatCompletionAgents but not for AzureAssistantAgent #10084

Closed
zzulueta opened this issue Jan 6, 2025 · 19 comments
Labels: agents, python

@zzulueta

zzulueta commented Jan 6, 2025

I have a multi-agent system that works with ChatCompletionAgent but not with AzureAssistantAgent, as shown below. When I switch my agents to AzureAssistantAgent, the system does not hand off to the next agent. Any idea why?

###ChatCompletionAgent WORKS###
### Manager Assistant
manager_kernel = _create_kernel_with_chat_completion(MANAGER_NAME)
manager_agent = ChatCompletionAgent(
    service_id=MANAGER_NAME, 
    kernel=manager_kernel, 
    name=MANAGER_NAME, 
    instructions=f"""
        You count the number of conversations between {DALLE_NAME} and {VISION_NAME}.
        An image generated by {DALLE_NAME} and an analysis by {VISION_NAME} is considered as a single conversation.
        Once four (4) conversations are completed, you will provide a termination message: {TERMINATION_KEYWORD}
        """
)

### DALLE Assistant
dalle_assistant_kernel = _create_kernel_with_chat_completion(DALLE_NAME)
dalle_assistant_kernel.add_plugin(GenerateImagePlugin(), plugin_name="GenerateImagePlugin")

settings = dalle_assistant_kernel.get_prompt_execution_settings_from_service_id(service_id=DALLE_NAME)
settings.function_choice_behavior = FunctionChoiceBehavior.Required()

dalle_assistant_agent = ChatCompletionAgent(
    service_id=DALLE_NAME, 
    kernel=dalle_assistant_kernel, 
    name=DALLE_NAME, 
    instructions="""
        As a premier AI specializing in image generation, you possess the expertise to craft precise visuals based on given prompts. 
        It is essential that you diligently generate the requested image, ensuring its accuracy and alignment with the user's specifications, 
        prior to delivering a response.
        You will have access to the local file system to store the generated image.
        You will generate an image based on the user's prompt and display it for review.
        You will generate new images based on the feedback from the Vision Assistant.
        """, 
    execution_settings=settings
)

### Vision Assistant
vision_assistant_kernel = _create_kernel_with_chat_completion(VISION_NAME)
vision_assistant_kernel.add_plugin(AnalyzeImagePlugin(), plugin_name="AnalyzeImagePlugin")

settings = vision_assistant_kernel.get_prompt_execution_settings_from_service_id(service_id=VISION_NAME)
settings.function_choice_behavior = FunctionChoiceBehavior.Required()

vision_assistant_agent = ChatCompletionAgent(
    service_id=VISION_NAME, 
    kernel=vision_assistant_kernel, 
    name=VISION_NAME, 
    instructions=""" 
        As a leading AI expert in image analysis, you excel at scrutinizing and offering critiques to refine and improve images. 
        Your task is to thoroughly analyze an image, ensuring that all essential assessments are completed with precision 
        before you provide feedback to the user. You have access to the local file system where the image is stored.
        You will analyze the image and provide a new prompt for Dall-e that enhances the image based on the criticism and analysis.
        You will then instruct the Dall-e Assistant to generate a new image based on the new prompt.
        """, 
    execution_settings=settings
)

selection_function = KernelFunctionFromPrompt(
    function_name="selection",
    prompt=f"""
        Determine which participant takes the next turn in a conversation based on the most recent participant.
        State only the name of the participant to take the next turn.
        Choose only from these participants:
        - {DALLE_NAME}
        - {VISION_NAME}
        - {MANAGER_NAME}
        
        You will follow this sequence:
        {DALLE_NAME} will generate an image based on the initial user prompt and display it for review.
        {VISION_NAME} will analyze the image and provide a new prompt for {DALLE_NAME} to generate a new image based on the new prompt.
        {DALLE_NAME} will generate an image based on the {VISION_NAME} prompt and display it for review.
        {VISION_NAME} will analyze the image and provide a new prompt for {DALLE_NAME} to generate a new image based on the new prompt.
        {DALLE_NAME} will generate an image based on the {VISION_NAME} prompt and display it for review.
        {VISION_NAME} will analyze the image and provide a new prompt for {DALLE_NAME} to generate a new image based on the new prompt.
        
        {MANAGER_NAME} will count the number of conversations between {DALLE_NAME} and {VISION_NAME}.
        
        No participant should take more than one turn in a row.

        History:
        {{{{$history}}}}
        """,
)

termination_function = KernelFunctionFromPrompt(
    function_name="termination",
    prompt=f"""
        Determine if the conversation should be terminated based on the number of conversations by the {MANAGER_NAME}.
        If number of conversations is reached, respond with the termination keyword: {TERMINATION_KEYWORD}
        RESPONSE:
        {{{{$history}}}}"""
)

chat = AgentGroupChat(
    agents=[dalle_assistant_agent, vision_assistant_agent, manager_agent],
    selection_strategy=KernelFunctionSelectionStrategy(
        function=selection_function,
        kernel=_create_kernel_with_chat_completion("selection"),
        result_parser=lambda result: str(result.value[0]) if result.value is not None else DALLE_NAME,
        agent_variable_name="agents",
        history_variable_name="history"
    ),
    termination_strategy=KernelFunctionTerminationStrategy(
        agents=[manager_agent],
        function=termination_function,
        kernel=_create_kernel_with_chat_completion("termination"),
        result_parser=lambda result: TERMINATION_KEYWORD.lower() in str(result.value[0]).lower(),
        history_variable_name="history",
        maximum_iterations=10,
    ),
)

###AzureAssistantAgent Does not work###
### Manager Assistant
manager_kernel = _create_kernel_with_chat_completion(MANAGER_NAME)

manager_agent = await AzureAssistantAgent.create(
    service_id=MANAGER_NAME, 
    kernel=manager_kernel, 
    name=MANAGER_NAME, 
    instructions=f"""
        An image generated by {DALLE_NAME} and an analysis by {VISION_NAME} is considered as a single conversation.
        You will monitor the conversation between the {DALLE_NAME} and the {VISION_NAME} and count the number of conversations.
        Once four (4) conversations are completed, you will provide a termination message: {TERMINATION_KEYWORD}
        You will not provide any other input or output to the conversation other than the termination message.
        """
)

### DALLE Assistant
dalle_assistant_kernel = _create_kernel_with_chat_completion(DALLE_NAME)
dalle_assistant_kernel.add_plugin(GenerateImagePlugin(), plugin_name="GenerateImagePlugin")

settings = dalle_assistant_kernel.get_prompt_execution_settings_from_service_id(service_id=DALLE_NAME)
settings.function_choice_behavior = FunctionChoiceBehavior.Required()

dalle_assistant_agent = await AzureAssistantAgent.create(
    service_id=DALLE_NAME, 
    kernel=dalle_assistant_kernel, 
    name=DALLE_NAME, 
    instructions="""
        As a premier AI specializing in image generation, you possess the expertise to craft precise visuals based on given prompts. 
        It is essential that you diligently generate the requested image, ensuring its accuracy and alignment with the user's specifications, 
        prior to delivering a response.
        You will have access to the local file system to store the generated image.
        You will generate an image based on the user's prompt and display it for review.
        You will generate new images based on the feedback from the Vision Assistant.
        """, 
    execution_settings=settings
)

### Vision Assistant
vision_assistant_kernel = _create_kernel_with_chat_completion(VISION_NAME)
vision_assistant_kernel.add_plugin(AnalyzeImagePlugin(), plugin_name="AnalyzeImagePlugin")

settings = vision_assistant_kernel.get_prompt_execution_settings_from_service_id(service_id=VISION_NAME)
settings.function_choice_behavior = FunctionChoiceBehavior.Required()

vision_assistant_agent = await AzureAssistantAgent.create(
    service_id=VISION_NAME, 
    kernel=vision_assistant_kernel, 
    name=VISION_NAME, 
    instructions=""" 
        As a leading AI expert in image analysis, you excel at scrutinizing and offering critiques to refine and improve images. 
        Your task is to thoroughly analyze an image, ensuring that all essential assessments are completed with precision 
        before you provide feedback to the user. You have access to the local file system where the image is stored.
        You will analyze the image and provide a new prompt for Dall-e that enhances the image based on the criticism and analysis.
        You will then instruct the Dall-e Assistant to generate a new image based on the new prompt.
        """, 
    execution_settings=settings
)

selection_function = KernelFunctionFromPrompt(
    function_name="selection",
    prompt=f"""
        Determine which participant takes the next turn in a conversation based on the most recent participant.
        State only the name of the participant to take the next turn.
        Choose only from these participants:
        - {DALLE_NAME}
        - {VISION_NAME}
        - {MANAGER_NAME}

        You will follow this sequence:
        {DALLE_NAME} will generate an image based on the initial user prompt and display it for review.
        {VISION_NAME} will analyze the image and provide a new prompt for {DALLE_NAME} to generate a new image based on the new prompt.
        {MANAGER_NAME} will monitor the conversation between {DALLE_NAME} and {VISION_NAME} and count the number of conversations.
        {DALLE_NAME} will generate an image based on the {VISION_NAME} prompt and display it for review.
        {VISION_NAME} will analyze the image and provide a new prompt for {DALLE_NAME} to generate a new image based on the new prompt.
        {MANAGER_NAME} will monitor the conversation between {DALLE_NAME} and {VISION_NAME} and count the number of conversations.
        {DALLE_NAME} will generate an image based on the {VISION_NAME} prompt and display it for review.
        {VISION_NAME} will analyze the image and provide a new prompt for {DALLE_NAME} to generate a new image based on the new prompt.
        {MANAGER_NAME} will monitor the conversation between {DALLE_NAME} and {VISION_NAME} and count the number of conversations.
        
        No participant should take more than one turn in a row.

        History:
        {{{{$history}}}}
        """,
)

termination_function = KernelFunctionFromPrompt(
    function_name="termination",
    prompt=f"""
        Determine if the conversation should be terminated based on the number of iterations.
        If number of iterations is reached, respond with the termination keyword: {TERMINATION_KEYWORD}
        RESPONSE:
        {{{{$history}}}}"""
)

chat = AgentGroupChat(
    agents=[dalle_assistant_agent, vision_assistant_agent, manager_agent],
    selection_strategy=KernelFunctionSelectionStrategy(
        function=selection_function,
        kernel=_create_kernel_with_chat_completion("selection"),
        result_parser=lambda result: str(result.value[0]) if result.value is not None else DALLE_NAME,
        agent_variable_name="agents",
        history_variable_name="history"
    ),
    termination_strategy=KernelFunctionTerminationStrategy(
        agents=[manager_agent],
        function=termination_function,
        kernel=_create_kernel_with_chat_completion("termination"),
        result_parser=lambda result: TERMINATION_KEYWORD.lower() in str(result.value[0]).lower(),
        history_variable_name="history",
        maximum_iterations=10,
    ),
)
@zzulueta zzulueta changed the title SK Multiagents SK Multiagents - working for ChatCompletionAgents but not for AzureAssistantAgent Jan 6, 2025
@moonbox3
Contributor

moonbox3 commented Jan 7, 2025

Hello @zzulueta could you give us some more information around:

> If I switch my agents to an AzureAssistantAgent version, the system does not want to handoff to the next agent. Any idea why?

Is there a stack trace? An error? Which Azure OpenAI api version are you using? I need to look more closely through your code, but my initial hunch is that there is a difference with how Azure is handling the Required function choice behavior, versus OpenAI.

Can you turn on some warning or error level debugging in your script, please, if you aren't seeing any errors propagated right now? You can add:

import logging

logging.basicConfig(level=logging.DEBUG)
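
If DEBUG on the root logger turns out to be too noisy, a narrower variant is possible. This is only a sketch, and it assumes that the `semantic_kernel` logger hierarchy is the only one of interest; the root level and the variable name `sk_logger` are illustrative choices:

```python
import logging

# Leave the root logger at WARNING so other libraries stay quiet.
logging.basicConfig(level=logging.WARNING)

# Enable DEBUG only for loggers named "semantic_kernel" and below,
# e.g. "semantic_kernel.agents.group_chat.agent_group_chat".
sk_logger = logging.getLogger("semantic_kernel")
sk_logger.setLevel(logging.DEBUG)
```

Child loggers without an explicit level inherit the level set on `semantic_kernel`, so the selection and termination strategy loggers pick up DEBUG as well.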

@moonbox3 moonbox3 self-assigned this Jan 7, 2025
@moonbox3 moonbox3 added python Pull requests for the Python Semantic Kernel agents and removed triage labels Jan 7, 2025
@github-actions github-actions bot changed the title SK Multiagents - working for ChatCompletionAgents but not for AzureAssistantAgent Python: SK Multiagents - working for ChatCompletionAgents but not for AzureAssistantAgent Jan 7, 2025
@zzulueta
Author

zzulueta commented Jan 7, 2025

This is what came out:
INFO:semantic_kernel.agents.strategies.termination.termination_strategy:Evaluating termination criteria for c4211d81-885f-456a-bba5-ccfaeea5296e
INFO:semantic_kernel.agents.strategies.termination.termination_strategy:Agent c4211d81-885f-456a-bba5-ccfaeea5296e is out of scope
ERROR:semantic_kernel.agents.group_chat.agent_group_chat:Failed to select agent:

My Azure OpenAI api version is 2024-10-01-preview.

My code for the ChatCompletionAgent and AzureAssistantAgent versions is identical. The only difference is how the agents are individually created. Two ChatCompletionAgents work, but when one is converted into an AzureAssistantAgent at creation, it just fails.

@moonbox3
Contributor

moonbox3 commented Jan 7, 2025

Thanks for the additional context, @zzulueta. Let me spend some time on this, and I will get back to you ASAP.

@moonbox3
Contributor

moonbox3 commented Jan 8, 2025

I haven't had a chance to dig into this more. But looking at the code, we should get more of an exception or a stack trace if we fail to select an agent:

try:
    selected_agent = await self.selection_strategy.next(self.agents, self.history.messages)
except Exception as ex:
    logger.error(f"Failed to select agent: {ex}")
    raise AgentChatException("Failed to select agent") from ex
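
Because the exception is chained with `from ex`, the underlying error can be recovered by the caller. The following is a minimal sketch in plain Python; `AgentChatException` here is a local stand-in class so the example runs without SK, and `describe_chain` and `failing_selection` are hypothetical helpers, not part of the library:

```python
class AgentChatException(Exception):
    """Local stand-in for the SK exception of the same name."""


def describe_chain(exc: BaseException) -> list[str]:
    """Return one 'TypeName: message' entry per exception, outermost first."""
    chain = []
    current = exc
    while current is not None:
        chain.append(f"{type(current).__name__}: {current}")
        # __cause__ is set by `raise ... from ex`; otherwise use the implicit context.
        current = current.__cause__ or current.__context__
    return chain


def failing_selection() -> None:
    """Mimic the try/except shown above with a made-up underlying error."""
    try:
        raise ValueError("model returned an unknown agent name")
    except Exception as ex:
        raise AgentChatException("Failed to select agent") from ex


try:
    failing_selection()
except AgentChatException as err:
    for line in describe_chain(err):
        print(line)
```

Printing the type name matters because an exception raised without a message (a bare `assert`, for example) formats as an empty string.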

@zzulueta
Author

zzulueta commented Jan 8, 2025

Sorry, where should I put this new code?

@moonbox3
Contributor

moonbox3 commented Jan 8, 2025

This is the current code -- I was pointing out that we should get more of an error and we even chain the exception to the AgentChatException.

@moonbox3
Contributor

@zzulueta any chance you can give me access to your image generator plugin (even private via GitHub)? I want to reproduce your issue and want to make sure I am looking at the same code as you. Thanks.

@moonbox3
Contributor

moonbox3 commented Jan 13, 2025

@zzulueta in any case, I created some dummy plugins to see how the selection strategy behaves. I'm not sure whether you are running the Azure Assistants code from a single script, or whether any code is commented out when you do. The one important thing not shown in your code is how you create the kernel that you add to the assistant. Are you making sure to add an AzureChatCompletion service to the kernel, even for the AzureAssistantAgent case? This is necessary because you have the KernelFunctionSelectionStrategy and KernelFunctionTerminationStrategy configured: each one calls the model to have it return its agent selection or termination response. Could this be missing from your code?

It's something like this:

def _create_kernel_with_chat_completion(service_id: str) -> Kernel:
    kernel = Kernel()
    kernel.add_service(AzureChatCompletion(service_id=service_id)) # without this line, the selection or termination function cannot run
    return kernel

Here's my AzureAssistantAgent code with some dummy functions that works to select each agent:

import asyncio
import os
from typing import Annotated

from semantic_kernel.agents import AgentGroupChat
from semantic_kernel.agents.open_ai import AzureAssistantAgent
from semantic_kernel.agents.strategies import (
    KernelFunctionSelectionStrategy,
    KernelFunctionTerminationStrategy,
)
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.contents import AuthorRole, ChatMessageContent
from semantic_kernel.functions import KernelFunctionFromPrompt, kernel_function
from semantic_kernel.kernel import Kernel

MANAGER_NAME = "manager"
DALLE_NAME = "dalle"
VISION_NAME = "vision"
TERMINATION_KEYWORD = "END"


def _create_kernel_with_chat_completion(service_id: str) -> Kernel:
    kernel = Kernel()
    kernel.add_service(AzureChatCompletion(service_id=service_id))
    return kernel


class GenerateImagePlugin:
    @kernel_function(description="Generates an image based on the user's prompt.")
    def generate_image(
        self, prompt: Annotated[str, "The prompt for the image generation."]
    ) -> Annotated[str, "Returns the generated image."]:
        return "An image of a cat."


class AnalyzeImagePlugin:
    @kernel_function(description="Analyzes an image and provides feedback.")
    def analyze_image(
        self, image: Annotated[str, "The image to analyze."]
    ) -> Annotated[str, "Returns the analysis of the image."]:
        return "The image is of a cat."


# AzureAssistantAgent version: agent selection works with this code
async def main():
    # Manager Assistant
    manager_kernel = _create_kernel_with_chat_completion(MANAGER_NAME)

    manager_agent = await AzureAssistantAgent.create(
        service_id=MANAGER_NAME,
        kernel=manager_kernel,
        name=MANAGER_NAME,
        instructions=f"""
            An image generated by {DALLE_NAME} and an analysis by {VISION_NAME} is considered as a single conversation.
            You will monitor the conversation between the {DALLE_NAME} and the {VISION_NAME} and count the number of conversations.
            Once four (4) conversations are completed, you will provide a termination message: {TERMINATION_KEYWORD}
            You will not provide any other input or output to the conversation other than the termination message.
            """,
    )

    # DALLE Assistant
    dalle_assistant_kernel = _create_kernel_with_chat_completion(DALLE_NAME)
    dalle_assistant_kernel.add_plugin(GenerateImagePlugin(), plugin_name="GenerateImagePlugin")

    dalle_assistant_agent = await AzureAssistantAgent.create(
        service_id=DALLE_NAME,
        kernel=dalle_assistant_kernel,
        name=DALLE_NAME,
        instructions="""
            As a premier AI specializing in image generation, you possess the expertise to craft precise visuals based on given prompts. 
            It is essential that you diligently generate the requested image, ensuring its accuracy and alignment with the user's specifications, 
            prior to delivering a response.
            You will have access to the local file system to store the generated image.
            You will generate an image based on the user's prompt and display it for review.
            You will generate new images based on the feedback from the Vision Assistant.
            """,
        # execution_settings=settings,
    )

    # Vision Assistant
    vision_assistant_kernel = _create_kernel_with_chat_completion(VISION_NAME)
    vision_assistant_kernel.add_plugin(AnalyzeImagePlugin(), plugin_name="AnalyzeImagePlugin")

    vision_assistant_agent = await AzureAssistantAgent.create(
        service_id=VISION_NAME,
        kernel=vision_assistant_kernel,
        name=VISION_NAME,
        instructions=""" 
            As a leading AI expert in image analysis, you excel at scrutinizing and offering critiques to refine and improve images. 
            Your task is to thoroughly analyze an image, ensuring that all essential assessments are completed with precision 
            before you provide feedback to the user. You have access to the local file system where the image is stored.
            You will analyze the image and provide a new prompt for Dall-e that enhances the image based on the criticism and analysis.
            You will then instruct the Dall-e Assistant to generate a new image based on the new prompt.
            """,
        # execution_settings=settings,
    )

    selection_function = KernelFunctionFromPrompt(
        function_name="selection",
        prompt=f"""
            Determine which participant takes the next turn in a conversation based on the most recent participant.
            State only the name of the participant to take the next turn.
            Choose only from these participants:
            - {DALLE_NAME}
            - {VISION_NAME}
            - {MANAGER_NAME}

            You will follow this sequence:
            {DALLE_NAME} will generate an image based on the initial user prompt and display it for review.
            {VISION_NAME} will analyze the image and provide a new prompt for {DALLE_NAME} to generate a new image based on the new prompt.
            {MANAGER_NAME} will monitor the conversation between {DALLE_NAME} and {VISION_NAME} and count the number of conversations.
            {DALLE_NAME} will generate an image based on the {VISION_NAME} prompt and display it for review.
            {VISION_NAME} will analyze the image and provide a new prompt for {DALLE_NAME} to generate a new image based on the new prompt.
            {MANAGER_NAME} will monitor the conversation between {DALLE_NAME} and {VISION_NAME} and count the number of conversations.
            {DALLE_NAME} will generate an image based on the {VISION_NAME} prompt and display it for review.
            {VISION_NAME} will analyze the image and provide a new prompt for {DALLE_NAME} to generate a new image based on the new prompt.
            {MANAGER_NAME} will monitor the conversation between {DALLE_NAME} and {VISION_NAME} and count the number of conversations.
            
            No participant should take more than one turn in a row.

            History:
            {{{{$history}}}}
            """,
    )

    termination_function = KernelFunctionFromPrompt(
        function_name="termination",
        prompt=f"""
            Determine if the conversation should be terminated based on the number of iterations.
            If number of iterations is reached, respond with the termination keyword: {TERMINATION_KEYWORD}
            RESPONSE:
            {{{{$history}}}}""",
    )

    chat = AgentGroupChat(
        agents=[dalle_assistant_agent, vision_assistant_agent, manager_agent],
        selection_strategy=KernelFunctionSelectionStrategy(
            function=selection_function,
            kernel=_create_kernel_with_chat_completion("selection"),
            # Fall back to the DALL-E agent when the selection function returns no value.
            result_parser=lambda result: str(result.value[0]) if result.value is not None else DALLE_NAME,
            agent_variable_name="agents",
            history_variable_name="history",
        ),
        termination_strategy=KernelFunctionTerminationStrategy(
            agents=[manager_agent],
            function=termination_function,
            kernel=_create_kernel_with_chat_completion("termination"),
            # Lowercase both sides so the upper-case keyword can actually match.
            result_parser=lambda result: TERMINATION_KEYWORD.lower() in str(result.value[0]).lower(),
            history_variable_name="history",
            maximum_iterations=10,
        ),
    )

    try:
        while True:
            user_input = input("User:> ")
            if not user_input:
                continue

            if user_input.lower() == "exit":
                break

            if user_input.lower() == "reset":
                await chat.reset()
                print("[Conversation has been reset]")
                continue

            await chat.add_chat_message(ChatMessageContent(role=AuthorRole.USER, content=user_input))

            try:
                async for response in chat.invoke():
                    print(os.linesep)
                    print(f"# {response.role} - {response.name or '*'}: '{response.content}'")
            except Exception as e:
                print(f"Error: {e}")
                break

            if chat.is_complete:
                break
    finally:
        # Delete the remote assistants once, after the chat loop has finished,
        # rather than after every turn.
        await manager_agent.delete()
        await dalle_assistant_agent.delete()
        await vision_assistant_agent.delete()


if __name__ == "__main__":
    asyncio.run(main())

with the following sample output:

User:> create a photo of a cat


# AuthorRole.ASSISTANT - dalle: 'Here is the image of a cute, fluffy cat sitting on a windowsill with sunlight streaming in, surrounded by green plants. Please take a look! 

![Cat Image](attachment://cat_image.png)'


# AuthorRole.ASSISTANT - vision: 'The analysis confirmed that the image is of a cat. However, to enhance its appeal, I will provide specific feedback for improvement:

1. **Lighting**: The lighting could be more dramatic to highlight the features of the cat, perhaps by emphasizing the warm glow of the sunlight.
2. **Background**: The background could be less cluttered or more harmonious to draw attention to the cat, possibly featuring a softer blur.
3. **Pose**: Capturing the cat in a more dynamic pose could add interest to the image.
4. **Focus**: Ensure that the cat's eyes are sharply in focus to create a captivating focal point.
5. **Color balance**: Some adjustments to the color balance could enhance the vibrancy of the image.

Based on this feedback, here is a new prompt for DALL-E:

"Create an image of a fluffy cat perched gracefully on a sunlit windowsill, surrounded by soft, green plants. The cat's fur should glow in the warm sunlight, while the background is artistically blurred to keep the focus on the cat's piercing eyes and playful pose."

I will now instruct the DALL-E Assistant to generate a new image based on this prompt.'


# AuthorRole.ASSISTANT - vision: 'It seems I made an error while attempting to generate the new image with DALL-E. Instead, let me clearly state the new prompt for generating the cat image:

"Create an image of a fluffy cat perched gracefully on a sunlit windowsill, surrounded by soft, green plants. The cat's fur should glow in the warm sunlight, while the background is artistically blurred to keep the focus on the cat's piercing eyes and playful pose."

Unfortunately, I cannot proceed with the image generation directly from here. Please use this enhanced prompt with your DALL-E interface to create the updated image. If you need further assistance, let me know!'


# AuthorRole.ASSISTANT - manager: 'The analysis confirmed that the image is of a cat. Would you like more details or a different type of image? Let's create another prompt! 

"Generate an image of a playful kitten exploring a lush, colorful garden filled with flowers and butterflies. Capture the curiosity in the kitten's eyes and the vibrant colors of the flora surrounding it."

I'm looking forward to your thoughts!'


# AuthorRole.ASSISTANT - manager: 'An image of a cat.'


# AuthorRole.ASSISTANT - manager: 'The tool call arguments are malformed. Arguments must be in JSON format. Please try again.'


# AuthorRole.ASSISTANT - dalle: 'It seems I made an error while attempting to generate the new image with DALL-E. Instead, let me clearly state the new prompt for generating the cat image:

"Create an image of a fluffy cat perched gracefully on a sunlit windowsill, surrounded by soft, green plants. The cat's fur should glow in the warm sunlight, while the background is artistically blurred to keep the focus on the cat's piercing eyes and playful pose."

I will now attempt to generate a new image based on this improved prompt.'


# AuthorRole.ASSISTANT - dalle: 'Here is the updated image of a fluffy cat perched gracefully on a sunlit windowsill, surrounded by soft, green plants. Please take a look! 

![Updated Cat Image](attachment://updated_cat_image.png)'


# AuthorRole.ASSISTANT - vision: 'The analysis confirmed that the new image is of a cat. Please let me know if there's anything specific you would like to adjust or any other images you'd like to create!'


# AuthorRole.ASSISTANT - manager: 'Here is another image of a cat, this time featuring a playful scene where the cat is chasing a butterfly in a vibrant garden setting. Please take a look! 

![Another Cat Image](attachment://cat_playful_image.png)'


# AuthorRole.ASSISTANT - manager: 'An image of a cat.'


# AuthorRole.ASSISTANT - manager: 'The image has been successfully identified as a cat. If you would like to see more variations or different themes, just let me know!'


# AuthorRole.ASSISTANT - manager: 'END'


# AuthorRole.ASSISTANT - dalle: 'Here is another image of a cat, this time featuring a playful scene where the cat is chasing a butterfly in a vibrant garden setting. Please take a look! 

![Another Cat Image](attachment://cat_playful_image.png)'


# AuthorRole.ASSISTANT - vision: 'The image has been successfully identified as a cat. If you would like to see more variations or different themes, just let me know!'


# AuthorRole.ASSISTANT - manager: 'END'


# AuthorRole.ASSISTANT - dalle: 'If you have any specific requests for images or further modifications you would like, please let me know! I'm here to help.'
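
A detail worth checking in any termination parser that lowercases the model response: the keyword has to be lowercased as well, otherwise an upper-case keyword such as END can never match. The sketch below uses plain strings in place of the function result; `naive_parser` and `case_insensitive_parser` are hypothetical names used only for illustration:

```python
TERMINATION_KEYWORD = "END"


def naive_parser(text: str) -> bool:
    # Lowercases only the response, so an upper-case keyword never matches.
    return TERMINATION_KEYWORD in text.lower()


def case_insensitive_parser(text: str) -> bool:
    # Normalizes both sides before the substring check.
    return TERMINATION_KEYWORD.lower() in text.lower()


print(naive_parser("END"))             # False
print(case_insensitive_parser("END"))  # True
```

A substring check on a short keyword can also match ordinary words ("recommend" contains "end"), so a more distinctive keyword or an exact comparison may be safer.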

@zzulueta
Author

@moonbox3 yes I assign an AzureChatCompletion service. I actually used your code there from other examples.

from semantic_kernel.connectors.ai.open_ai.services.azure_chat_completion import AzureChatCompletion
from semantic_kernel.kernel import Kernel

def _create_kernel_with_chat_completion(service_id: str) -> Kernel:
    kernel = Kernel()
    kernel.add_service(AzureChatCompletion(service_id=service_id))
    return kernel

@zzulueta
Author

zzulueta commented Jan 13, 2025

@moonbox3 I added you as a collaborator on the whole notebook.

There are two notebooks:
Agents are using ChatCompletionAgent - WORKS
Agents are using AzureAssistantAgent - Failed to select agent

@moonbox3
Contributor

@zzulueta Thanks for adding me. I see there is a stack trace in your notebook -- I hadn't seen that before. The following bug was just recently fixed:

File ~/.python/current/lib/python3.12/site-packages/semantic_kernel/contents/chat_message_content.py:298, in ChatMessageContent.to_dict(self, role_key, content_key)
    297 if self.role == AuthorRole.TOOL:
--> 298     assert isinstance(self.items[0], FunctionResultContent)  # nosec
    299     ret["tool_call_id"] = self.items[0].id or ""

Can you please try to upgrade your SK Python version to 1.18.1?
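
To confirm which version is actually installed in the notebook environment, something along these lines may help. It is a sketch using only the standard library; `version_tuple` and `is_at_least` are hypothetical helpers, and the comparison deliberately ignores pre-release suffixes:

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text: str) -> tuple[int, ...]:
    """Turn '1.18.1' into (1, 18, 1), ignoring anything after the third number."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def is_at_least(installed: str, required: str) -> bool:
    """Compare two dotted version strings numerically."""
    return version_tuple(installed) >= version_tuple(required)


try:
    installed = version("semantic-kernel")  # distribution name on PyPI
    status = "ok" if is_at_least(installed, "1.18.1") else "needs upgrade"
    print(f"semantic-kernel {installed}: {status}")
except PackageNotFoundError:
    print("semantic-kernel is not installed in this environment")
```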

I was able to run your code without issues, and the agent selection function is working without error.

A couple of unsolicited thoughts for you, too:

  • We have a text-to-image connector for AzureOpenAI. Here is a sample: https://github.com/microsoft/semantic-kernel/blob/main/python/samples/concepts/images/image_generation.py. It uses OpenAI, but you would just need to change that to AzureTextToImage. It works similarly to AzureChatCompletion.
  • All of SK is async-based -- I would recommend updating your AzureOpenAI client to AsyncAzureOpenAI. That's optional, of course, but running asynchronous kernel functions is generally a good idea.
  • You may run into trouble asking an LLM agent to count the number of conversations. It may be better to have a kernel plugin keep track of some state (iteration count?) and then decide how to operate.
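
A minimal sketch of that last suggestion, assuming a plain Python class stands in for the plugin. The names `ImageCounterPlugin`, `record_image`, and `is_done` are hypothetical. In SK the methods would be exposed to the model with the `kernel_function` decorator and the class registered with `kernel.add_plugin(...)`; both are left out here so the snippet runs without SK installed.

```python
class ImageCounterPlugin:
    """Hypothetical plugin: keeps the iteration count in code instead of asking the LLM to count."""

    def __init__(self, max_images: int = 4) -> None:
        self.max_images = max_images
        self.count = 0

    def record_image(self) -> int:
        """Call once per generated image; returns the updated count."""
        self.count += 1
        return self.count

    def is_done(self) -> bool:
        """True once the configured number of images has been produced."""
        return self.count >= self.max_images


# Usage: record four images, then check whether the run should stop.
counter = ImageCounterPlugin(max_images=4)
for _ in range(4):
    counter.record_image()
print(counter.is_done())
```

Because the count lives in ordinary state, a termination check can read `is_done()` directly rather than depending on the model to emit a keyword at the right time.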

@moonbox3
Contributor

If you continue to hit an issue after upgrading to 1.18.1, please re-open the issue. Otherwise, with your code, I've been able to successfully run it all and it's working on the latest SK Python code.

@zzulueta
Author

@moonbox3 Thanks for the inputs.

In the ChatCompletionAgent scenario, I added a tool for the Manager to count the number of images created. In addition, I switched everything to AsyncAzureOpenAI. It works perfectly now, including the image count.

However, the AzureAssistantAgent scenario still does not work properly. The system was able to hand off from the Dalle Assistant to the Vision assistant, but there is a problem bringing the conversation back to the Dalle Assistant to create a new image.

@moonbox3
Contributor

> The system was able to hand off from the Dalle Assistant to the Vision assistant, but there is a problem bringing the conversation back to the Dalle Assistant to create a new image.

Can you please elaborate on this? What do you mean by "there is a problem?" An error? A stack trace? The model doesn't hand the conversation back to the Dalle assistant?

@zzulueta
Author

You can try running the system. Here is my output below. As you can see, the Assistant should be selecting the Dalle Assistant with the new prompt. Unfortunately, the Vision assistant keeps on being called.

Dalle Assistant Message: /workspaces/AzureAI/SemanticKernelAgents/temp.jpg
# AuthorRole.ASSISTANT - DalleAssistant: 'I have generated the image based on your prompt. Here it is:

Boat drifting in the water with a sunset in the background

Please review the image and let me know if there are any adjustments or additional details you'd like to include.'
Vision Assistant Message: Analyzing the image...
Vision Assistant Message: ### Analysis
The image conveys a serene and peaceful evening scene with a small boat floating gently on a calm body of water. The sunset creates a vivid orange and red backdrop, highlighting the silhouette of the boat against the colorful sky. The reflection in the water enhances the sense of tranquility and depth.

Critique

While the image is visually striking, the colors might appear slightly oversaturated, potentially giving it an unreal aesthetic that might detract from the natural beauty of the scene. The central composition of the boat and the sun directly above may seem a bit too symmetric, which could be perceived as less dynamic.

Enhanced Prompt for DALL-E

"Create an image of an old wooden rowboat gently floating on a crystal clear mountain lake during sunset. The sky is a blend of soft pink, purple, and orange tones, reflecting subtly on the rippling water surface. Surrounding the lake are gentle slopes covered with wildflowers and sparse trees. The scene radiates a peaceful, almost ethereal quality, capturing the essence of a tranquil evening without oversaturation, incorporating a more dynamic asymmetrical composition that leads the viewer’s eye through the scene."
Vision Assistant Message: Analyzing the image...
Vision Assistant Message: ### Analysis and Critique
This image beautifully captures a tranquil sunset scene with a solitary boat floating on calm waters. The color palette is very striking and conveys a sense of peace and serenity through its warm reds and oranges contrasted with the dark blues and purples. The composition is well-balanced with the placement of the boat towards the left, creating an effective use of negative space that allows the viewer's eye to wander and explore the expanse of the water and the sunset.

The reflections in the water are also praisely executed, enhancing the overall calmness and depth of the scene. The sun, placed strategically on the horizon, serves as a focal point and balances the image by its positioning directly opposite the boat.

However, the image could be criticized for perhaps being a bit too idealized in its portrayal of nature. The saturation and contrast might be seen as excessive, making the scene feel more like a painting than a photograph. This might detach the viewer from feeling a connection with what could be a real-world setting.

Enhanced Prompt for DALL-E

"With the above critiques in mind, generate an image of a solitary wooden boat on a smooth lake at sunset. Adjust the color palette to include softer and more natural shades of red, orange, and purple blending harmoniously into the darkening sky. The sun should be depicted just touching the horizon, softening its glare and enhancing the reflection on the water to depict a real-world scenario. Include subtle ripples in the lake to add a touch of realism, avoiding an overly smooth texture on the water surface. The composition should maintain a balanced distribution of elements with the boat tastefully offset rather than centered."
# AuthorRole.ASSISTANT - VisionAssistant: 'Based on the analysis and critique, I have crafted a new prompt to enhance the image:

"Create an image of an old wooden rowboat gently floating on a crystal clear mountain lake during sunset. The sky is a blend of soft pink, purple, and orange tones, reflecting subtly on the rippling water surface. Surrounding the lake are gentle slopes covered with wildflowers and sparse trees. The scene radiates a peaceful, almost ethereal quality, capturing the essence of a tranquil evening without oversaturation, incorporating a more dynamic asymmetrical composition that leads the viewer’s eye through the scene."

I will now instruct the Dall-e Assistant to generate a new image based on this enhanced prompt.'
Vision Assistant Message: Analyzing the image...
Vision Assistant Message: Image Analysis and Criticism:

This image depicts a beautifully serene scene with a solitary boat floating on what appears to be a smooth body of water. The setting sun forms a perfect circle on the horizon, casting a vibrant mix of red and orange hues across the sky and water. The reflections in the water are particularly striking, mirroring the intense colors of the sky and hinting at a slight ripple that adds texture to the scene.

Criticism:

  1. Color Saturation: The image may be considered overly saturated for some tastes. The intense reds and oranges, though visually captivating, could overshadow the subtler details and textures of the scene, making it feel slightly unreal.
  2. Composition: The centered positioning of the boat and the sun is aesthetically pleasing. However, experimenting with the rule of thirds by placing these elements slightly off-center might create a different dynamic and potentially enhance the visual interest in the image.
  3. Focus and Clarity: The boat and its reflection are crisply focused, which is excellent. However, enhancing the focus or adding more detail to the distant landscape might provide a greater sense of depth.

Enhanced Prompt for DALL-E Based on Criticism:

"Create an image of a solitary boat floating on a calm lake at sunset with a slightly less saturated and more natural palette of reds, oranges, and purples. The sun should be placed at the intersection according to the rule of thirds for dynamic composition. Include a clear and detailed distant mountain range under a sky with realistic cloud formations, adding depth and contrast to the serene water that reflects a more subtly vibrant sunset."

# AuthorRole.ASSISTANT - VisionAssistant: 'Based on the analysis and critique, I have crafted a further refined prompt to enhance the image:

"Generate an image of a solitary wooden boat on a smooth lake at sunset. Adjust the color palette to include softer and more natural shades of red, orange, and purple blending harmoniously into the darkening sky. The sun should be depicted just touching the horizon, softening its glare and enhancing the reflection on the water to depict a real-world scenario. Include subtle ripples in the lake to add a touch of realism, avoiding an overly smooth texture on the water surface. The composition should maintain a balanced distribution of elements with the boat tastefully offset rather than centered."

I will now instruct the Dall-e Assistant to generate a new image based on this refined prompt.'
Vision Assistant Message: Analyzing the image...
Vision Assistant Message: ### Analysis and Critique

The image depicts a serene sunset scene featuring a lone boat gently floating on a mirror-like body of water, under a deep red sky. The overall composition and the use of vibrant colors are visually striking, creating a sense of calm and isolation.

Strengths:

  1. Color Palette: The vibrant shades of red and orange create a warm, inviting atmosphere. The gradient in the sky from dark to light adds depth to the image.
  2. Composition: The placement of the boat in the foreground, with the sun setting directly in the background, establishes a strong focal point and leads the viewer’s eye through the image.
  3. Reflections in the Water: The reflection of the boat and the sun adds symmetry and balance to the composition, enhancing the tranquil mood.

Areas for Improvement:

  1. Detail and Texture: The boat and the distant landscape could benefit from more detailed texturing to give a more realistic feel to the scene.
  2. Lighting and Shadows: While the lighting accentuates the sunset, additional subtle shading on the boat and in the water could provide a more three-dimensional feel and enhance the realism.

Enhanced Prompt for DALL-E

"Create an image of a small boat floating on a glass-like lake at sunset. The sky should be a gradient of deep red to orange, reflecting vividly on the water. Add more texture to the boat and the distant shoreline, emphasizing realism in the materials. Introduce subtle shadows on the boat and in the ripples of the water to enhance depth and dimensional contrast, maintaining a balance between realism and a tranquil, painterly aesthetic."
# AuthorRole.ASSISTANT - VisionAssistant: 'I will now instruct the Dall-e Assistant to generate a new image based on the further refined prompt:

"Create an image of a solitary boat floating on a calm lake at sunset with a slightly less saturated and more natural palette of reds, oranges, and purples. The sun should be placed at the intersection according to the rule of thirds for dynamic composition. Include a clear and detailed distant mountain range under a sky with realistic cloud formations, adding depth and contrast to the serene water that reflects a more subtly vibrant sunset."

'
Vision Assistant Message: Analyzing the image...
Vision Assistant Message: ### Analysis and Critique:

This image depicts a serene scene focused on a solitary boat floating on smooth water during a vivid sunset. The overall composition is visually striking, owing to its strong, warm color palette dominated by deep reds and oranges, which evokes a sense of calm and tranquility.

Strengths:

  1. Color and Tone: The color grading is rich, creating a warm and inviting atmosphere. This not only enhances the subject's appeal but effectively communicates a mood of peace and stillness.
  2. Composition: The boat is well-placed within the frame, using the rule of thirds effectively to balance the composition. The sun is also strategically positioned to serve as a focal point, drawing the viewer's eye directly.
  3. Reflection and Texture: The reflection of the sunlight and the boat in the water adds depth to the image, and the smooth texture of the water contrasts nicely with the detailed texture of the boat.

Areas for Improvement or Variation:

  1. Dynamic Elements: While the tranquility is a key strength, introducing subtle dynamic elements could increase visual interest. Perhaps some gentle ripples around the boat caused by a mild breeze could add a minimal yet appealing effect without disturbing the peacefulness.
  2. Sky Detail: Introducing subtle cloud details or varying gradients in the sky might add more depth and contrast to the upper portion of the image, which is predominantly one solid color.

Enhanced Prompt for DALL-E:

"Create an image of a small, traditional wooden boat gently floating on a calm lake during sunset. The lake reflects a vibrant mix of deep reds and oranges from the sunset sky, which features light cloud wisps and dynamic rays of sunlight piercing through. Add a slight ripple effect around the boat suggesting a soft breeze. Ensure the scene conveys a serene yet subtly dynamic atmosphere, rich in color with a balanced composition using the rule of thirds."

This enhanced prompt suggests slight changes that maintain the serene mood while introducing elements that could make the scene even more engaging and visually diverse.
# AuthorRole.ASSISTANT - VisionAssistant: 'Based on the analysis and critique, here is the final refined prompt to further enhance the image:

"Create an image of a small boat floating on a glass-like lake at sunset. The sky should be a gradient of deep red to orange, reflecting vividly on the water. Add more texture to the boat and the distant shoreline, emphasizing realism in the materials. Introduce subtle shadows on the boat and in the ripples of the water to enhance depth and dimensional contrast, maintaining a balance between realism and a tranquil, painterly aesthetic."

Let's generate the final image based on this enhanced prompt.'
Vision Assistant Message: Analyzing the image...

@moonbox3
Contributor

My advice is to play around with the selection criteria some more. Otherwise, just do a sequential selection strategy for your three agents, and when you get to the 4th agent, configure the termination to handle it.

@zzulueta
Author

How do you do a Sequential selection strategy in SK? It was very easy with Autogen with the SelectorGroupChat and Swarm. There doesn't seem to be an equivalent in SK.

@moonbox3
Contributor

@zzulueta the sequential strategy is configured by default (the selection strategy is optional, so if one is not configured we use the sequential strategy).

selection_strategy: SelectionStrategy = Field(default_factory=SequentialSelectionStrategy)
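
To illustrate what sequential selection amounts to, here is a rough round-robin sketch in plain Python. It is not SK's implementation: `RoundRobinSelector` and the name `ManagerAssistant` are made up for illustration. With SK itself, the point above is simply that leaving `selection_strategy` unset on the group chat gives you this kind of ordering.

```python
class RoundRobinSelector:
    """Hypothetical stand-in for a sequential strategy: agents take turns in list order."""

    def __init__(self, agent_names: list[str]) -> None:
        if not agent_names:
            raise ValueError("at least one agent is required")
        self._names = list(agent_names)
        self._index = 0

    def next(self) -> str:
        """Return the agent whose turn it is, then advance (wrapping around)."""
        name = self._names[self._index]
        self._index = (self._index + 1) % len(self._names)
        return name


# Usage: three agents, five turns; the order wraps after the last agent.
selector = RoundRobinSelector(["DalleAssistant", "VisionAssistant", "ManagerAssistant"])
turns = [selector.next() for _ in range(5)]
print(turns)
```

The order is fixed by the list passed in, so no LLM call is needed to pick the next agent, which sidesteps the selection problem seen in the output above.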

@zzulueta
Author

@moonbox3 thanks! I must have missed this in the MS Learn documentation.
