feat: Introduce Agent #175

Open · wants to merge 26 commits into main

Conversation

@mathislucka (Member) commented Jan 24, 2025

Agent

This PR introduces an Agent component that slots right into Haystack's pipelines.

An Agent consists of an LLM and tools. The LLM repeatedly calls tools until a handoff condition is met. By default, the Agent returns when the LLM produces a pure text message without any tool calls.

You run the Agent by passing a list of chat messages. The agent returns a list of chat messages including the full sequence of tool calls and tool results.

Under the hood, the Agent uses a cyclic pipeline with the following components:

  • ChatGenerator
  • ToolInvoker
  • ConditionalRouter
  • BranchJoiner
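
The loop these components implement is roughly the following, shown here as plain Python for illustration only (the actual Agent wires the components above into a cyclic pipeline; `llm` and `tool_invoker` stand for a ChatGenerator and a ToolInvoker configured with the Agent's tools):

from typing import Dict, List

from haystack.dataclasses import ChatMessage

def run_agent_loop(llm, tool_invoker, messages: List[ChatMessage]) -> Dict[str, List[ChatMessage]]:
  while True:
    # ChatGenerator step: the LLM either answers or requests tool calls
    replies = llm.run(messages=messages)["replies"]
    messages = messages + replies
    # ConditionalRouter step with the default handoff="text": exit once there are no tool calls
    if not replies[-1].tool_calls:
      return {"messages": messages}
    # ToolInvoker step: execute the requested tools; the BranchJoiner feeds the results back into the loop
    tool_messages = tool_invoker.run(messages=replies)["tool_messages"]
    messages = messages + tool_messages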

Usage Example

This is a simple Agent that can perform web search to answer a user's question.

from haystack.components.websearch import SearchApiWebSearch
from haystack.tools import ComponentTool
from haystack.dataclasses import ChatMessage

from haystack_experimental.components.agents import Agent

search_tool = ComponentTool(
  component=SearchApiWebSearch(),
  name="web_search",
  description="Use this tool to perform a search on the web."
)

system_prompt = """
Answer the users question.
Don't rely on your own knowledge, use the `web_search` utility to search for information on the web.
Break down your searches into multiple queries if you have to. Pass them to `web_search` one by one.
  """

agent = Agent(model="openai:gpt-4o", tools=[search_tool], system_prompt=system_prompt)

result = agent.run(messages=[ChatMessage.from_user("Give me the Oscar winners for best actor for the past 3 years and summarize the plot of the films that they won with.")])

# The agent will run queries repeatedly until it returns a response.
# Access the response and the full history of tool calls and tool results
for msg in result["messages"]:
  print(msg)

handoff

Currently, the Agent exits the loop when the handoff condition is met.
The default is Agent(handoff="text"), which means that the agent exits the loop when it produces a text-only response.
Users can set handoff to the name of any tool, which causes the agent to return after that tool has been called.
For the web search example, if we set handoff="web_search", the agent would return after the web_search tool has been called and executed once.
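For illustration, reusing the `search_tool` from the example above (a minimal sketch, not taken from the PR):

agent = Agent(model="openai:gpt-4o", tools=[search_tool], handoff="web_search")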

Extending 'handoff'
We could extend handoff to also allow users to pass a Jinja2 expression, in addition to tool names and "text".
The ConditionalRouter has access to the message history, the last message generated from the ChatGenerator and the last tool result message from the ToolInvoker. Let's say we wanted to return once the ChatGenerator generates a message that contains the word 'Done':

Agent(handoff="{{ 'Done' in llm_messages[0].text }}")

This isn't implemented yet, but we could do it.
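
Roughly, such an expression could be plugged into the internal ConditionalRouter as an exit route along these lines (an illustrative sketch under assumed variable and output names, not implemented behaviour):

from typing import List

from haystack.dataclasses import ChatMessage

exit_route = {
  "condition": "{{ 'Done' in llm_messages[0].text }}",  # the user-supplied Jinja2 expression
  "output": "{{ messages + llm_messages }}",            # what the agent would return when exiting
  "output_name": "handoff",
  "output_type": List[ChatMessage],
}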

Passing additional context to tools

In many cases, it is useful to pass additional inputs to a tool or to return results from a tool directly to the pipeline.
Assume we want to collect a list of all the search results the agent viewed, so that we can construct an answer with sources.
Additionally, we want to restrict our web search agent to only search on Wikipedia.

This can be achieved by defining input_variables and output_variables when initializing the Agent and by defining tools that accept ctx.

from typing import List

from haystack.components.websearch import SearchApiWebSearch
from haystack.tools import ComponentTool
from haystack.dataclasses import ChatMessage, Document
from haystack import Pipeline
from haystack.components.builders import PromptBuilder

from haystack_experimental.components.agents import Agent
from haystack_experimental.tools.tool_context import ToolContext


def web_search(ctx: ToolContext, query: str):
  # we can get additional inputs from the 'ctx' by calling get_input
  allowed_domains = ctx.get_input("allowed_domains")
  comp = SearchApiWebSearch(allowed_domains=allowed_domains)
  
  result = comp.run(query=query)
  
  if existing_docs := ctx.get_output("documents"):
    # we can pass on additional outputs from the tool by calling set_output
    ctx.set_output("documents", existing_docs + result["documents"])
  else:
    ctx.set_output("documents", result["documents"])

  return result["documents"]


system_prompt = """
Answer the users question.
Don't rely on your own knowledge, use the `web_search` utility to search for information on the web.
Break down your searches into multiple queries if you have to. Pass them to `web_search` one by one.
  """

# `search_tool` below is assumed to be the `web_search` function above wrapped as a Tool
agent = Agent(
  model="openai:gpt-4o",
  tools=[search_tool],
  system_prompt=system_prompt,
  input_variables={"allowed_domains": List[str]},
  output_variables={"documents": List[Document]}
)

# let's use a PromptBuilder to get a response in a nice format
answer_template = """
{{ (messages|last).text }}

_Viewed Search Results:_
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
"""

answer_formatter = PromptBuilder(template=answer_template)

pipe = Pipeline()

pipe.add_component("agent", agent)
pipe.add_component("answer", answer_formatter)

pipe.connect("agent.messages", "answer.messages")

# note that the agent now has an output socket for documents
pipe.connect("agent.documents", "answer.documents")

user_input = [ChatMessage.from_user("Give me the Oscar winners for best actor for the past 3 years and summarize the plot of the films that they won with.")]

# Note how the agent now has an "allowed_domains" input socket.
result = pipe.run(data={"agent": {"messages": user_input, "allowed_domains": ["wikipedia.org"]}})

Apart from passing additional inputs to tools and getting additional outputs from the agent, we can also pass values from one tool to another by writing to and reading from ToolContext.outputs.
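
A minimal sketch of two tool functions sharing state this way, using the same get_output/set_output calls as above (the tool bodies are placeholders):

from haystack_experimental.tools.tool_context import ToolContext

def fetch_page(ctx: ToolContext, url: str) -> str:
  text = f"dummy content of {url}"   # placeholder for a real fetch
  ctx.set_output("last_page", text)  # make the result available to other tools and the pipeline
  return text

def summarize_last_page(ctx: ToolContext) -> str:
  page = ctx.get_output("last_page") or ""  # read what a previous tool stored
  return page[:200]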

To be discussed:

  • This mechanism does not work with ComponentTool yet. We should discuss how the same behaviour could be implemented there.

  • I opted for a simple dataclass used as ToolContext. This dataclass does not perform type checking at runtime. Theoretically, we could implement a version of ToolContext using Pydantic dynamic models that would validate types at runtime, but it's a more complex implementation, adds a dependency on Pydantic, and I think the benefits are limited. (A rough sketch of the plain-dataclass shape is below.)
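
For reference, the plain-dataclass shape could look roughly like this (a sketch consistent with the get_input / get_output / set_output calls used above; the actual implementation in this PR may differ):

from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class ToolContext:
  # no runtime type validation; inputs and outputs are plain dicts
  inputs: Dict[str, Any] = field(default_factory=dict)
  outputs: Dict[str, Any] = field(default_factory=dict)

  def get_input(self, name: str) -> Any:
    return self.inputs.get(name)

  def get_output(self, name: str) -> Optional[Any]:
    return self.outputs.get(name)

  def set_output(self, name: str, value: Any) -> None:
    self.outputs[name] = value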

Tracing

Tracing generally works with all of our different tracing integrations. However, since the Agent uses a pipeline internally, it would be great if we could associate that trace with the pipeline that called the agent. This is a general enhancement that would also be useful for SuperComponent as implemented here. I think we should tackle it in a separate PR though.

How did you test it?

Notes for the reviewer

Checklist

@mathislucka mathislucka requested a review from a team as a code owner January 24, 2025 16:45
@mathislucka mathislucka requested review from vblagoje and removed request for a team January 24, 2025 16:45

@coveralls commented Jan 24, 2025

Pull Request Test Coverage Report for Build 13048849975

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-5.6%) to 58.15%

Totals Coverage Status
Change from base Build 13042914554: -5.6%
Covered Lines: 1345
Relevant Lines: 2313

💛 - Coveralls

@vblagoje vblagoje requested a review from anakin87 January 27, 2025 09:08
@anakin87 (Member)

I want to share some initial thoughts and questions, mostly at a high level.
It would be great if @vblagoje could also review this PR and if @julian-risch and @bilgeyucel could take a look...

Agent as a component

  • I would like to understand the reasoning behind making Agent a component.
    My guess is that this might be to integrate it into Pipelines, but it seems a bit different from the principle we’ve followed in the past (keeping components small and self-contained). Could you share what you see as the pros and cons of this approach?

ToolContext

  • I noticed that ToolContext is heavily used in this implementation. While I understand it can sometimes be useful, I wonder if there’s an opportunity to simplify things. Do you think there’s room to reduce its usage or rethink other classes to make it less central?
    I would also love to hear what others think about this.

  • Slightly related. The problem mentioned in the PR regarding ComponentTool: is it specifically tied to ToolContext, or does it also involve Agent?

Implementation details

  1. I noticed that Agent is implemented as a Pipeline instead of plain Python, which seems to make the code longer and more complex. Since this is an internal implementation, is there a specific reason for choosing Pipelines here?

  2. In OpenAI Swarm, handoff is typically used to describe passing control to another Agent, but here it seems to indicate that the Agent is exiting. Would it make sense to consider alternatives like return_condition or end_condition?

  3. (not directly related to this PR) I’ve noticed for a while now that we don’t have a unified way to qualify ChatGenerators. I would avoid abstract classes, but maybe a thin Protocol could solve this.
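
To make the third point concrete, a thin Protocol could look roughly like this (a sketch, not an agreed interface; the exact signature would need discussion):

from typing import Any, Dict, List, Protocol

from haystack.dataclasses import ChatMessage

class ChatGeneratorProtocol(Protocol):
    def run(self, messages: List[ChatMessage], **kwargs: Any) -> Dict[str, List[ChatMessage]]:
        ...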

(Could you rebase this PR to make it smaller and easier to review?)

@mathislucka mathislucka requested a review from a team as a code owner January 28, 2025 09:06
@mathislucka mathislucka requested review from dfokina and removed request for a team January 28, 2025 09:06
# Conflicts:
#	haystack_experimental/__init__.py
#	haystack_experimental/core/__init__.py
#	haystack_experimental/core/pipeline/__init__.py
@mathislucka (Member, Author)

Thanks for the initial feedback!

> I would like to understand the reasoning behind making Agent a component.
> My guess is that this might be to integrate it into Pipelines, but it seems a bit different from the principle we’ve followed in the past (keeping components small and self-contained). Could you share what you see as the pros and cons of this approach?

Yes, exactly, this allows us to integrate it into pipelines. See the notebook for an example of how using it in a pipeline is beneficial.

Pros:

  • combine agentic and non-agentic workflow steps
  • users don't need to learn a new abstraction, they are already familiar with pipelines and components
  • use pipelines to orchestrate multi-agent systems
  • we get tracing and debugging utilities already built into pipelines for free

Cons:
I don't really see any. @vblagoje brought up initially that it would be great if users wouldn't have to know that it is a component, but that's already the case. Users don't need to know anything about components or pipelines to call Agent.run(messages). We could even allow them to `from haystack import Agent` if we wanted to make it more prominent.

> I noticed that ToolContext is heavily used in this implementation. While I understand it can sometimes be useful, I wonder if there’s an opportunity to simplify things. Do you think there’s room to reduce its usage or rethink other classes to make it less central?
> I would also love to hear what others think about this.

I tried to keep the use of ToolContext light. It's only used in two places. For users implementing tools on the other hand, ToolContext becomes more central and they need to understand it to use it. We can keep things easy though because ToolContext is optional, so if users don't need it, they don't even need to know it exists.

I found the possibility to pass additional context to tools from outside the agent, as well as passing context between tools, and passing outputs from tools to downstream components in the pipeline very useful when implementing agents.

I'm not entirely happy with how ToolContext needs to be used by users implementing tools though. I'd be very happy about alternative suggestions.

> Slightly related. The problem mentioned in the PR regarding ComponentTool: is it specifically tied to ToolContext, or does it also involve Agent?

ComponentTool can be used with Agent without any problems. The only thing that I didn't implement is support for ToolContext. I think that for additional inputs it would be fairly easy to implement. We could just check the input sockets of the components and pass any ToolContext.inputs that match the input sockets to the component. This would also remove the burden from the user to implement handling of additional inputs. For outputs however, this is more difficult. Right now, I'm delegating the responsibility for how to handle existing outputs from the same or other tools to the user who implements the function for the tool. In the Issue Agent example in the notebook, I'm extending a list of documents with every tool call.
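
To make the input side of that idea concrete, ComponentTool.invoke could do something along these lines (illustrative only; `input_socket_names` is a hypothetical helper standing in for whatever socket introspection we would actually use):

def invoke_with_context(component, ctx, **llm_arguments):
  # forward any ToolContext input whose name matches one of the component's input sockets
  extra_inputs = {
    name: ctx.get_input(name)
    for name in input_socket_names(component)  # hypothetical helper
    if ctx.get_input(name) is not None and name not in llm_arguments
  }
  return component.run(**llm_arguments, **extra_inputs)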

Assume we move that responsibility to ComponentTool.invoke:
We could check the component outputs for any matches to ToolContext.outputs. However, there would be open questions for me with regards to how to handle these outputs:

  • overwrite existing outputs?
  • extend them?
  • are outputs that went into ToolContext.outputs excluded from the ToolCallResult?

Ideally, I'd want to leave that decision to the user but I'm not sure how.

> I noticed that Agent is implemented as a Pipeline instead of plain Python, which seems to make the code longer and more complex. Since this is an internal implementation, is there a specific reason for choosing Pipelines here?

I agree that using a pipeline is more verbose but I wouldn't say it's more complex. My main reasons were:

  • Haystack already envisions using components for agentic behaviour (ChatGenerator, ToolInvoker), and the handoff condition was easy to implement with the ConditionalRouter; since I was already using three components, it seemed natural to connect them using a pipeline
  • we get tracing for free
  • we can extend the implementation easily by simply adding components
  • users can take the implementation, integrate it directly into their pipelines and extend if needed

> In OpenAI Swarm, handoff is typically used to describe passing control to another Agent, but here it seems to indicate that the Agent is exiting. Would it make sense to consider alternatives like return_condition or end_condition?

Yes, I know. I interpreted handoff as handing off control of the execution flow, so the idea was that the agent basically hands off to the pipeline. This handoff could also mean that it hands off to another Agent, if there is another Agent in the pipeline. However, I see how this might confuse users. What about exit_condition?

I merged main into the PR to remove the diff from the pipeline.run refactoring. You can also ignore the github components in the examples folder for now. There are more components than we actually need for the example notebook. My intention was to add a second agent as an example but I didn't get to it yet.

@vblagoje (Member) commented Jan 28, 2025

Here’s some high-level feedback and brainstorming inspired by this PR. I’m not diving into fine-grained details yet, as high-level considerations are more relevant at this stage. My initial reaction was similar to @anakin87's: why is Agent a component? However, after reviewing the details, I believe this design has potential. It enables developers to easily combine deterministic and dynamic (LLM-driven) agent steps at the appropriate level of granularity, while supporting the modular, LEGO-like construction of complex, multi-agent AI systems.

To motivate the feedback and examples below, I’ve imagined what it would take to develop a [Google’s Deep Research](https://blog.google/products/gemini/google-gemini-deep-research/) clone using Haystack. This feedback highlights a few ideas we could incorporate: human feedback mechanisms, programmable handoffs, and two distinct approaches to multi-agent systems/workflows that we should prototype for developer experience (DX) in real examples.


1) Support a Callable Handoff Condition

Concept: Currently, the handoff condition can be "text", a tool’s name, or a Jinja condition. While these options are useful for straightforward scenarios, introducing a Python callback function for handoff logic would provide more flexibility. This function could inspect the latest LLM output and tool results to decide when to exit the loop. The example below is for illustration purposes only.

Example

from typing import Optional
from haystack.dataclasses import ChatMessage

def custom_handoff_condition(llm_output: ChatMessage, tool_result: Optional[ChatMessage]) -> bool:
    # Exit when the LLM output contains "DONE" (guard against messages that only carry tool calls)
    return llm_output.text is not None and "DONE" in llm_output.text

agent = Agent(
  model="openai:gpt-4",
  tools=[...],
  handoff=custom_handoff_condition,  # Programmable stop condition
)

Why It Helps

  • Flexibility: Programmable handoffs make the agent more adaptable, enabling complex stop conditions beyond simple string-based or single-tool exits.
  • Advanced Use Cases: Supports workflows where stopping is based on patterns in the output, token count, or other custom criteria.

2) Human Feedback

Human feedback is essential for workflows requiring oversight or validation. Depending on the use case, it can be incorporated in one of two ways:

A) As a Pipeline Component

For guaranteed user interaction at specific points, include a dedicated component for user sign-off:

pipe.add_component("humanFeedback", HumanFeedbackComponent(...))
# Enforces explicit "Is this okay?" checkpoints
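
A minimal sketch of what such a component could look like (hypothetical; not part of this PR):

from typing import List

from haystack import component
from haystack.dataclasses import ChatMessage

@component
class HumanFeedbackComponent:
    @component.output_types(messages=List[ChatMessage])
    def run(self, messages: List[ChatMessage]):
        # block the pipeline until the user reacts to the latest message
        answer = input(f"{messages[-1].text}\nIs this okay? => ")
        return {"messages": messages + [ChatMessage.from_user(answer)]}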

B) As a Tool in an Agent’s Toolset

Alternatively, let the Agent dynamically decide when to involve the user, based on its reasoning:

def ask_user_feedback(question: str) -> str:
    response = input(f"{question} => ")
    return response

from haystack.tools import Tool

human_approval_tool = Tool(
  name="human_approval",
  function=ask_user_feedback,
  description="Ask the user if the partial result or plan is acceptable",
  parameters={"type": "object", "properties": {"question": {"type": "string"}}, "required": ["question"]},
)

Why It Helps

  • Pipeline Component: Ensures deterministic user feedback at critical points for precise workflow control.
  • Tool: Offers dynamic, adaptive oversight, allowing the Agent to determine when user input is required.

3) How Do We Build Multi-Agent Systems in Haystack?

When designing workflows involving multiple specialized Agents (we've all seen examples of these online involving multiple agents), Haystack offers two primary approaches. Each serves different use cases, depending on the need for deterministic execution or LLM-driven orchestration.


3A) Modular Agents as Components in a Pipeline

Concept: Agents can function as individual components in a pipeline. For instance, a “WebSearchAgent,” “SummarizerAgent,” and “ReviewAgent” can process data sequentially or iteratively.

Why It Helps

  1. Deterministic Control: Pipelines enforce a precise execution order, with optional loops for iterative refinement.
  2. Reusability: Each Agent can be independently tested and reused across workflows.
  3. Deep Research Example:
    • A WebSearchAgent gathers relevant data.
    • A SummarizerAgent organizes and synthesizes the information.
    • A ReviewAgent evaluates the quality and optionally loops back for improvements.

Example

# Agents
webSearchAgent = Agent(model="gpt-4", tools=[...])
summarizerAgent = Agent(model="gpt-4", tools=[])
reviewAgent = Agent(model="gpt-4", tools=[], system_prompt="Review text. If not correct, request improvements.")

# Pipeline
pipe = Pipeline()
pipe.add_component("webSearchAgent", webSearchAgent)
pipe.add_component("summarizerAgent", summarizerAgent)
pipe.add_component("reviewAgent", reviewAgent)

# Connections
pipe.connect("webSearchAgent.messages", "summarizerAgent.messages")
pipe.connect("summarizerAgent.messages", "reviewAgent.messages")

# Optional: Loop back to SummarizerAgent if ReviewAgent requests changes

This approach is ideal for workflows that require deterministic execution where the pipeline graph guarantees precise control.


3B) Master Agent + Tools (Including PipelineTool)

Concept: Use the PipelineTool to encapsulate a pipeline of Agents into a single tool. This tool can then be dynamically orchestrated by a master Agent, alongside other tools such as OpenAPITool, user feedback tools or whatever other tool we add in the future.

Why It Helps

  • Encapsulation: Simplifies testing and promotes reuse by wrapping complex pipelines into a single tool.
  • Dynamic Orchestration: Allows the master Agent to make context-based decisions about tool invocation.
  • Deep Research Example:
    • A report_maker pipeline (WebSearch + Summarizer) generates a draft report.
    • A human_approval_tool gathers user feedback.
    • An OpenAPITool submits the final report to an endpoint.

Example

# (A) Sub-pipeline for "ReportMaker"
reportMakerPipeline = Pipeline()
reportMakerPipeline.add_component("webSearchAgent", webSearchAgent)
reportMakerPipeline.add_component("summarizerAgent", summarizerAgent)
reportMakerPipeline.connect("webSearchAgent.messages", "summarizerAgent.messages")

report_maker_tool = PipelineTool(
  pipeline=reportMakerPipeline,
  name="report_maker",
  description="Multi-step process (web search + summary)."
)

# (B) OpenAPITool for final report submission
openapi_tool = OpenAPITool(
  spec="https://example.com/submit-report",  # Hypothetical endpoint
  name="submit_report",
  description="Submit the final report"
)

# (C) Master Agent orchestrating tools
masterAgent = Agent(
    model="gpt-4",
    tools=[report_maker_tool, openapi_tool, human_approval_tool],
    system_prompt="""
        1) Call 'human_approval' for user feedback when necessary.
        2) Use 'report_maker' to generate and refine the report.
        3) Submit the final report using 'submit_report'.
    """
)

# Usage
user_message = [ChatMessage.from_user("Research the best AI marketing campaigns of the last 3 years")]
result = masterAgent.run(messages=user_message)

This approach is ideal for workflows requiring emergent, dynamic LLM planning, where the master Agent adapts based on context.

In summary, this direction shows promise imho. I’d love to double down on developing actual complex agent systems and feel the developer experience firsthand; this would allow us to provide more constructive feedback on how to proceed and build upon these ideas.
