Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the LLM Agents guide with Gemini2.0 Thinking demo #10247

Merged
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
148 changes: 142 additions & 6 deletions guides/05_chatbots/03_agents-and-tool-usage.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,14 +3,14 @@
Tags: LLM, AGENTS, CHAT
Related spaces: https://huggingface.co/spaces/gradio/agent_chatbot, https://huggingface.co/spaces/gradio/langchain-agent

The Gradio Chatbot can natively display intermediate thoughts and tool usage. This makes it perfect for creating UIs for LLM agents. This guide will show you how.
The Gradio Chatbot can natively display intermediate thoughts and tool usage. This makes it perfect for creating UIs for LLM agents and Thinking LLMs. This guide will show you how.
yvrjsharma marked this conversation as resolved.
Show resolved Hide resolved

## The metadata key

In addition to the `content` and `role` keys, the messages dictionary accepts a `metadata` key. At present, the `metadata` key accepts a dictionary with a single key called `title`.
If you specify a `title` for the message, it will be displayed in a collapsible box.

Here is an example, were we display the agent's thought to use a weather API tool to answer the user query.
Here is an example, where we display the agent's thought to use a weather API tool to answer the user query.

```python
with gr.Blocks() as demo:
Expand All @@ -24,7 +24,9 @@ with gr.Blocks() as demo:
![simple-metadat-chatbot](https://github.com/freddyaboulton/freddyboulton/assets/41651716/3941783f-6835-4e5e-89a6-03f850d9abde)


## A real example using transformers.agents
## Building with Agents

### A real example using transformers.agents

We'll create a Gradio application simple agent that has access to a text-to-image tool.

Expand Down Expand Up @@ -87,12 +89,11 @@ You can see the full demo code [here](https://huggingface.co/spaces/gradio/agent
![transformers_agent_code](https://github.com/freddyaboulton/freddyboulton/assets/41651716/c8d21336-e0e6-4878-88ea-e6fcfef3552d)


## A real example using langchain agents
### A real example using langchain agents

We'll create a UI for langchain agent that has access to a search engine.

We'll begin with imports and setting up the langchain agent. Note that you'll need an .env file with
the following environment variables set -
We'll begin with imports and setting up the langchain agent. Note that you'll need an .env file with the following environment variables set -

```
SERPAPI_API_KEY=
Expand Down Expand Up @@ -166,4 +167,139 @@ demo.launch()
That's it! See our finished langchain demo [here](https://huggingface.co/spaces/gradio/langchain-agent).


## Building with Visible Thinking LLMs
yvrjsharma marked this conversation as resolved.
Show resolved Hide resolved


The Gradio Chatbot can natively display intermediate thoughts of a _thinking_ LLM. This makes it perfect for creating UIs that show how an AI model "thinks" while generating responses. Below guide will show you how to build a chatbot that displays Gemini AI's thought process in real-time.


### A real example using Gemini2.0 Flash Thinking API
yvrjsharma marked this conversation as resolved.
Show resolved Hide resolved

Let's create a complete chatbot that shows its thoughts and responses in real-time. We'll use Google's Gemini API for accessing Gemini2.0 Flash Thinking LLM and Gradio for the UI.
yvrjsharma marked this conversation as resolved.
Show resolved Hide resolved

We'll begin with imports and setting up the gemini client. Note that you'll need to acquire a Google Gemini API key first -
yvrjsharma marked this conversation as resolved.
Show resolved Hide resolved

```python
import gradio as gr
from gradio import ChatMessage
from typing import Iterator
import google.generativeai as genai

genai.configure(api_key="your-gemini-api-key")
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-1219")
```

First, let's set up our streaming function that handles the model's output:

```python
def stream_gemini_response(user_message: str, messages: list) -> Iterator[list]:
"""
Streams both thoughts and responses from the Gemini model.
"""
# Initialize response from Gemini
response = model.generate_content(user_message, stream=True)

# Initialize buffers
thought_buffer = ""
response_buffer = ""
thinking_complete = False

# Add initial thinking message
messages.append(
ChatMessage(
role="assistant",
content="",
metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
)
)

for chunk in response:
parts = chunk.candidates[0].content.parts
current_chunk = parts[0].text

if len(parts) == 2 and not thinking_complete:
# Complete thought and start response
thought_buffer += current_chunk
messages[-1] = ChatMessage(
role="assistant",
content=thought_buffer,
metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
)

# Add response message
messages.append(
ChatMessage(
role="assistant",
content=parts[1].text
)
)
thinking_complete = True

elif thinking_complete:
# Continue streaming response
response_buffer += current_chunk
messages[-1] = ChatMessage(
role="assistant",
content=response_buffer
)

else:
# Continue streaming thoughts
thought_buffer += current_chunk
messages[-1] = ChatMessage(
role="assistant",
content=thought_buffer,
metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
)

yield messages
```

Then, let's create the Gradio interface:
yvrjsharma marked this conversation as resolved.
Show resolved Hide resolved

```python
with gr.Blocks() as demo:
gr.Markdown("# Chat with Gemini 2.0 Flash and See its Thoughts 💭")

chatbot = gr.Chatbot(
type="messages",
label="Gemini2.0 'Thinking' Chatbot",
render_markdown=True,
)

input_box = gr.Textbox(
lines=1,
label="Chat Message",
placeholder="Type your message here and press Enter..."
)

# Set up event handlers
msg_store = gr.State("") # Store for preserving user message

input_box.submit(
lambda msg: (msg, msg, ""), # Store message and clear input
inputs=[input_box],
outputs=[msg_store, input_box, input_box],
queue=False
).then(
user_message, # Add user message to chat
inputs=[msg_store, chatbot],
outputs=[input_box, chatbot],
queue=False
).then(
stream_gemini_response, # Generate and stream response
inputs=[msg_store, chatbot],
outputs=chatbot
)

demo.launch()
```

This creates a chatbot that:

- Displays the model's thoughts in a collapsible section
- Streams the thoughts and final response in real-time
- Maintains a clean chat history

That's it! You now have a chatbot that not only responds to users but also shows its thinking process, creating a more transparent and engaging interaction. See our finished Gemini2.0 Flash Thinking demo [here](https://huggingface.co/spaces/ysharma/Gemini2-Flash-Thinking).
yvrjsharma marked this conversation as resolved.
Show resolved Hide resolved

Loading