docs: updating getting started page and overview page #466

Merged 1 commit on Dec 25, 2024
32 changes: 21 additions & 11 deletions docs/ten_agent/getting_started.md
@@ -22,8 +22,8 @@ In this chapter, let's build the TEN Agent playground together.
{% tab title="API Keys" %}

* Agora [App ID](https://docs.agora.io/en/video-calling/get-started/manage-agora-account?platform=web#create-an-agora-project) and [App Certificate](https://docs.agora.io/en/video-calling/get-started/manage-agora-account?platform=web#create-an-agora-project) (free minutes every month)
* [OpenAI](https://openai.com/index/openai-api/) API key
* Azure [speech-to-text](https://azure.microsoft.com/en-us/products/ai-services/speech-to-text) and [text-to-speech](https://azure.microsoft.com/en-us/products/ai-services/text-to-speech) API keys
<!-- * [OpenAI](https://openai.com/index/openai-api/) API key -->
<!-- * Azure [speech-to-text](https://azure.microsoft.com/en-us/products/ai-services/speech-to-text) and [text-to-speech](https://azure.microsoft.com/en-us/products/ai-services/text-to-speech) API keys -->
{% endtab %}

{% tab title="Installations" %}
@@ -42,16 +42,26 @@ In this chapter, let's build the TEN Agent playground together.
**Docker setting on Apple Silicon**

{% hint style="info" %}
You will need to uncheck "Use Rosetta for x86\_64/amd64 emulation on Apple Silicon" option for Docker if you are on Apple Silicon, otherwise the server is not going to work.
On Apple Silicon Macs, uncheck the "Use Rosetta for x86\_64/amd64 emulation on Apple Silicon" option in Docker settings; otherwise the server will not work. Note: builds may be slower on ARM, but performance will be normal when deployed to x64 servers.
{% endhint %}

<figure><img src="../assets/gif/docker_setting.gif" alt="" width="563"><figcaption><p>Make sure the box is unchecked</p></figcaption></figure>
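
To tell whether the Rosetta note applies to your machine, you can check the OS and CPU architecture reported by `uname`. The sketch below is only an illustration: `is_apple_silicon` is a hypothetical helper name, not part of TEN Agent, and it accepts optional arguments so it can be exercised without a Mac.

{% code title=">_ Terminal" %}

```sh
# Hypothetical helper (not part of TEN Agent): returns 0 when the given
# OS/architecture pair is an Apple Silicon Mac.
# With no arguments it inspects the current machine via uname.
is_apple_silicon() {
  os="${1:-$(uname -s)}"
  arch="${2:-$(uname -m)}"
  [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]
}
```

{% endcode %}

For example, `is_apple_silicon && echo "Uncheck the Rosetta option in Docker settings"`. A terminal that itself runs under Rosetta reports `x86_64`, so treat a negative result on a Mac with some caution.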

## Next step

**1. Prepare config files**
**1. Clone the TEN Agent repository**

In the root of the project, use `cd` command to create \`.env\` file from example.
{% code title=">_ Terminal" %}

```sh
git clone https://github.com/TEN-framework/TEN-Agent.git
```

{% endcode %}

**2. Prepare config files**

Open TEN Agent in your code editor. In the root of the project, use the `cp` command to create a `.env` file from the example.

{% code title=">_ Terminal" %}

@@ -61,7 +71,7 @@ cp ./.env.example ./.env

{% endcode %}

**2. Setup Agora App ID and App Certificate in .env file**
**3. Set up Agora App ID and App Certificate in .env file**

Open the `.env` file and fill in the Agora App ID and App Certificate. These will be used to connect to the Agora RTC extension.

@@ -74,7 +84,7 @@ AGORA_APP_CERTIFICATE=

{% endcode %}
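
Before moving on, it can help to confirm that both values are actually filled in. The following is a minimal sketch, not part of the TEN Agent tooling: `check_env_keys` is a hypothetical helper that reports which of the named keys are missing or empty in a given env file. It assumes plain `KEY=value` lines and does not interpret quotes or surrounding whitespace.

{% code title=">_ Terminal" %}

```sh
# Hypothetical helper (not part of TEN Agent): list required keys that are
# missing or empty in an env file. Assumes plain KEY=value lines.
# Usage: check_env_keys FILE KEY...
check_env_keys() {
  env_file="$1"
  shift
  missing=0
  for key in "$@"; do
    # Use the last KEY=... line, then keep everything after the first "=".
    value=$(grep -E "^${key}=" "$env_file" | tail -n 1 | cut -d= -f2-)
    if [ -z "$value" ]; then
      echo "missing: $key"
      missing=1
    fi
  done
  # Return 0 when every key has a value, 1 otherwise.
  return "$missing"
}
```

{% endcode %}

With the helper defined in your shell, `check_env_keys ./.env AGORA_APP_ID AGORA_APP_CERTIFICATE` prints one line per unset key and returns a non-zero status if any is empty.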

**3. Start agent builder toolkit containers**
**4. Start agent builder toolkit containers**

In the same directory, run the `docker` command to compose containers:

@@ -86,7 +96,7 @@ docker compose up -d

{% endcode %}

**4. Enter container**
**5. Enter container**

Use the following command to enter the container:

@@ -98,7 +108,7 @@ docker exec -it ten_agent_dev bash

{% endcode %}

**5. Build the agent**
**6. Build the agent**

Use the following command to build the agent:

@@ -110,7 +120,7 @@ task use

{% endcode %}

**6. Start the web server**
**7. Start the web server**

Use the following command to start the web server:

@@ -123,7 +133,7 @@ task run
{% endcode %}


**7. Edit playground settings**
**8. Edit playground settings**

Open the playground at [localhost:3000](http://localhost:3000) to configure your agent.
1. Select a graph type (e.g. Voice Agent, Realtime Agent)
163 changes: 31 additions & 132 deletions docs/ten_agent/overview.md
@@ -1,158 +1,57 @@
# Overview

TEN Agent, built on the TEN framework, is an open-source AI agent project. It can speak, see and access to a knowledge base.
TEN Agent is a conversational AI agent powered by the TEN framework, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compatible with popular workflow platforms like Dify and Coze.

It also serves as a boilerplate for using the TEN framework, offering a great way to fully understand how to use the framework by diving into TEN Agent.
## Links

## Architecture
- [TEN Agent](https://github.com/TEN-framework/ten_agent)
- [TEN Framework](https://github.com/TEN-framework/ten_framework)

The TEN Agent consists of two main parts:
## Architecture

1. **Frontend**: **Playground** (Port 3000)
- Web interface with API Router and UI
- Manages communication and graph control
The TEN Agent project is organized into the following major components, offering clarity and extensibility for developers:

2. **Backend Services**
- Designer (Port 49483): Handles graph operations
- WebServer (Port 8080): Processes commands
- AgentApp: Core component with extensions and config
1. **Agents**: Contains the core logic, binaries, and examples for building and running AI agents. Within the Agents folder, there is a subfolder called `ten_packages`, which houses a variety of ready-to-use extensions. By leveraging these extensions, developers can build and customize powerful agents tailored to specific tasks or workflows.

![Components Diagram](https://github.com/TEN-framework/docs/blob/main/assets/jpg/diagram.jpg?raw=true)
2. **Dev Server**: Backend services, orchestrating agents and handling extensions.
3. **Web Server**: Runs on port 8080 and serves the frontend interface. The web server handles HTTP requests and delivers assets.
4. **Extensions**: Modular integrations for LLMs, TTS/STT, and external APIs, enabling easy customization.
5. **Playground**: An interactive environment for testing, configuring, and fine-tuning agents.
6. **Demo**: A deployment-ready setup to showcase real-world applications of TEN Agent.

## Docker Containers

There are two Docker containers in TEN Agent:

- `astra_agents_dev`: This is the workspace for development. The code repository will be mirrored into the /app folder in the container, and it provides the build environment. The backend service will also run in this container.
- `astra_playground`: This is a separate service for the web frontend. The compiled frontend files will be mirrored here.

## Agents

The agents directory contains the core runtime of TEN, along with the graphs defined by users and some miscellaneous items like build scripts.

## Manifest

The application’s metadata and required extensions are specified here. Please note that `ten_runtime_go`, `py_init_extension_cpp`, and `agora_rtc` must be included. The required items will be stored in the `/ten_packages` directory.

## Property

All graph information is stored in property.json. We recommend using the Graph Designer to create and edit the graphs instead of directly modifying `property.json`. Each graph consists of a list of nodes and connections:

- In each node section, specify which extension will be used in the node, along with all required environment variables.
- In each connection section, specify how the data is passed. Data should flow from an extension to one or more destination extensions. The data format must be one of the four formats defined by TEN: Command, Data, Video Frame, or Audio Frame.

For details, see the API reference on interfaces in graphs.

## Bin

The build script will compile the graphs into binaries stored in the `bin` folder. The binaries can be called by other services, such as the server. Note that the binary must be restarted for changes in `property.json` to take effect.

## Server

The server folder includes a lightweight HTTP server and a module for running agent binaries. Below are the HTTP APIs for use. They can also be integrated with other frontend applications, such as mobile apps.

## Start
- `ten_agent_dev`: The main development container that powers TEN Agent. It contains the core runtime environment, development tools, and dependencies needed to build and run agents. This container lets you execute commands like `task use` to build agents and `task run` to start the web server.

Starts an agent with the given graph and overridden properties. The started agent will join the specified channel and subscribe to the uid used by your browser/device’s RTC.
- `ten_agent_playground`: Port 3000, a dedicated container for the web frontend interface. It serves the compiled frontend assets and provides an interactive environment where users can configure modules, select extensions, and test their agents. The playground UI allows you to visually select graph types (like Voice Agent or Realtime Agent), choose modules, and configure API settings.

| Parameter | Description |
|-----------|-------------|
| user_uid | The uid used by your browser/device's RTC, needed by the agent to subscribe to your audio. |
| timeout | Specifies how long the agent will remain active without receiving pings. If set to -1, the agent will not terminate due to inactivity. The default is 60 seconds, but this can be adjusted with the WORKER_QUIT_TIMEOUT_SECONDS variable in your .env file. |
| request_id | A UUID for tracing requests. |
| properties | Additional properties to override in property.json (these overrides won't affect the original property.json, only the agent instance). |
| graph_name | The graph to be used when starting the agent, found in property.json. |
| channel_name | Must match the one your browser/device joins; the agent needs to be in the same channel to communicate. |
| bot_uid | (Optional) The uid used by the bot to join RTC. |
- `ten_agent_demo`: Port 3002, a deployment-focused container that provides a production-ready sample setup. It demonstrates how users can deploy their configured agents in real-world scenarios, with all necessary components packaged together for easy deployment.
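
The ports mentioned on this page (3000 for the playground, 3002 for the demo, 8080 for the web server) can be probed from the host once the containers are up. The sketch below is illustrative only: `service_url` and `check_services` are hypothetical helper names, and it assumes the services are published on `localhost` and that `curl` is installed.

{% code title=">_ Terminal" %}

```sh
# Hypothetical helpers (not part of TEN Agent).
# Map a service name to the local URL described on this page.
service_url() {
  case "$1" in
    playground) echo "http://localhost:3000" ;;
    demo)       echo "http://localhost:3002" ;;
    webserver)  echo "http://localhost:8080" ;;
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# Report whether each named service answers a connection within 3 seconds.
check_services() {
  for name in "$@"; do
    url=$(service_url "$name") || return 1
    if curl --silent --output /dev/null --max-time 3 "$url"; then
      echo "$name: reachable at $url"
    else
      echo "$name: not reachable at $url"
    fi
  done
}
```

{% endcode %}

For example, `check_services playground demo webserver` prints one status line per service. A "not reachable" result only means nothing answered on that port, so check `docker ps` before drawing conclusions.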

Example:

{% code title=">_ Terminal" %}

```bash
curl 'http://localhost:8080/start' \
-H 'Content-Type: application/json' \
--data-raw '{
"request_id": "c1912182-924c-4d15-a8bb-85063343077c",
"channel_name": "test",
"user_uid": 176573,
"graph_name": "camera.va.openai.azure",
"properties": {
"openai_chatgpt": {
"model": "gpt-4o"
}
}
}'
```

{% endcode %}

## Stop

Stops the agent that was previously started.

| Parameter | Description |
|-----------|-------------|
| request_id | A UUID for tracing requests. |
| channel_name | The channel name used to start the agent. |

Example:

{% code title=">_ Terminal" %}

```bash
curl 'http://localhost:8080/stop' \
-H 'Content-Type: application/json' \
--data-raw '{
"request_id": "c1912182-924c-4d15-a8bb-85063343077c",
"channel_name": "test"
}'
```

{% endcode %}

## Ping
## Agents

Sends a ping to the server to indicate the connection is still alive. This is unnecessary if you specify timeout: -1 when starting the agent. Otherwise, the agent will quit if it doesn’t receive a ping after the specified timeout.
The Agents folder is the heart of the project, housing:

Example:
- Core binaries and examples that define agent behaviors.
- Scripts and outputs that enable flexible configurations for various AI use cases.
- Tools for developers to create, modify, and enhance AI agents.

{% code title=">_ Terminal" %}
With its structured design, the Agents folder allows you to build agents tailored to specific applications, whether it’s voice assistants, chatbots, or task automation.

```bash
curl 'http://localhost:8080/ping' \
-H 'Content-Type: application/json' \
--data-raw '{
"request_id": "c1912182-924c-4d15-a8bb-85063343077c",
"channel_name": "test"
}'
```
## Demo

{% endcode %}
The Demo folder provides a deployment-ready environment for showcasing TEN Agent in action. It includes:
- Example configurations for running agents in production.
- Prebuilt agents and workflows to highlight the framework’s capabilities.
- Tools for demonstrating real-world applications to users, clients, or collaborators.

## Playground

Playground is the UI of TEN Agent. It is built with Next.js. You can preview it online at <https://agent.theten.ai/>.

The code to handle audio input/output and transcribed text is in src/manager/rtc/rtc.ts. The code captures user audio and transmits it to the agent server while the agent’s audio is sent back to the web app.

Example of joining a channel:

{% code title="src/manager/rtc/rtc.ts" %}

```typescript
async join({ channel, userId }: { channel: string; userId: number }) {
if (!this._joined) {
const res = await apiGenAgoraData({ channel, userId });
const { code, data } = res;
if (code !== 0) {
throw new Error("Failed to get token");
}
const { appId, token } = data;
await this.client?.join(appId, channel, token, userId);
this._joined = true;
}
}
```
Once the playground is up and running, users can leverage the module picker to:
- Select and configure extensions from a range of prebuilt modules.
- Experiment with different AI models, TTS/STT systems, and real-time communication tools.
- Test agent behaviors in a safe, interactive environment.

{% endcode %}
The playground serves as a hub for innovation, empowering developers to explore and fine-tune their AI systems effortlessly.