openai-s2s

This is an example jambonz application that connects to the OpenAI Realtime API and illustrates how to build a Voice-AI application using jambonz and OpenAI.

Authentication

You must have an OpenAI API key that has access to the Realtime API. Specify it as an environment variable when starting the application.

OPENAI_API_KEY=sk-proj-XXXXXXX node app.js

Prerequisites

This application requires a jambonz server running release 0.9.2-rc3 or above.

Configuring the assistant

All of the configuration (in fact, all of the relevant code) can be found in this source file. This is the file you will want to edit as you play with this example.

You can see that the application first answers the call, pauses for one second, and then connects to the OpenAI Realtime API using the jambonz llm verb. We specify the vendor and model, and provide options specific to that LLM (in this case gpt-4o-realtime-preview-2024-10-01) in the llmOptions property.

In the case of the OpenAI Realtime API, configuration is provided in the form of the response_create and session_update client events that are sent to OpenAI. These specify the instructions to the assistant as well as things like vad and function calling options.
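Putting these pieces together, the verb sequence might look like the following sketch. The overall shape (answer, pause, then an llm verb with vendor, model, and llmOptions carrying response_create and session_update) follows the text above; the exact contents of those client events are illustrative placeholders, so check the actual source file for the real values.

```javascript
// Sketch of the verb sequence described above: answer the call, pause
// one second, then start an llm session with the OpenAI Realtime API.
// The property names inside llmOptions are illustrative, not definitive.
const llmVerb = {
  verb: 'llm',
  vendor: 'openai',
  model: 'gpt-4o-realtime-preview-2024-10-01',
  llmOptions: {
    response_create: {
      modalities: ['text', 'audio'],
      instructions: 'You are a helpful voice assistant.' // placeholder
    },
    session_update: {
      turn_detection: { type: 'server_vad' } // vad options go here
    }
  }
};

const appPayload = [
  { verb: 'answer' },
  { verb: 'pause', length: 1 }, // pause one second before connecting
  llmVerb
];

console.log(JSON.stringify(appPayload, null, 2));
```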

Function calling

The example illustrates how to implement client-side functions and provide them to the assistant. In this example, we implement a simple "get weather" function using the freely-available APIs from open-meteo.com. The function is described in the session_update client message, and a toolHook property for the llm verb defines the hook that will be called in the application when the LLM wants the application to call a function. Finally, the session.sendToolOutput() method is called to send the results of the function call back to the LLM.
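The flow above can be sketched as follows. Only session.sendToolOutput comes from the text; the handler name, event shape, and weather lookup are illustrative, and the real example queries open-meteo.com where this sketch uses a local stub so it stays self-contained.

```javascript
// Sketch of a toolHook handler for the "get weather" function described
// above. The lookup is stubbed (the real example calls open-meteo.com),
// and the event shape is an assumption for illustration.
const getWeatherStub = (location) => (
  { location, temperature_c: 21, conditions: 'partly cloudy' }
);

function onToolCall(session, evt) {
  // evt is assumed to carry the function name, a call id, and arguments
  const { name, tool_call_id, args } = evt;
  if (name === 'get_weather') {
    const result = getWeatherStub(args.location);
    // send the function result back to the LLM so it can respond
    session.sendToolOutput(tool_call_id, result);
  }
}

// Minimal fake session to exercise the handler
const sent = [];
const session = { sendToolOutput: (id, data) => sent.push({ id, data }) };
onToolCall(session, {
  name: 'get_weather',
  tool_call_id: 'call_1',
  args: { location: 'Paris' }
});
console.log(sent[0].data.location); // prints Paris
```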

Interrupting the assistant

When the user begins speaking over the assistant (i.e. "barge in") jambonz sends a response.cancel client event to interrupt the assistant. Any queued audio that has been received from the assistant is flushed.

Events

There are 28 server events that OpenAI sends, and your application can specify which it wants to receive. (The only exception is the response.audio.delta server event, because this contains actual audio content that jambonz itself processes). You specify which events you want to receive in the events property of the llm verb, and as you can see in the example you can use wildcards to include a whole class of server events (e.g. "conversation.item.*").
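As a sketch, an events list might mix exact names with a wildcard. The matcher below only illustrates the semantics the wildcard suggests; the actual matching is done inside jambonz, not by your application.

```javascript
// Events the application wants to receive, per the description above:
// exact server event names plus a wildcard for a whole class of events.
const events = [
  'response.done',
  'conversation.item.*' // wildcard: all conversation.item events
];

// Illustrative matcher showing what the wildcard is understood to mean
// (an assumption; jambonz performs the real matching internally).
const wants = (name) =>
  events.some((e) => e.endsWith('.*')
    ? name.startsWith(e.slice(0, -1))
    : name === e);

console.log(wants('conversation.item.created')); // true
console.log(wants('response.audio.delta'));      // false
```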

ActionHook

Like many jambonz verbs, the llm verb sends an actionHook with a final status when the verb completes. The payload will include a completion_reason property indicating why the llm session completed. This property will be one of:

  • normal conversation end
  • connection failure
  • disconnect from remote end
  • server failure
  • server error

In the case of an error, an error_code object is returned. We use this in this sample application, for example, to detect whether the user's OpenAI rate limits have been exceeded, so we can notify them why the session is ending.
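A handler for that actionHook might look like the sketch below. The completion_reason values come from the list above; the shape of the error_code object is assumed for illustration.

```javascript
// Sketch of an actionHook handler for the llm verb's final status.
// completion_reason values are those listed above; error_code's
// contents are an illustrative assumption.
function onLlmAction(payload) {
  const { completion_reason, error_code } = payload;
  if (completion_reason === 'server error' && error_code) {
    // e.g. surface a rate-limit error so the caller knows why we hung up
    return `session ended with error: ${JSON.stringify(error_code)}`;
  }
  return `session ended: ${completion_reason}`;
}

console.log(onLlmAction({ completion_reason: 'normal conversation end' }));
// prints: session ended: normal conversation end
```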
