[OpenAI] Azure OpenAI realtime client library for Java first version (#42707)

* WIP: trying to get code gen to work

* added code gen classes

* Project compiles

* Added BaseOpenAIClient to be able to pass for feature clients

* Re-structure packages

* Adding classes for websocket comms

* Added placeholder classes

* Added .env to gitignore list

* Client impls wired together

* Added more required classes for ws protocol

* Added clarifying comments

* Cleaned up pointless interface

* Added classes to provide service specific configurations

* Adjusted ClientEndpointConfiguration for our usecase

* Protocol correction in hardcoded URL

* Added subprotocol and bearer prefix for header

* Wired RealtimeClientBuilder with RealtimeAsyncClient

* minor renames

* small cleanup

* Project compiles and fields are propagated to the async client

* Finally getting a 404

* Got finally a 200

* Adding graceful stop and server message emission

* Finished up handlers

* WIP: added close implementation

* WIP: sample is still in progress, serialization works

* WIP: trying to send audio

* WIP: sample works as in other languages

* Restored Azure OpenAI Inference SDK

* Added AOAI realtime SDK

* Trying out solution for frames assumed to be contiguous in wss session

* Restored doc with TODOs

* Fixed config for nonAzure OAI

* Refactored LowLevelSample

* Added unit tests and cleaned up sample a bit

* Cleaned up samples

* Re-structured test packages

* Trying to cleanup implementation for frame collection

* Replaced print with log statements

* Restored print statements

* Added support for more generalized authentication types

* Removed last references in inference of realtime client

* Removed old readme file

* Setup WebsocketFrameAggregator from Netty

* Replaced ackId with String eventId

* WIP: cleanup

* Added license header to source files

* Files for Readme and its samples

* Added sync client bare bones impl

* More cleanup

* cleaned up versions

* Adding documentation

* adding more tests

* Item manipulation test passing

* Fixed textOnly test bad config

* Trying to make a tool/audio test

* Test green

* Ported final test from .NET

* Blocking on StepVerifier

* Ported tests for nonAzure case

* Added more eventHandler methods for the sync client

* Added the remove handler analogous methods for the sync client

* Added first blocking test

* Updated netty and add exclusion directive for bannedDeps

* style check

* mvn packaging works (by skipping most things anyway)

* ported canConfigureSession to blocking client tests

* Added blocking version of textOnly test

* ItemManipulation test sync passing

* Added tests for tool with audio file

* Finalized sync Azure tests

* Added nonAzure blocking

* initial tsp-location

* Corrections

* Commit hash update

* Added some customizations to work around code gen issues

* Added package info to utils package

* Cleaned up imports

* Reverting changes in other libraries

* Reverted changes

* Regen with problems

* Fixed customization problems

* WIP: main readme

* Added to Flux.error mapper, convenience ctx, new sample

* Customized VAD detection to pass ms values as numeric JSON

* File upload

* Introduced AudioFile to handle different sample rates, etc.

* Corrected values for file in sample

* Sample sends file correctly

* ClienSample completed with audio verified

* cleanup

* method rename

* Sample documentation

* More documentation

* More sample documentation

* Added sync usage sample

* Updated files according to APIView feedback

* Re-run code gen

* samples/README.md setup ready

* Adding scaffoldings for the readme

* Added dev feed instruction setup to README

* Added readme samples and completed main readme

* Update sdk/openai/azure-ai-openai-realtime/README.md

Co-authored-by: Srikanta <[email protected]>

* First round of feedback

* Added traits to builder

* Disabled tests

* Renamed all events

* Re-added code customizations lost in the code regen

* mvn clean install run

* Style checks passing

* Disabling jacoco for now

* Update sdk/openai/azure-ai-openai-realtime/src/main/java/com/azure/ai/openai/realtime/OpenAIRealtimeServiceVersion.java

Co-authored-by: Srikanta <[email protected]>

* bumping versions

* bump

* adding library

* current version

* temporary fix

* cspell ignore

* locale

* metadata

* workaround

* fix

* javadoc

---------

Co-authored-by: Srikanta <[email protected]>
Co-authored-by: Gerardo Lecaros <[email protected]>
3 people authored Dec 4, 2024
1 parent 91d4103 commit 832f06c
Showing 152 changed files with 18,614 additions and 2 deletions.
3 changes: 3 additions & 0 deletions .github/CODEOWNERS
@@ -659,6 +659,9 @@
# PRLabel: %OpenAI
/sdk/openai/azure-ai-openai-assistants/ @brandom-msft @jpalvarezl @mssfang

# PRLabel: %OpenAI
/sdk/openai/azure-ai-openai-realtime/ @brandom-msft @jpalvarezl @mssfang

# ServiceLabel: %Operational Insights
# ServiceOwners: @AzmonLogA

1 change: 1 addition & 0 deletions .gitignore
@@ -15,6 +15,7 @@ temp/

# Sensitive files
*.json.env
.env

#javadoc overview files generated from README.md
readme_overview.html
3 changes: 2 additions & 1 deletion .vscode/cspell.json
@@ -214,7 +214,8 @@
"sdk/search/azure-search-documents/src/main/java/com/azure/search/documents/indexes/models/StemmerTokenFilterLanguage.java",
"sdk/search/azure-search-documents/src/main/java/com/azure/search/documents/indexes/models/SnowballTokenFilterLanguage.java",
"sdk/search/azure-search-documents/src/main/java/com/azure/search/documents/indexes/models/TextTranslationSkillLanguage.java",
"sdk/search/azure-search-documents/src/main/java/com/azure/search/documents/indexes/models/TokenFilterName.java"
"sdk/search/azure-search-documents/src/main/java/com/azure/search/documents/indexes/models/TokenFilterName.java",
"sdk/openai/azure-ai-openai-realtime/tsp-location.yaml"
],
"words": [
"adal",
1 change: 1 addition & 0 deletions eng/versioning/version_client.txt
@@ -47,6 +47,7 @@ com.azure:azure-ai-metricsadvisor;1.2.3;1.3.0-beta.1
com.azure:azure-ai-metricsadvisor-perf;1.0.0-beta.1;1.0.0-beta.1
com.azure:azure-ai-openai;1.0.0-beta.12;1.0.0-beta.13
com.azure:azure-ai-openai-assistants;1.0.0-beta.4;1.0.0-beta.5
com.azure:azure-ai-openai-realtime;1.0.0-beta.1;1.0.0-beta.1
com.azure:azure-ai-personalizer;1.0.0-beta.1;1.0.0-beta.2
com.azure:azure-ai-textanalytics;5.5.3;5.6.0-beta.1
com.azure:azure-ai-textanalytics-perf;1.0.0-beta.1;1.0.0-beta.1
7 changes: 7 additions & 0 deletions sdk/openai/azure-ai-openai-realtime/CHANGELOG.md
@@ -0,0 +1,7 @@
### Other Changes

#### Dependency Updates

## 1.0.0-beta.1 (TBD)

- Azure OpenAI Realtime client library for Java.
316 changes: 316 additions & 0 deletions sdk/openai/azure-ai-openai-realtime/README.md
@@ -0,0 +1,316 @@
# Azure OpenAI Realtime client library for Java (experimental)

This preview introduces a new `/realtime` API endpoint for the `gpt-4o-realtime-preview` model family. `/realtime`:

- Supports low-latency, "speech in, speech out" conversational interactions
- Works with text messages, function tool calling, and many other existing capabilities from other endpoints like `/chat/completions`
- Is a great fit for support agents, assistants, translators, and other use cases that need highly responsive back-and-forth with a user

`/realtime` is built on [the WebSockets API](https://developer.mozilla.org/docs/Web/API/WebSockets_API) to facilitate fully asynchronous streaming communication between the end user and model. It's designed to be used in the context of a trusted, intermediate service that manages both connections to end users and model endpoint connections; it **is not** designed to be used directly from untrusted end user devices, and device details like capturing and rendering audio data are outside the scope of the `/realtime` API.

At a summary level, the architecture of an experience built atop `/realtime` looks something like the following (noting that the user interactions, as previously mentioned, are not part of the API itself):

```mermaid
sequenceDiagram
actor User as End User
participant MiddleTier as /realtime host
participant AOAI as Azure OpenAI
User->>MiddleTier: Begin interaction
MiddleTier->>MiddleTier: Authenticate/Validate User
MiddleTier--)User: audio information
User--)MiddleTier:
MiddleTier--)User: text information
User--)MiddleTier:
MiddleTier--)User: control information
User--)MiddleTier:
MiddleTier->>AOAI: connect to /realtime
MiddleTier->>AOAI: configure session
AOAI->>MiddleTier: session start
MiddleTier--)AOAI: send/receive WS commands
AOAI--)MiddleTier:
AOAI--)MiddleTier: create/start conversation responses
AOAI--)MiddleTier: (within responses) create/start/add/finish items
AOAI--)MiddleTier: (within items) create/stream/finish content parts
```

Note that `/realtime` is in **public preview**. API changes, code updates, and occasional service disruptions are expected.

This client library is currently made available **only in our dev feed**. For detailed instructions see the [dev feed documentation.][dev_feed_instructions]

## Getting started

### Prerequisites

- [Java Development Kit (JDK)][jdk] with version 8 or above
- [Azure Subscription][azure_subscription]
- [Azure OpenAI access][azure_openai_access]
- [Quickstart: GPT-4o Realtime API for speech and audio (Preview)][quickstart]

### Adding the package to your project

This project is currently only available in the dev feed. For detailed instructions on how to set up your project to consume the dev feed,
please visit the [dev feed documentation page][dev_feed_instructions]. The steps for the `maven` and `gradle` setups are listed below.

#### Maven dev feed setup

##### Step 1: get a PAT (Personal Access Token)

Generate a [Personal Access Token](https://dev.azure.com/azure-sdk/_details/security/tokens) with *Packaging* read & write scopes.

##### Step 2: Project setup

Add the repo to **both** the `<repositories>` and `<distributionManagement>` sections of your `pom.xml`:

```xml
<repository>
  <id>azure-sdk-for-java</id>
  <url>https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-java/maven/v1</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>true</enabled>
  </snapshots>
</repository>
```

Add or edit the `settings.xml` file in `${user.home}/.m2`, placing the following entry inside its `<servers>` section:

```xml
<server>
  <id>azure-sdk-for-java</id>
  <username>azure-sdk</username>
  <password>[PERSONAL_ACCESS_TOKEN]</password>
</server>
```

Replace `[PERSONAL_ACCESS_TOKEN]` in the `<password>` tag with the PAT you generated in [step 1.](#step-1-get-a-pat-personal-access-token)

##### Step 3: Add project dependency

Add to your project's pom.xml file

[//]: # ({x-version-update-start;com.azure:azure-ai-openai-realtime;current})
```xml
<dependency>
  <groupId>com.azure</groupId>
  <artifactId>azure-ai-openai-realtime</artifactId>
  <version>1.0.0-beta.1</version>
</dependency>
```
[//]: # ({x-version-update-end})

Then run:

```commandline
mvn install
```

#### Gradle setup

##### Step 1: get a PAT (Identical to the step for Maven setup)

Generate a [Personal Access Token](https://dev.azure.com/azure-sdk/_details/security/tokens) with *Packaging* read & write scopes.

##### Step 2: Project setup

Add this section to your `build.gradle` file in **both** the `repositories` and `publishing.repositories` containers.

```groovy
maven {
    url 'https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-java/maven/v1'
    name 'azure-sdk-for-java'
    credentials(PasswordCredentials)
    authentication {
        basic(BasicAuthentication)
    }
}
```

Add or edit the `gradle.properties` file in `${user.home}/.gradle`

```groovy
azure-sdk-for-javaUsername=azure-sdk
azure-sdk-for-javaPassword=PERSONAL_ACCESS_TOKEN
```

Replace the `PERSONAL_ACCESS_TOKEN` value assigned to `azure-sdk-for-javaPassword` with the PAT you generated in [step 1.](#step-1-get-a-pat-personal-access-token)

##### Step 3: Add project dependency

Add to your project setup

```groovy
implementation(group: 'com.azure', name: 'azure-ai-openai-realtime', version: '1.0.0-beta.1')
```
Then run:

```commandline
gradle build
```

### Authentication

To interact with the Azure OpenAI Service, you'll need to create an instance of a client class,
[RealtimeAsyncClient][realtime_client_async] or [RealtimeClient][realtime_client_sync], by using
[RealtimeClientBuilder][realtime_client_builder]. To configure a client for use with
Azure OpenAI, provide a valid endpoint URI to an Azure OpenAI resource along with a corresponding key credential or
token credential.

#### Example: Azure OpenAI

Get an Azure OpenAI `key` credential from the Azure Portal.

```java readme-sample-createSyncAzureClientKeyCredential
RealtimeClient client = new RealtimeClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();
```

Alternatively, to build an async client:

```java readme-sample-createAsyncAzureClientKeyCredential
RealtimeAsyncClient client = new RealtimeClientBuilder()
    .credential(new KeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildAsyncClient();
```

#### Example: non-Azure OpenAI

If we omit the `endpoint` parameter, the client will assume we are operating against the non-Azure OpenAI server.

```java readme-sample-createSyncNonAzureClientKeyCredential
RealtimeClient client = new RealtimeClientBuilder()
    .credential(new KeyCredential("{key}"))
    .buildClient();
```

Alternatively, to build an async client:

```java readme-sample-createAsyncNonAzureClientKeyCredential
RealtimeAsyncClient client = new RealtimeClientBuilder()
    .credential(new KeyCredential("{key}"))
    .buildAsyncClient();
```

## Key concepts

For a more detailed guide please refer to the [Azure OpenAI realtime][aoai_samples_readme] general API guide.

- A caller establishes a connection to `/realtime`, which starts a new `session`
- The `session` can be configured to customize input and output audio behavior, voice activity detection behavior, and other shared settings
- A `session` automatically creates a default `conversation`
- Note: in the future, multiple concurrent conversations may be supported -- this is not currently available
- The `conversation` accumulates input signals until a `response` is started, either via a direct command by the caller or automatically by voice-activity-based turn detection
- Each `response` consists of one or more `items`, which can encapsulate messages, function calls, and other information
- Message `item`s have `content_part`s, allowing multiple modalities (text, audio) to be represented across a single item
- The `session` manages configuration of caller input handling (e.g. user audio) and common output/generation handling
- Each caller-initiated `response.create` can override some of the output `response` behavior, if desired
- Server-created `item`s and the `content_part`s in messages can be populated asynchronously and in parallel, e.g. receiving audio, text, and function information concurrently (round-robin)
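
The containment relationships described above can be pictured with a small, self-contained model. This is only a sketch for orientation: every type below (`Session`, `Conversation`, `Response`, `Item`, `ContentPart`) is a hypothetical stand-in invented for illustration and is not a class from `azure-ai-openai-realtime`; the voice name and the delta strings are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration types; these are NOT classes from azure-ai-openai-realtime.
class RealtimeConceptsSketch {

    /** One modality-specific piece of a message item, e.g. "text" or "audio". */
    static final class ContentPart {
        final String type;
        final StringBuilder data = new StringBuilder();

        ContentPart(String type) {
            this.type = type;
        }
    }

    /** A message, function call, or other unit of information inside a response. */
    static final class Item {
        final String kind;
        final List<ContentPart> contentParts = new ArrayList<>();

        Item(String kind) {
            this.kind = kind;
        }
    }

    /** A response is made of one or more items. */
    static final class Response {
        final List<Item> items = new ArrayList<>();
    }

    /** The conversation accumulates input until a response is started. */
    static final class Conversation {
        final List<String> inputs = new ArrayList<>();
        final List<Response> responses = new ArrayList<>();

        Response startResponse() {
            Response response = new Response();
            responses.add(response);
            return response;
        }
    }

    /** A session holds shared settings and creates one default conversation. */
    static final class Session {
        final String voice;
        final Conversation conversation = new Conversation();

        Session(String voice) {
            this.voice = voice;
        }
    }

    /** Builds a session whose single response has one message with text and audio parts. */
    static Session buildExample() {
        Session session = new Session("alloy");
        session.conversation.inputs.add("user audio chunk");

        Response response = session.conversation.startResponse();
        Item message = new Item("message");
        ContentPart text = new ContentPart("text");
        ContentPart audio = new ContentPart("audio");
        message.contentParts.add(text);
        message.contentParts.add(audio);
        response.items.add(message);

        // Parts of the same item can be filled in an interleaved fashion.
        text.data.append("Hel");
        audio.data.append("<audio-delta-1>");
        text.data.append("lo");
        audio.data.append("<audio-delta-2>");
        return session;
    }

    public static void main(String[] args) {
        Session session = buildExample();
        Item message = session.conversation.responses.get(0).items.get(0);
        for (ContentPart part : message.contentParts) {
            System.out.println(part.type + ": " + part.data);
        }
    }
}
```

In the real service these objects are created and updated through events exchanged over the WebSocket connection rather than by direct method calls; the sketch only mirrors the nesting.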

## Examples

We can set up the Realtime session to return both text and audio.
```java readme-sample-sessionUpdate
client.sendMessage(new SessionUpdateEvent(
    new RealtimeRequestSession()
        .setVoice(RealtimeVoice.ALLOY)
        .setTurnDetection(
            new RealtimeServerVadTurnDetection()
                .setThreshold(0.5)
                .setPrefixPaddingMs(300)
                .setSilenceDurationMs(200)
        ).setInputAudioTranscription(new RealtimeAudioInputTranscriptionSettings(
            RealtimeAudioInputTranscriptionModel.WHISPER_1)
        ).setModalities(Arrays.asList(RealtimeRequestSessionModality.AUDIO, RealtimeRequestSessionModality.TEXT))
));
```

With the Azure OpenAI Realtime Audio client library, one can provide a prompt as an audio file.

```java readme-sample-uploadAudioFile
RealtimeClient client = new RealtimeClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();

String audioFilePath = "{path to audio file}";
byte[] audioBytes = Files.readAllBytes(Paths.get(audioFilePath));

client.addOnResponseDoneEventHandler(event -> {
    System.out.println("Response done");
});

client.start();
client.sendMessage(new InputAudioBufferAppendEvent(audioBytes));
```
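
For longer recordings it may be preferable to append the audio in smaller pieces rather than as one large message. The following is a minimal sketch of that idea using only the JDK. It assumes raw 16-bit mono PCM sampled at 24 kHz and an arbitrary 100 ms chunk length; both values are assumptions to adjust to the actual audio format, and `AudioChunkingSketch`, `bytesForDuration`, and `split` are hypothetical helper names, not part of the client library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of azure-ai-openai-realtime.
class AudioChunkingSketch {

    /** Bytes needed for {@code millis} of 16-bit (2-byte) mono PCM at the given sample rate. */
    static int bytesForDuration(int sampleRateHz, int millis) {
        return sampleRateHz * 2 * millis / 1000;
    }

    /** Splits the audio into consecutive chunks of at most {@code chunkSize} bytes. */
    static List<byte[]> split(byte[] audio, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < audio.length; offset += chunkSize) {
            int end = Math.min(audio.length, offset + chunkSize);
            chunks.add(Arrays.copyOfRange(audio, offset, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Assumed format: 24 kHz, 16-bit, mono; 100 ms per chunk.
        int chunkSize = bytesForDuration(24000, 100);
        byte[] oneSecondOfSilence = new byte[bytesForDuration(24000, 1000)];
        List<byte[]> chunks = split(oneSecondOfSilence, chunkSize);
        System.out.println(chunks.size() + " chunks of up to " + chunkSize + " bytes");
    }
}
```

Each resulting chunk could then be passed to `new InputAudioBufferAppendEvent(chunk)` in the same way the whole file is sent in the sample above.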

To consume the text and audio produced by the server, we set up the following callbacks in an async scenario.

```java readme-sample-consumeSpecificEventsAsync
RealtimeAsyncClient client = new RealtimeClientBuilder()
    .credential(new KeyCredential("{key}"))
    .buildAsyncClient();

Disposable.Composite disposables = Disposables.composite();

disposables.addAll(Arrays.asList(
    client.getServerEvents()
        .takeUntil(serverEvent -> serverEvent instanceof ResponseAudioDoneEvent)
        .ofType(ResponseAudioDeltaEvent.class)
        .subscribe(this::consumeAudioDelta, this::consumeError, this::onAudioResponseCompleted),
    client.getServerEvents()
        .takeUntil(serverEvent -> serverEvent instanceof ResponseAudioTranscriptDoneEvent)
        .ofType(ResponseAudioTranscriptDeltaEvent.class)
        .subscribe(this::consumeAudioTranscriptDelta, this::consumeError, this::onAudioResponseTranscriptCompleted)
));
```
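
The combination of `takeUntil` and `ofType` in the sample above keeps only the delta events that arrive up to the first matching "done" event, then completes. The plain-Java sketch below mimics that selection over an in-memory list so the effect can be seen without a live connection. The event types (`AudioDelta`, `AudioDone`, `Other`) are hypothetical stand-ins, not the library's event classes, and the loop is an illustration of the filtering logic rather than how Reactor implements it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EventFilterSketch {

    // Hypothetical stand-ins for server event types; not the library's classes.
    interface ServerEvent { }

    static final class AudioDelta implements ServerEvent {
        final String data;

        AudioDelta(String data) {
            this.data = data;
        }
    }

    static final class AudioDone implements ServerEvent { }

    static final class Other implements ServerEvent { }

    /**
     * Collects the payload of every AudioDelta seen before the first AudioDone.
     * Events of other types are skipped; nothing after AudioDone is consumed.
     */
    static List<String> deltasUntilDone(List<ServerEvent> events) {
        List<String> deltas = new ArrayList<>();
        for (ServerEvent event : events) {
            if (event instanceof AudioDelta) {
                deltas.add(((AudioDelta) event).data);
            }
            if (event instanceof AudioDone) {
                break; // the stream of interest is complete
            }
        }
        return deltas;
    }

    public static void main(String[] args) {
        List<ServerEvent> events = Arrays.<ServerEvent>asList(
            new Other(),
            new AudioDelta("a"),
            new AudioDelta("b"),
            new AudioDone(),
            new AudioDelta("ignored"));
        System.out.println(deltasUntilDone(events));
    }
}
```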

## Troubleshooting

### Enable client logging
You can set the `AZURE_LOG_LEVEL` environment variable to view logging statements made in the client library. For
example, setting `AZURE_LOG_LEVEL=2` would show all informational, warning, and error log messages. The log levels can
be found here: [log levels][log_levels].
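
To make the threshold behavior concrete, here is a small sketch of how a numeric level acts as a minimum severity. It assumes the mapping 1 = verbose, 2 = informational, 3 = warning, 4 = error, which should be checked against the linked log levels source. `LogLevelSketch` and `isShown` are hypothetical names for illustration and are not the library's logging implementation.

```java
// Hypothetical illustration of threshold-based log filtering; not the Azure SDK's logger.
class LogLevelSketch {

    // Assumed numeric levels: 1 = verbose, 2 = informational, 3 = warning, 4 = error.
    static final int VERBOSE = 1;
    static final int INFORMATIONAL = 2;
    static final int WARNING = 3;
    static final int ERROR = 4;

    /** A message is shown when its level is at or above the configured level. */
    static boolean isShown(int configuredLevel, int messageLevel) {
        return messageLevel >= configuredLevel;
    }

    public static void main(String[] args) {
        int configured = INFORMATIONAL; // corresponds to AZURE_LOG_LEVEL=2
        System.out.println("verbose shown: " + isShown(configured, VERBOSE));
        System.out.println("informational shown: " + isShown(configured, INFORMATIONAL));
        System.out.println("warning shown: " + isShown(configured, WARNING));
        System.out.println("error shown: " + isShown(configured, ERROR));
    }
}
```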

### Default HTTP Client
All client libraries by default use the Netty HTTP client. Adding the above dependency will automatically configure
the client library to use the Netty HTTP client. Configuring or changing the HTTP client is detailed in the
[HTTP clients wiki](https://learn.microsoft.com/azure/developer/java/sdk/http-client-pipeline#http-clients).

### Default SSL library
All client libraries, by default, use the Tomcat-native Boring SSL library to enable native-level performance for SSL
operations. The Boring SSL library is an uber jar containing native libraries for Linux / macOS / Windows, and provides
better performance compared to the default SSL implementation within the JDK. For more information, including how to
reduce the dependency size, refer to the [performance tuning][performance_tuning] section of the wiki.

## Next steps

- Samples are explained in detail [here][samples_readme].

## Contributing

For details on contributing to this repository, see the [contributing guide](https://github.com/Azure/azure-sdk-for-java/blob/main/CONTRIBUTING.md).

1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create new Pull Request

<!-- LINKS -->
[aoai_samples_readme]: https://github.com/Azure-Samples/aoai-realtime-audio-sdk/blob/main/README.md
[aoai_samples_readme_api_concepts]: https://github.com/Azure-Samples/aoai-realtime-audio-sdk/blob/main/README.md#api-concepts
[azure_subscription]: https://azure.microsoft.com/free/
[azure_openai_access]: https://learn.microsoft.com/azure/cognitive-services/openai/overview#how-do-i-get-access-to-azure-openai
[jdk]: https://docs.microsoft.com/java/azure/jdk/
[dev_feed_instructions]: https://github.com/Azure/azure-sdk-for-java/blob/main/CONTRIBUTING.md#dev-feed
[log_levels]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/core/azure-core/src/main/java/com/azure/core/util/logging/ClientLogger.java
[performance_tuning]: https://github.com/Azure/azure-sdk-for-java/wiki/Performance-Tuning
[samples_readme]: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/openai/azure-ai-openai-realtime/src/samples
[quickstart]: https://learn.microsoft.com/azure/ai-services/openai/realtime-audio-quickstart
[realtime_client_async]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/openai/azure-ai-openai-realtime/src/main/java/com/azure/ai/openai/realtime/RealtimeAsyncClient.java
[realtime_client_sync]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/openai/azure-ai-openai-realtime/src/main/java/com/azure/ai/openai/realtime/RealtimeClient.java
[realtime_client_builder]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/openai/azure-ai-openai-realtime/src/main/java/com/azure/ai/openai/realtime/RealtimeClientBuilder.java
13 changes: 13 additions & 0 deletions sdk/openai/azure-ai-openai-realtime/TROUBLESHOOTING.md
@@ -0,0 +1,13 @@
# Troubleshooting OpenAI issues

This troubleshooting guide covers failure investigation techniques, common errors for the credential types in the Azure
OpenAI Realtime Java client library, and mitigation steps to resolve these errors. The common best practice sample can be found
in [Best Practice Samples][best_practice_samples].

## Get additional help

Additional information on ways to reach out for support can be found in the [SUPPORT.md][support] at the root of the repo.

<!-- Links -->
[best_practice_samples]: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/openai/azure-ai-openai-realtime/src/samples/README.md
[support]: https://github.com/Azure/azure-sdk-for-java/blob/main/SUPPORT.md