[OpenAI] Add Whisper (Azure#27109)
### Packages impacted by this PR
@azure/openai

### Issues associated with this PR
None for Whisper, but this PR includes a rudimentary fix for Azure#26953.

### Describe the problem that is addressed by this PR
Adds support for speech to text capabilities. See the changelog entry
and the samples for more details about the addition.

A few notes:
- Bring Your Own Data tests are skipped because the deployment for the new API version doesn't support the feature yet; support is expected soon.
- `@azure/core-rest-pipeline`'s `formDataPolicy` doesn't support file uploads. I added a custom version of the policy in openai that supports file uploads and uses an actively maintained third-party library (see the sketch after this list).
- Adds a fix for Azure#26953 that doesn't rely on core changes (see the changes in the `src/api/getSSE.ts` and `src/api/getSSE.browser.ts` files). A better fix is in Azure#27000, but that one is still under review.
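
The custom policy builds on the `formdata-node` and `form-data-encoder` packages added as dependencies in this PR. Below is a minimal sketch of the approach, assuming the standard pipeline-policy shape and a `request.formData` map; it illustrates the idea and is not the exact code in this PR:

```js
// Sketch of a form-data policy with file-upload support (illustrative only).
const { FormData, File } = require("formdata-node");
const { FormDataEncoder } = require("form-data-encoder");
const { Readable } = require("stream");

const customFormDataPolicy = {
  name: "customFormDataPolicy",
  async sendRequest(request, next) {
    if (request.formData) {
      const form = new FormData();
      for (const [name, value] of Object.entries(request.formData)) {
        // Wrap raw bytes in a File so the encoder emits a proper file part.
        form.append(name, value instanceof Uint8Array ? new File([value], "file") : value);
      }
      const encoder = new FormDataEncoder(form);
      request.headers.set("Content-Type", encoder.contentType);
      // The encoder yields the multipart body as an async iterable.
      request.body = () => Readable.from(encoder.encode());
      request.formData = undefined;
    }
    return next(request);
  },
};
```

Delegating the multipart encoding to `form-data-encoder` keeps boundary generation and content-type handling out of the SDK itself.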

### What are the possible designs available to address the problem? If there is more than one possible design, why was the one in this PR chosen?
N/A

### Are there test cases added in this PR? _(If not, why?)_
Yes

### Provide a list of related PRs _(if any)_
N/A

### Command used to generate this PR _(Applicable only to SDK release request PRs)_

### Checklists
- [x] Added impacted package name to the issue description
- [ ] Does this PR need any fixes in the SDK Generator? _(If so, create an Issue in the [Autorest/typescript](https://github.com/Azure/autorest.typescript) repository and link it here)_
- [x] Added a changelog (if necessary)

---------

Co-authored-by: Minh-Anh Phan <[email protected]>
deyaaeldeen and minhanh-phan authored Sep 19, 2023
1 parent 245548f commit 4e1b3ec
Showing 47 changed files with 1,783 additions and 197 deletions.
38 changes: 31 additions & 7 deletions common/config/rush/pnpm-lock.yaml


13 changes: 7 additions & 6 deletions sdk/openai/openai/CHANGELOG.md
@@ -1,17 +1,18 @@
# Release History

## 1.0.0-beta.6 (Unreleased)
## 1.0.0-beta.6 (2023-09-21)

### Features Added

### Breaking Changes
- Introduces speech to text and translation capabilities for a wide variety of audio file formats.
- Adds `getAudioTranscription` and `getAudioTranslation` methods for transcribing and translating audio files. The result can be either a simple JSON structure with just a `text` field or a more detailed JSON structure containing the text alongside additional information. In addition, VTT (Web Video Text Tracks), SRT (SubRip Text), and plain text formats are also supported. The type of the result depends on the `format` parameter, if specified; otherwise, a simple JSON output is assumed. The methods can take an optional text prompt as input to guide the model's style or to continue a previous audio segment. The language of the prompt should match that of the audio file.
- The available model at the time of this release supports the following list of audio file formats: m4a, mp3, wav, ogg, flac, webm, mp4, mpga, mpeg, and oga.

### Bugs Fixed

- Return `usage` information when available.
- Return `error` information in `ContentFilterResults` when available.

### Other Changes
- Returns `usage` information when available.
- Fixes a bug where errors weren't properly being thrown from the streaming methods.
- Returns `error` information in `ContentFilterResults` when available.

## 1.0.0-beta.5 (2023-08-25)

68 changes: 61 additions & 7 deletions sdk/openai/openai/README.md
@@ -6,10 +6,12 @@ non-Azure OpenAI inference endpoint, making it a great choice for even non-Azure

Use the client library for Azure OpenAI to:

* [Create a completion for text][msdocs_openai_completion]
* [Create a chat completion with ChatGPT][msdocs_openai_chat_completion]
* [Create a completion for text][get_completions_sample]
* [Create a chat completion with ChatGPT][list_chat_completion_sample]
* [Create a text embedding for comparisons][msdocs_openai_embedding]
* [Use your own data with Azure OpenAI][msdocs_openai_custom_data]
* [Use your own data with Azure OpenAI][byod_sample]
* [Generate images][get_images_sample]
* [Transcribe and translate audio files][transcribe_audio_sample]

Azure OpenAI is a managed service that allows developers to deploy, tune, and generate content from OpenAI models on Azure resources.

@@ -20,6 +22,7 @@ Check out the following examples:
- [Summarize Text](#summarize-text-with-completion)
- [Generate Images](#generate-images-with-dall-e-image-generation-models)
- [Analyze Business Data](#analyze-business-data)
- [Transcribe and translate audio files](#transcribe-and-translate-audio-files)

Key links:

@@ -140,6 +143,10 @@ async function main(){
console.log(choice.text);
}
}

main().catch((err) => {
console.error("The sample encountered an error:", err);
});
```

## Examples
@@ -179,6 +186,10 @@ async function main(){
}
}
}

main().catch((err) => {
console.error("The sample encountered an error:", err);
});
```
### Generate Multiple Completions With Subscription Key
@@ -212,6 +223,10 @@ async function main(){
console.log(`Chatbot: ${completion}`);
}
}

main().catch((err) => {
console.error("The sample encountered an error:", err);
});
```
### Summarize Text with Completion
@@ -254,6 +269,9 @@ async function main(){
console.log(`Summarization: ${completion}`);
}

main().catch((err) => {
console.error("The sample encountered an error:", err);
});
```
### Generate images with DALL-E image generation models
@@ -276,6 +294,10 @@ async function main() {
console.log(`Image generation result URL: ${image.url}`);
}
}

main().catch((err) => {
console.error("The sample encountered an error:", err);
});
```
### Analyze Business Data
@@ -285,7 +307,7 @@ This example generates chat responses to input chat questions about your business data.
```javascript
const { OpenAIClient } = require("@azure/openai");
const { DefaultAzureCredential } = require("@azure/identity")
const { DefaultAzureCredential } = require("@azure/identity");

async function main(){
const endpoint = "https://myaccount.openai.azure.com/";
@@ -323,6 +345,36 @@ async function main(){
}
}
}

main().catch((err) => {
console.error("The sample encountered an error:", err);
});
```
### Transcribe and translate audio files
The speech to text and translation capabilities of Azure OpenAI can be used to transcribe and translate a wide variety of audio file formats. The following example shows how to use the `getAudioTranscription` method to transcribe audio into the language the audio is spoken in. You can also translate and transcribe the audio into English using the `getAudioTranslation` method.
The audio file can be loaded into memory using the Node.js file system APIs. In the browser, the file can be loaded using the `FileReader` API, and the output of the `arrayBuffer` instance method can be passed to the `getAudioTranscription` method.
```js
const { OpenAIClient, AzureKeyCredential } = require("@azure/openai");
const fs = require("fs/promises");

async function main() {
console.log("== Transcribe Audio Sample ==");

const client = new OpenAIClient(endpoint, new AzureKeyCredential(azureApiKey));
const deploymentName = "whisper-deployment";
const audio = await fs.readFile("< path to an audio file >");
const result = await client.getAudioTranscription(deploymentName, audio);

console.log(`Transcription: ${result.text}`);
}

main().catch((err) => {
console.error("The sample encountered an error:", err);
});
```
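Translation works the same way. The following sketch mirrors the transcription sample above (as there, `endpoint` and `azureApiKey` are placeholders you would supply):
```js
const { OpenAIClient, AzureKeyCredential } = require("@azure/openai");
const fs = require("fs/promises");

async function main() {
  console.log("== Translate Audio Sample ==");

  const client = new OpenAIClient(endpoint, new AzureKeyCredential(azureApiKey));
  const deploymentName = "whisper-deployment";
  const audio = await fs.readFile("< path to an audio file >");
  // Unlike transcription, translation always produces English text.
  const result = await client.getAudioTranslation(deploymentName, audio);

  console.log(`Translation: ${result.text}`);
}

main().catch((err) => {
  console.error("The sample encountered an error:", err);
});
```
Both methods also accept an optional result format argument (`"json"`, `"verbose_json"`, `"text"`, `"srt"`, or `"vtt"`) that controls the shape of the returned result.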
## Troubleshooting
@@ -340,9 +392,11 @@ setLogLevel("info");
For more detailed instructions on how to enable logs, you can look at the [@azure/logger package docs](https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/core/logger).
<!-- LINKS -->
[msdocs_openai_completion]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/openai/openai/samples/v1-beta/javascript/completions.js
[msdocs_openai_chat_completion]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/openai/openai/samples/v1-beta/javascript/listChatCompletions.js
[msdocs_openai_custom_data]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/openai/openai/samples-dev/bringYourOwnData.ts
[get_completions_sample]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/openai/openai/samples/v1-beta/javascript/completions.js
[list_chat_completion_sample]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/openai/openai/samples/v1-beta/javascript/listChatCompletions.js
[byod_sample]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/openai/openai/samples/v1-beta/javascript/bringYourOwnData.js
[get_images_sample]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/openai/openai/samples/v1-beta/javascript/getImages.js
[transcribe_audio_sample]: https://github.com/Azure/azure-sdk-for-js/tree/openai/add-whisper/sdk/openai/openai/samples-dev/audioTranscription.ts
[msdocs_openai_embedding]: https://learn.microsoft.com/azure/cognitive-services/openai/concepts/understand-embeddings
[azure_openai_completions_docs]: https://learn.microsoft.com/azure/cognitive-services/openai/how-to/completions
[defaultazurecredential]: https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/identity/identity#defaultazurecredential
2 changes: 1 addition & 1 deletion sdk/openai/openai/assets.json
@@ -2,5 +2,5 @@
"AssetsRepo": "Azure/azure-sdk-assets",
"AssetsRepoPrefixPath": "js",
"TagPrefix": "js/openai/openai",
"Tag": "js/openai/openai_353545d522"
"Tag": "js/openai/openai_85d9317957"
}
Binary file added sdk/openai/openai/assets/audio/countdown.flac
Binary file added sdk/openai/openai/assets/audio/countdown.m4a
Binary file added sdk/openai/openai/assets/audio/countdown.mp3
Binary file added sdk/openai/openai/assets/audio/countdown.mp4
Binary file added sdk/openai/openai/assets/audio/countdown.mpeg
Binary file added sdk/openai/openai/assets/audio/countdown.mpga
Binary file added sdk/openai/openai/assets/audio/countdown.oga
Binary file added sdk/openai/openai/assets/audio/countdown.ogg
Binary file added sdk/openai/openai/assets/audio/countdown.wav
Binary file added sdk/openai/openai/assets/audio/countdown.webm
3 changes: 3 additions & 0 deletions sdk/openai/openai/package.json
@@ -7,6 +7,7 @@
"module": "dist-esm/src/index.js",
"browser": {
"./dist-esm/src/api/getSSEs.js": "./dist-esm/src/api/getSSEs.browser.js",
"./dist-esm/src/api/policies/formDataPolicy.js": "./dist-esm/src/api/policies/formDataPolicy.browser.js",
"./dist-esm/test/public/utils/getImageDimensions.js": "./dist-esm/test/public/utils/getImageDimensions.browser.js"
},
"type": "module",
@@ -136,6 +137,8 @@
"@azure/core-lro": "^2.5.3",
"@azure/core-rest-pipeline": "^1.10.2",
"@azure/logger": "^1.0.3",
"formdata-node": "^4.0.0",
"form-data-encoder": "1.7.2",
"tslib": "^2.4.0"
},
"//sampleConfiguration": {
71 changes: 71 additions & 0 deletions sdk/openai/openai/review/openai.api.md
@@ -11,6 +11,58 @@ import { KeyCredential } from '@azure/core-auth';
import { OperationOptions } from '@azure-rest/core-client';
import { TokenCredential } from '@azure/core-auth';

// @public
export type AudioResult<ResponseFormat extends AudioResultFormat> = {
json: AudioResultSimpleJson;
verbose_json: AudioResultVerboseJson;
vtt: string;
srt: string;
text: string;
}[ResponseFormat];

// @public
export type AudioResultFormat =
/** This format will return a JSON structure containing a single \"text\" field with the transcription. */
"json"
/** This format will return a JSON structure enriched with additional information alongside the transcription. */
| "verbose_json"
/** This will make the response return the transcription as plain text. */
| "text"
/** The transcription will be provided in SRT format (SubRip Text) as plain text. */
| "srt"
/** The transcription will be provided in VTT format (Web Video Text Tracks) as plain text. */
| "vtt";

// @public
export interface AudioResultSimpleJson {
text: string;
}

// @public
export interface AudioResultVerboseJson extends AudioResultSimpleJson {
duration: number;
language: string;
segments: AudioSegment[];
task: AudioTranscriptionTask;
}

// @public
export interface AudioSegment {
avgLogprob: number;
compressionRatio: number;
end: number;
id: number;
noSpeechProb: number;
seek: number;
start: number;
temperature: number;
text: string;
tokens: number[];
}

// @public
export type AudioTranscriptionTask = string;

// @public
export interface AzureChatExtensionConfiguration {
parameters: Record<string, any>;
@@ -184,6 +236,21 @@ export interface FunctionName {
name: string;
}

// @public
export interface GetAudioTranscriptionOptions extends OperationOptions {
language?: string;
model?: string;
prompt?: string;
temperature?: number;
}

// @public
export interface GetAudioTranslationOptions extends OperationOptions {
model?: string;
prompt?: string;
temperature?: number;
}
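
To illustrate how these types compose, here is a sketch (placeholder endpoint, key, deployment name, and file path; not part of the API review itself). Passing `"verbose_json"` selects the `AudioResultVerboseJson` branch of `AudioResult`, which exposes the `AudioSegment` details listed above:

```js
const { OpenAIClient, AzureKeyCredential } = require("@azure/openai");
const fs = require("fs/promises");

async function main() {
  const client = new OpenAIClient(
    "https://myaccount.openai.azure.com/",
    new AzureKeyCredential("<api key>")
  );
  const audio = await fs.readFile("< path to an audio file >");
  // "verbose_json" narrows the result type to AudioResultVerboseJson.
  const result = await client.getAudioTranscription("whisper-deployment", audio, "verbose_json", {
    temperature: 0,
  });
  console.log(`Language: ${result.language}, duration: ${result.duration}s`);
  for (const segment of result.segments) {
    console.log(`[${segment.start}s - ${segment.end}s] ${segment.text}`);
  }
}

main().catch((err) => {
  console.error("The sample encountered an error:", err);
});
```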

// @public
export interface GetChatCompletionsOptions extends OperationOptions {
azureExtensionOptions?: AzureExtensionsOptions;
@@ -261,6 +328,10 @@ export class OpenAIClient {
constructor(endpoint: string, credential: KeyCredential, options?: OpenAIClientOptions);
constructor(endpoint: string, credential: TokenCredential, options?: OpenAIClientOptions);
constructor(openAiApiKey: KeyCredential, options?: OpenAIClientOptions);
getAudioTranscription(deploymentName: string, fileContent: Uint8Array, options?: GetAudioTranscriptionOptions): Promise<AudioResultSimpleJson>;
getAudioTranscription<Format extends AudioResultFormat>(deploymentName: string, fileContent: Uint8Array, format: Format, options?: GetAudioTranscriptionOptions): Promise<AudioResult<Format>>;
getAudioTranslation(deploymentName: string, fileContent: Uint8Array, options?: GetAudioTranslationOptions): Promise<AudioResultSimpleJson>;
getAudioTranslation<Format extends AudioResultFormat>(deploymentName: string, fileContent: Uint8Array, format: Format, options?: GetAudioTranslationOptions): Promise<AudioResult<Format>>;
getChatCompletions(deploymentName: string, messages: ChatMessage[], options?: GetChatCompletionsOptions): Promise<ChatCompletions>;
getCompletions(deploymentName: string, prompt: string[], options?: GetCompletionsOptions): Promise<Completions>;
getEmbeddings(deploymentName: string, input: string[], options?: GetEmbeddingsOptions): Promise<Embeddings>;