🖼️ feat(DALL-E): Azure OpenAI Support & New Config Variables #1586

Merged
merged 4 commits into from
Jan 19, 2024
19 changes: 15 additions & 4 deletions .env.example
@@ -143,11 +143,22 @@ AZURE_AI_SEARCH_SEARCH_OPTION_QUERY_TYPE=
AZURE_AI_SEARCH_SEARCH_OPTION_TOP=
AZURE_AI_SEARCH_SEARCH_OPTION_SELECT=

# DALL·E 3
# DALL·E
#----------------
# DALLE_API_KEY=
# DALLE3_SYSTEM_PROMPT="Your System Prompt here"
# DALLE_REVERSE_PROXY=
# DALLE_API_KEY= # Key for both DALL-E-2 and DALL-E-3
# DALLE3_API_KEY= # Key for DALL-E-3 only
# DALLE2_API_KEY= # Key for DALL-E-2 only
# DALLE3_SYSTEM_PROMPT="Your DALL-E-3 System Prompt here"
# DALLE2_SYSTEM_PROMPT="Your DALL-E-2 System Prompt here"
# DALLE_REVERSE_PROXY= # Reverse proxy for DALL-E-2 and DALL-E-3
# DALLE3_BASEURL= # Base URL for DALL-E-3
# DALLE2_BASEURL= # Base URL for DALL-E-2

# DALL·E (via Azure OpenAI)
# Note: requires some of the variables above to be set
#----------------
# DALLE3_AZURE_API_VERSION= # Azure OpenAI API version for DALL-E-3
# DALLE2_AZURE_API_VERSION= # Azure OpenAI API version for DALL-E-2

# Google
#-----------------
87 changes: 47 additions & 40 deletions api/app/clients/tools/DALL-E.js
@@ -1,7 +1,5 @@
// From https://platform.openai.com/docs/api-reference/images/create
// To use this tool, you must pass in a configured OpenAIApi object.
const path = require('path');
const OpenAI = require('openai');
// const { genAzureEndpoint } = require('~/utils/genAzureEndpoints');
const { v4: uuidv4 } = require('uuid');
const { Tool } = require('langchain/tools');
const { HttpsProxyAgent } = require('https-proxy-agent');
@@ -10,46 +8,39 @@ const { processFileURL } = require('~/server/services/Files/process');
const extractBaseURL = require('~/utils/extractBaseURL');
const { logger } = require('~/config');

const { DALLE_REVERSE_PROXY, PROXY } = process.env;
const {
DALLE2_SYSTEM_PROMPT,
DALLE_REVERSE_PROXY,
PROXY,
DALLE2_AZURE_API_VERSION,
DALLE2_BASEURL,
DALLE2_API_KEY,
DALLE_API_KEY,
} = process.env;
class OpenAICreateImage extends Tool {
constructor(fields = {}) {
super();

this.userId = fields.userId;
this.fileStrategy = fields.fileStrategy;
let apiKey = fields.DALLE_API_KEY || this.getApiKey();
let apiKey = fields.DALLE2_API_KEY ?? fields.DALLE_API_KEY ?? this.getApiKey();

const config = { apiKey };
if (DALLE_REVERSE_PROXY) {
config.baseURL = extractBaseURL(DALLE_REVERSE_PROXY);
}

if (DALLE2_AZURE_API_VERSION && DALLE2_BASEURL) {
config.baseURL = DALLE2_BASEURL;
config.defaultQuery = { 'api-version': DALLE2_AZURE_API_VERSION };
config.defaultHeaders = { 'api-key': DALLE2_API_KEY, 'Content-Type': 'application/json' };
config.apiKey = DALLE2_API_KEY;
}

if (PROXY) {
config.httpAgent = new HttpsProxyAgent(PROXY);
}
// let azureKey = fields.AZURE_API_KEY || process.env.AZURE_API_KEY;

// if (azureKey) {
// apiKey = azureKey;
// const azureConfig = {
// apiKey,
// azureOpenAIApiInstanceName: process.env.AZURE_OPENAI_API_INSTANCE_NAME || fields.azureOpenAIApiInstanceName,
// azureOpenAIApiDeploymentName: process.env.AZURE_OPENAI_API_DEPLOYMENT_NAME || fields.azureOpenAIApiDeploymentName,
// azureOpenAIApiVersion: process.env.AZURE_OPENAI_API_VERSION || fields.azureOpenAIApiVersion
// };
// config = {
// apiKey,
// basePath: genAzureEndpoint({
// ...azureConfig,
// }),
// baseOptions: {
// headers: { 'api-key': apiKey },
// params: {
// 'api-version': azureConfig.azureOpenAIApiVersion // this might change. I got the current value from the sample code at https://oai.azure.com/portal/chat
// }
// }
// };
// }

this.openai = new OpenAI(config);
this.name = 'dall-e';
this.description = `You can generate images with 'dall-e'. This tool is exclusively for visual content.
@@ -59,10 +50,24 @@ Guidelines:
- It's best to follow this format for image creation. Come up with the optional inputs yourself if none are given:
"Subject: [subject], Style: [style], Color: [color], Details: [details], Emotion: [emotion]"
- Generate images only once per human query unless explicitly requested by the user`;
this.description_for_model =
DALLE2_SYSTEM_PROMPT ??
`// Whenever a description of an image is given, generate prompts (following these rules), and use dalle to create the image. If the user does not ask for a specific number of images, default to creating 2 prompts to send to dalle that are written to be as diverse as possible. All prompts sent to dalle must abide by the following policies:
// 1. Prompts must be in English. Translate to English if needed.
// 2. One image per function call. Create only 1 image per request unless explicitly told to generate more than 1 image.
// 3. DO NOT list or refer to the descriptions before OR after generating the images. They should ONLY ever be written out ONCE, in the \`"prompts"\` field of the request. You do not need to ask for permission to generate, just do it!
// 4. Always mention the image type (photo, oil painting, watercolor painting, illustration, cartoon, drawing, vector, render, etc.) at the beginning of the caption. Unless the captions suggests otherwise, make one of the images a photo.
// 5. Diversify depictions of ALL images with people to always include always DESCENT and GENDER for EACH person using direct terms. Adjust only human descriptions.
// - EXPLICITLY specify these attributes, not abstractly reference them. The attributes should be specified in a minimal way and should directly describe their physical form.
// - Your choices should be grounded in reality. For example, all of a given OCCUPATION should not be the same gender or race. Additionally, focus on creating diverse, inclusive, and exploratory scenes via the properties you choose during rewrites. Make choices that may be insightful or unique sometimes.
// - Use "various" or "diverse" ONLY IF the description refers to groups of more than 3 people. Do not change the number of people requested in the original description.
// - Don't alter memes, fictional character origins, or unseen people. Maintain the original prompt's intent and prioritize quality.
// The prompt must intricately describe every part of the image in concrete, objective detail. THINK about what the end goal of the description is, and extrapolate that to what would make satisfying images.
// All descriptions sent to dalle should be a paragraph of text that is extremely descriptive and detailed. Each should be more than 3 sentences long.`;
}

getApiKey() {
const apiKey = process.env.DALLE_API_KEY || '';
const apiKey = DALLE2_API_KEY ?? DALLE_API_KEY ?? '';
if (!apiKey) {
throw new Error('Missing DALLE_API_KEY environment variable.');
}
@@ -96,17 +101,19 @@ Guidelines:
}

const imageBasename = getImageBasename(theImageUrl);
let imageName = `image_${uuidv4()}.png`;

if (imageBasename) {
imageName = imageBasename;
logger.debug('[DALL-E]', { imageName }); // Output: img-lgCf7ppcbhqQrz6a5ear6FOb.png
} else {
logger.debug('[DALL-E] No image name found in the string.', {
theImageUrl,
data: resp.data[0],
});
}
const imageExt = path.extname(imageBasename);

const extension = imageExt.startsWith('.') ? imageExt.slice(1) : imageExt;
const imageName = `img-${uuidv4()}.${extension}`;

logger.debug('[DALL-E-2]', {
imageName,
imageBasename,
imageExt,
extension,
theImageUrl,
data: resp.data[0],
});

try {
const result = await processFileURL({
4 changes: 2 additions & 2 deletions api/app/clients/tools/manifest.json
@@ -89,7 +89,7 @@
"icon": "https://i.imgur.com/u2TzXzH.png",
"authConfig": [
{
"authField": "DALLE_API_KEY",
"authField": "DALLE2_API_KEY",
"label": "OpenAI API Key",
"description": "You can use DALL-E with your API Key from OpenAI."
}
@@ -102,7 +102,7 @@
"icon": "https://i.imgur.com/u2TzXzH.png",
"authConfig": [
{
"authField": "DALLE_API_KEY",
"authField": "DALLE3_API_KEY",
"label": "OpenAI API Key",
"description": "You can use DALL-E with your API Key from OpenAI."
}
50 changes: 33 additions & 17 deletions api/app/clients/tools/structured/DALLE3.js
@@ -1,6 +1,5 @@
// From https://platform.openai.com/docs/guides/images/usage?context=node
// To use this tool, you must pass in a configured OpenAIApi object.
const { z } = require('zod');
const path = require('path');
const OpenAI = require('openai');
const { v4: uuidv4 } = require('uuid');
const { Tool } = require('langchain/tools');
@@ -10,19 +9,33 @@ const { processFileURL } = require('~/server/services/Files/process');
const extractBaseURL = require('~/utils/extractBaseURL');
const { logger } = require('~/config');

const { DALLE3_SYSTEM_PROMPT, DALLE_REVERSE_PROXY, PROXY } = process.env;
const {
DALLE3_SYSTEM_PROMPT,
DALLE_REVERSE_PROXY,
PROXY,
DALLE3_AZURE_API_VERSION,
DALLE3_BASEURL,
DALLE3_API_KEY,
} = process.env;
class DALLE3 extends Tool {
constructor(fields = {}) {
super();

this.userId = fields.userId;
this.fileStrategy = fields.fileStrategy;
let apiKey = fields.DALLE_API_KEY || this.getApiKey();
let apiKey = fields.DALLE3_API_KEY ?? fields.DALLE_API_KEY ?? this.getApiKey();
const config = { apiKey };
if (DALLE_REVERSE_PROXY) {
config.baseURL = extractBaseURL(DALLE_REVERSE_PROXY);
}

if (DALLE3_AZURE_API_VERSION && DALLE3_BASEURL) {
config.baseURL = DALLE3_BASEURL;
config.defaultQuery = { 'api-version': DALLE3_AZURE_API_VERSION };
config.defaultHeaders = { 'api-key': DALLE3_API_KEY, 'Content-Type': 'application/json' };
config.apiKey = DALLE3_API_KEY;
}

if (PROXY) {
config.httpAgent = new HttpsProxyAgent(PROXY);
}
@@ -46,7 +59,8 @@ class DALLE3 extends Tool {
// - Use "various" or "diverse" ONLY IF the description refers to groups of more than 3 people. Do not change the number of people requested in the original description.
// - Don't alter memes, fictional character origins, or unseen people. Maintain the original prompt's intent and prioritize quality.
// The prompt must intricately describe every part of the image in concrete, objective detail. THINK about what the end goal of the description is, and extrapolate that to what would make satisfying images.
// All descriptions sent to dalle should be a paragraph of text that is extremely descriptive and detailed. Each should be more than 3 sentences long.`;
// All descriptions sent to dalle should be a paragraph of text that is extremely descriptive and detailed. Each should be more than 3 sentences long.
// - The "vivid" style is HIGHLY preferred, but "natural" is also supported.`;
this.schema = z.object({
prompt: z
.string()
@@ -71,7 +85,7 @@ class DALLE3 extends Tool {
}

getApiKey() {
const apiKey = process.env.DALLE_API_KEY || '';
const apiKey = process.env.DALLE3_API_KEY ?? process.env.DALLE_API_KEY ?? '';
if (!apiKey) {
throw new Error('Missing DALLE_API_KEY environment variable.');
}
@@ -121,17 +135,19 @@ Error Message: ${error.message}`;
}

const imageBasename = getImageBasename(theImageUrl);
let imageName = `image_${uuidv4()}.png`;

if (imageBasename) {
imageName = imageBasename;
logger.debug('[DALL-E-3]', { imageName }); // Output: img-lgCf7ppcbhqQrz6a5ear6FOb.png
} else {
logger.debug('[DALL-E-3] No image name found in the string.', {
theImageUrl,
data: resp.data[0],
});
}
const imageExt = path.extname(imageBasename);

const extension = imageExt.startsWith('.') ? imageExt.slice(1) : imageExt;
const imageName = `img-${uuidv4()}.${extension}`;

logger.debug('[DALL-E-3]', {
imageName,
imageBasename,
imageExt,
extension,
theImageUrl,
data: resp.data[0],
});

try {
const result = await processFileURL({
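The renaming step that both tools now share can be read as a small pure function. The following is a minimal sketch, not the PR's code: it assumes Node's built-in `crypto.randomUUID` as a stand-in for `uuid`'s `v4`, and `buildImageName` and its `fallbackExt` parameter are hypothetical names. One behavior to keep in mind is that `path.extname` returns an empty string for a basename without an extension, so the template in the diff would then produce a name ending in a bare dot. The `png` fallback below is only one possible way to handle that case.

```javascript
const path = require('node:path');
const { randomUUID } = require('node:crypto');

// Hypothetical helper: derives a fresh file name that keeps the original extension.
function buildImageName(imageBasename, fallbackExt = 'png') {
  // path.extname('img-abc.png') -> '.png'; path.extname('invalid-url') -> ''
  const imageExt = path.extname(imageBasename ?? '');
  // Drop the leading dot, as the diff does.
  const extension = imageExt.startsWith('.') ? imageExt.slice(1) : imageExt;
  // Assumption: fall back to a default extension instead of emitting "img-<uuid>."
  return `img-${randomUUID()}.${extension || fallbackExt}`;
}

console.log(buildImageName('img-lgCf7ppcbhqQrz6a5ear6FOb.png'));
console.log(buildImageName('invalid-url'));
```

Whether a fallback is wanted depends on what `getImageBasename` returns for malformed URLs, which this diff does not show.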
11 changes: 9 additions & 2 deletions api/app/clients/tools/structured/specs/DALLE3.spec.js
@@ -48,6 +48,9 @@ jest.mock('path', () => {
resolve: jest.fn(),
join: jest.fn(),
relative: jest.fn(),
extname: jest.fn().mockImplementation((filename) => {
return filename.slice(filename.lastIndexOf('.'));
}),
};
});

@@ -148,7 +151,7 @@ describe('DALLE3', () => {
await expect(dalle._call(mockData)).rejects.toThrow('Missing required field: prompt');
});

it('should log to console if no image name is found in the URL', async () => {
it('should log appropriate debug values', async () => {
const mockData = {
prompt: 'A test prompt',
};
@@ -162,9 +165,13 @@ describe('DALLE3', () => {

generate.mockResolvedValue(mockResponse);
await dalle._call(mockData);
expect(logger.debug).toHaveBeenCalledWith('[DALL-E-3] No image name found in the string.', {
expect(logger.debug).toHaveBeenCalledWith('[DALL-E-3]', {
data: { url: 'http://example.com/invalid-url' },
theImageUrl: 'http://example.com/invalid-url',
extension: expect.any(String),
imageBasename: expect.any(String),
imageExt: expect.any(String),
imageName: expect.any(String),
});
});

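The mocked `extname` in the spec is a simplification of Node's real `path.extname`, and the two diverge on names that have no dot or that start with one. The comparison below is a small sketch for reviewers; `mockExtname` is a hypothetical name for a copy of the mock's body, and the sample file names are made up.

```javascript
const path = require('node:path');

// Same body as the jest mock added in the spec.
const mockExtname = (filename) => filename.slice(filename.lastIndexOf('.'));

for (const name of ['img-abc.png', 'invalid-url', '.hidden', '']) {
  console.log(JSON.stringify(name), {
    real: path.extname(name),
    mock: mockExtname(name),
  });
}
```

For `'invalid-url'`, `lastIndexOf('.')` is `-1`, so the mock returns the last character (`'l'`) while the real function returns `''`. The new test only asserts `expect.any(String)`, so it passes either way, but a stricter assertion would need the mock to guard the `-1` case.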
5 changes: 5 additions & 0 deletions docs/general_info/breaking_changes.md
@@ -9,6 +9,11 @@ weight: -10
**If you experience any issues after updating, we recommend clearing your browser cache and cookies.**
Certain changes in the updates may impact cookies, leading to unexpected behaviors if not cleared properly.

## v0.6.6

- **DALL-E Update**: user-provided keys for DALL-E are now specific to each DALL-E version, i.e. `DALLE3_API_KEY` and `DALLE2_API_KEY`.
- Note: `DALLE_API_KEY` still works for both DALL-E-3 and DALL-E-2 when the admin provides the credential, so this change should only affect your users if `DALLE_API_KEY` is not set in the `.env` file. In that case, they will have to "uninstall" the plugin and provide their API key again.
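
The lookup order behind this change can be sketched as follows. This is an illustrative reading of the tool constructors in this release, not the shipped code: `resolveDalleKey` is a hypothetical helper, `fields` stands for the user-provided plugin credentials, and `env` stands for `process.env`.

```javascript
// Hypothetical helper mirroring the order: user version-specific key,
// user shared key, admin version-specific key, admin shared key.
function resolveDalleKey(version, fields = {}, env = process.env) {
  const specific = `DALLE${version}_API_KEY`;
  const key =
    fields[specific] ?? fields.DALLE_API_KEY ?? env[specific] ?? env.DALLE_API_KEY ?? '';
  if (!key) {
    throw new Error(`Missing ${specific} or DALLE_API_KEY.`);
  }
  return key;
}

// Admin supplies only the shared key: both versions resolve to it.
console.log(resolveDalleKey(3, {}, { DALLE_API_KEY: 'shared' }));
// A user-provided, version-specific key takes precedence.
console.log(resolveDalleKey(2, { DALLE2_API_KEY: 'user' }, { DALLE_API_KEY: 'shared' }));
```

Because the chain uses `??` rather than `||`, an empty string is treated as a value. Under this reading, a line such as `DALLE3_API_KEY=` left uncommented but empty in `.env` would stop the fallback to `DALLE_API_KEY`, so it is safer to keep unused variables commented out.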

## v0.6.x

- **Meilisearch Update**: Following the recent update to Meilisearch, an unused folder named `meili_data` may be present in your root directory. This folder is no longer required and can be **safely deleted** to free up space.
42 changes: 42 additions & 0 deletions docs/install/configuration/ai_setup.md
@@ -301,6 +301,48 @@ As of December 18th, 2023, Vision models seem to have degraded performance with

> Note: a change will be developed to improve current configuration settings, to allow multiple deployments/model configurations setup with ease: **[#1390](https://github.com/danny-avila/LibreChat/issues/1390)**

### Generate images with Azure OpenAI Service (DALL-E)

| Model ID | Feature Availability | Max Request (characters) |
|----------|----------------------|-------------------------|
| dalle2 | East US | 1000 |
| dalle3 | Sweden Central | 4000 |

- First, create an Azure resource that hosts DALL-E.
- At the time of writing, dall-e-3 is available in the `SwedenCentral` region and dall-e-2 in the `EastUS` region.
- Then, deploy the image generation model in one of the above regions.
- Read the [Azure OpenAI Image Generation Quickstart Guide](https://learn.microsoft.com/en-us/azure/ai-services/openai/dall-e-quickstart) for further assistance.
- Configure your environment variables based on your Azure credentials:

**- For DALL-E-3:**

```bash
DALLE3_AZURE_API_VERSION=the-api-version # e.g.: 2023-12-01-preview
DALLE3_BASEURL=https://<AZURE_OPENAI_API_INSTANCE_NAME>.openai.azure.com/openai/deployments/<DALLE3_DEPLOYMENT_NAME>/
DALLE3_API_KEY=your-azure-api-key-for-dall-e-3
```

**- For DALL-E-2:**

```bash
DALLE2_AZURE_API_VERSION=the-api-version # e.g.: 2023-12-01-preview
DALLE2_BASEURL=https://<AZURE_OPENAI_API_INSTANCE_NAME>.openai.azure.com/openai/deployments/<DALLE2_DEPLOYMENT_NAME>/
DALLE2_API_KEY=your-azure-api-key-for-dall-e-2
```
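
To sanity-check these values before starting the app, it can help to see the request they are expected to produce. The sketch below assumes the client appends `images/generations` to the base URL, sends the API version as a query parameter, and sends the key in an `api-key` header, which is what the `defaultQuery` and `defaultHeaders` settings in the tool code suggest. `describeAzureImageRequest`, `my-instance`, and `my-dalle3` are hypothetical placeholders.

```javascript
// Hypothetical helper: shows the URL and headers implied by the DALLE3_* variables.
function describeAzureImageRequest(env) {
  // Normalize the trailing slash so the relative path resolves under the deployment.
  const base = env.DALLE3_BASEURL.endsWith('/') ? env.DALLE3_BASEURL : `${env.DALLE3_BASEURL}/`;
  const url = new URL('images/generations', base);
  url.searchParams.set('api-version', env.DALLE3_AZURE_API_VERSION);
  return {
    url: url.toString(),
    headers: { 'api-key': env.DALLE3_API_KEY, 'Content-Type': 'application/json' },
  };
}

const request = describeAzureImageRequest({
  DALLE3_AZURE_API_VERSION: '2023-12-01-preview',
  DALLE3_BASEURL: 'https://my-instance.openai.azure.com/openai/deployments/my-dalle3/',
  DALLE3_API_KEY: 'your-azure-api-key-for-dall-e-3',
});
console.log(request.url);
// https://my-instance.openai.azure.com/openai/deployments/my-dalle3/images/generations?api-version=2023-12-01-preview
```

If the printed URL does not match the endpoint shown for your deployment in the Azure portal, the base URL or deployment name is the first thing to check.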

**DALL-E Notes:**

- For DALL-E-3, the default system prompt has the LLM prefer the ["vivid" style](https://platform.openai.com/docs/api-reference/images/create#images-create-style) parameter, which seems to be the preferred setting for ChatGPT, as "natural" can sometimes produce lackluster results.
- See official prompt for reference: **[DALL-E System Prompt](https://github.com/spdustin/ChatGPT-AutoExpert/blob/main/_system-prompts/dall-e.md)**
- You can adjust the system prompts to your liking:

```bash
DALLE3_SYSTEM_PROMPT="Your DALL-E-3 System Prompt here"
DALLE2_SYSTEM_PROMPT="Your DALL-E-2 System Prompt here"
```

- The `DALLE_REVERSE_PROXY` environment variable is ignored when Azure credentials (`DALLEx_AZURE_API_VERSION` and `DALLEx_BASEURL`, where `x` is `2` or `3`) for DALL-E are configured.
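
The precedence described in the last note can be sketched as a pure function. This is a simplified reading of the tool constructors, not the shipped code: `buildDalleConfig` is a hypothetical name, the reverse proxy URL is used as-is here (the real code passes it through `extractBaseURL`), and the `PROXY` agent is left out.

```javascript
// Hypothetical pure version of the client config logic for DALL-E 2 or 3.
function buildDalleConfig(version, env, apiKey) {
  const config = { apiKey };
  const azureVersion = env[`DALLE${version}_AZURE_API_VERSION`];
  const azureBaseURL = env[`DALLE${version}_BASEURL`];
  const azureKey = env[`DALLE${version}_API_KEY`];

  if (env.DALLE_REVERSE_PROXY) {
    config.baseURL = env.DALLE_REVERSE_PROXY;
  }

  // Both Azure variables must be set; when they are, they overwrite the proxy URL.
  if (azureVersion && azureBaseURL) {
    config.baseURL = azureBaseURL;
    config.defaultQuery = { 'api-version': azureVersion };
    config.defaultHeaders = { 'api-key': azureKey, 'Content-Type': 'application/json' };
    config.apiKey = azureKey;
  }
  return config;
}

const env = {
  DALLE_REVERSE_PROXY: 'https://proxy.example.com/v1',
  DALLE3_AZURE_API_VERSION: '2023-12-01-preview',
  DALLE3_BASEURL: 'https://my-instance.openai.azure.com/openai/deployments/my-dalle3/',
  DALLE3_API_KEY: 'azure-key',
};
console.log(buildDalleConfig(3, env, 'user-key').baseURL); // the Azure base URL, not the proxy
```

Two consequences follow from this reading: setting `DALLEx_BASEURL` without `DALLEx_AZURE_API_VERSION` has no effect, and in the Azure case the key from `.env` replaces any user-provided key.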

### Optional Variables

*These variables are currently not used by LibreChat*