Error When Accessing Azure Serverless Models #4771
**What happened?**

When attempting to utilize a serverless model in Azure AI Foundry (Azure OpenAI), I am receiving a 401 error, even though the API key is verified and correct.

Config:

```yaml
endpoints:
  azureOpenAI:
    titleModel: 'Meta-Llama-3.1-8B-Instruct'
    plugins: false
    assistants: false
    groups:
      - group: 'Llama'
        serverless: true
        apiKey: '<REDACTED>'
        baseURL: 'https://<REDACTED>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview'
        models:
          Meta-Llama-3.1-8B-Instruct: true
```

**Steps to Reproduce**
**What browsers are you seeing the problem on?**

No response

**Relevant log output**

```
error: [handleAbortError] AI response error; aborting request: Failed to send message. HTTP 401 - {
  "statusCode": 401,
  "message": "Unauthorized. Access token is missing, invalid, audience is incorrect (https://cognitiveservices.azure.com), or have expired."
}
```

**Screenshots**

No response
---
I'm having trouble with Azure serverless models even with raw (cURL) API requests, including all of the code examples that Azure provides.
---
Figured it out after running a couple of different tests. All of their documentation says to use `Authorization: Bearer` auth for serverless inference requests (source); however, we need to use the `api-key` header instead. I will make some changes to account for this, as well as supporting …
---
I'm still updating the docs, but the update is now live.
Here is an example configuration for Meta-Llama-3.1-8B-Instruct:
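A minimal sketch of what that looks like, reconstructed from the config in the original post with the fixes described below (resource name redacted, values are placeholders):

```yaml
endpoints:
  azureOpenAI:
    titleModel: 'Meta-Llama-3.1-8B-Instruct'
    plugins: false
    assistants: false
    groups:
      - group: 'Llama'
        serverless: true
        apiKey: '<REDACTED>'
        # Root of the endpoint only; see the Notes below
        baseURL: 'https://<REDACTED>.services.ai.azure.com'
        models:
          Meta-Llama-3.1-8B-Instruct: true
```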
Notes:

- The `/models/chat/completions?api-version=<version>` path is appended for serverless inference.
- The `baseURL` field should therefore be set to the root of the endpoint, without anything after it.
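To make those two notes concrete, here is how the pieces combine, assuming the same `2024-05-01-preview` api-version as the original post:

```yaml
# What you configure (root of the endpoint only):
baseURL: 'https://<REDACTED>.services.ai.azure.com'

# What is actually requested for serverless inference
# (path and api-version appended to the root):
#   https://<REDACTED>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
```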