page_type: sample
languages: python
products: azure, azure-cognitive-search
name: Custom embedding skill for Azure AI Search
description: The custom skill generates vector embeddings for provided content with the [HuggingFace all-MiniLM-L6-v2 model](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2).

HuggingFace Embeddings Generator

This custom skill generates vector embeddings for text content created or ingested as part of an Azure AI Search pipeline, using the HuggingFace all-MiniLM-L6-v2 model. The model returns embeddings with 384 dimensions. The same endpoint can also serve as a custom query vectorizer for data ingested with this model. An end-to-end example notebook that uses this endpoint can be found at Azure AI Search Custom Vectorization Sample.

If you need your data to be chunked before it is embedded by this custom skill, consider using the built-in SplitSkill. If you are interested in generating embeddings with the Azure OpenAI service, see the built-in AzureOpenAIEmbeddingSkill.

Testing the functionality locally

The code in this skill can be tested locally before deploying it to an Azure function. Set up the required parameters in a local.settings.json file (you need to add this file yourself; a sample is shown below) and follow the instructions in the Azure Functions guide to test the skill locally.

The packages required to run the code locally are listed in requirements.txt in this directory. Be sure that you are using Python 3.9 as your runtime stack.

Sample local.settings.json

Add a new file named local.settings.json inside this skill's working directory with the following contents:

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "AzureWebJobsFeatureFlags": "EnableWorkerIndexing"
  }
}

Deploying the code as an Azure function

This code can be manually deployed to an Azure function app. Clone the repo locally and follow the Azure Functions guide to deploy the function. Select Python 3.9 as the runtime stack for the app.

embed

Sample Input:

{
    "values": [
        {
            "recordId": "1234",
            "data": {
                "text": "This is a test document."
            }
        }
    ]
}

Sample Output:

{
    "values": [
        {
            "recordId": "1234",
            "data": {
                "vector": [
                    -0.03833850100636482,
                    0.1234646588563919,
                    -0.028642958030104637,
                    . . . 
                ]
            },
            "errors": null,
            "warnings": null
        }
    ]
}
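The request and response shapes above can be illustrated with a minimal sketch of how a handler might map input records to output records. This is not the skill's actual implementation: `build_skill_response` and `fake_embed` are hypothetical names, the error message format is an assumption, and the real skill would call the all-MiniLM-L6-v2 model where the stand-in embedder is used here.

```python
import json
from typing import Callable, List


def build_skill_response(request_body: dict,
                         embed: Callable[[str], List[float]]) -> dict:
    """Map each input record to an output record carrying its embedding.

    `embed` is any function that turns text into a list of floats; it is
    injected so this sketch stays independent of a specific model library.
    """
    results = []
    for record in request_body.get("values", []):
        result = {
            "recordId": record.get("recordId"),
            "data": {},
            "errors": None,
            "warnings": None,
        }
        text = (record.get("data") or {}).get("text")
        if isinstance(text, str) and text:
            result["data"] = {"vector": embed(text)}
        else:
            # Assumed error shape: a list of objects with a "message" field.
            result["errors"] = [{"message": "Missing or empty 'text' input."}]
        results.append(result)
    return {"values": results}


def fake_embed(text: str) -> List[float]:
    # Hypothetical stand-in for the real model: 384 zeros, matching the
    # dimensionality stated for all-MiniLM-L6-v2.
    return [0.0] * 384


if __name__ == "__main__":
    sample_input = {
        "values": [
            {"recordId": "1234", "data": {"text": "This is a test document."}}
        ]
    }
    response = build_skill_response(sample_input, fake_embed)
    print(json.dumps(response)[:120])
```

Each output record keeps the `recordId` of its input record, which is how the two are matched up in the sample input and output above.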

Sample Skillset Integration

To use this skill in an Azure AI Search pipeline, you'll need to add a skill definition to your skillset. Here's a sample skill definition for this example (update the inputs and outputs to reflect your particular scenario and skillset environment):

{
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "description": "Custom embedding generator",
    "uri": "[AzureFunctionEndpointUrl]/api/embed?code=[AzureFunctionDefaultHostKey]",
    "context": "/document/content",
    "inputs": [
        {
            "name": "text",
            "source": "/document/content"
        }
    ],
    "outputs": [
        {
            "name": "vector",
            "targetName": "vector"
        }
    ]
}
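If you generate skillsets from a script, the definition above could be built programmatically. The following is a sketch under assumptions: `build_embedding_skill` is a hypothetical helper, and the example URL and key are placeholders. The key is percent-encoded because host keys can contain characters such as `=` that are not safe in a query string.

```python
import json
from urllib.parse import quote


def build_embedding_skill(function_base_url: str,
                          host_key: str,
                          source: str = "/document/content") -> dict:
    """Return a skill definition dict matching the sample above.

    `function_base_url` and `host_key` fill the [AzureFunctionEndpointUrl]
    and [AzureFunctionDefaultHostKey] placeholders.
    """
    base = function_base_url.rstrip("/")
    # Percent-encode the key so characters like '=' or '/' survive the URL.
    uri = f"{base}/api/embed?code={quote(host_key, safe='')}"
    return {
        "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
        "description": "Custom embedding generator",
        "uri": uri,
        "context": source,
        "inputs": [{"name": "text", "source": source}],
        "outputs": [{"name": "vector", "targetName": "vector"}],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    skill = build_embedding_skill("https://example.azurewebsites.net", "abc==")
    print(json.dumps(skill, indent=4))
```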

Sample Index Vectorizer Integration

To use this endpoint as a custom web API vectorizer, you'll need to add a vectorizer definition to your index. Here's a sample vectorizer definition you can use:

"vectorizers": [
    {
        "name": "my-custom-web-api-vectorizer",
        "kind": "customWebApi",
        "customWebApiParameters": {
            "uri": "[AzureFunctionEndpointUrl]/api/embed?code=[AzureFunctionDefaultHostKey]"
        }
    }
]
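When the endpoint acts as a query vectorizer, it can be exercised with the same request and response shapes shown in the embed samples. The sketch below assumes that query text arrives in the `text` field and the embedding comes back in `vector`; `build_query_payload` and `extract_vector` are hypothetical helpers, and the HTTP call itself is left out so the example stays self-contained.

```python
from typing import List

# Dimensionality stated earlier in this README for all-MiniLM-L6-v2.
EXPECTED_DIMENSIONS = 384


def build_query_payload(query_text: str, record_id: str = "0") -> dict:
    """Wrap a query string in the request shape the embed endpoint accepts."""
    return {"values": [{"recordId": record_id, "data": {"text": query_text}}]}


def extract_vector(response_body: dict,
                   expected_dims: int = EXPECTED_DIMENSIONS) -> List[float]:
    """Pull the embedding out of a response and check its length."""
    values = response_body.get("values") or []
    if not values:
        raise ValueError("Response contains no records.")
    record = values[0]
    if record.get("errors"):
        raise ValueError(f"Endpoint reported errors: {record['errors']}")
    vector = (record.get("data") or {}).get("vector")
    if not isinstance(vector, list) or len(vector) != expected_dims:
        raise ValueError(f"Expected a vector with {expected_dims} dimensions.")
    return vector


if __name__ == "__main__":
    payload = build_query_payload("what is vector search?")
    # A made-up response standing in for the endpoint's reply.
    fake_response = {
        "values": [
            {"recordId": "0", "data": {"vector": [0.0] * 384},
             "errors": None, "warnings": None}
        ]
    }
    print(len(extract_vector(fake_response)))
```

Checking the vector length on the client side helps catch a mismatch between the model's output and the dimensions configured on the index's vector field.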