Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Connectors and ML updates for 2.9 #4554

Merged
merged 28 commits into from
Jul 19, 2023
Merged

Add Connectors and ML updates for 2.9 #4554

merged 28 commits into from
Jul 19, 2023

Conversation

Naarcha-AWS
Copy link
Collaborator

Fixes #3063

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
@hdhalter hdhalter added v2.9.0 release-notes PR: Include this PR in the automated release notes labels Jul 13, 2023
@Naarcha-AWS Naarcha-AWS requested a review from Zhangxunmt July 18, 2023 16:37
Signed-off-by: Naarcha-AWS <[email protected]>
@Naarcha-AWS Naarcha-AWS added 4 - Doc review PR: Doc review in progress and removed 3 - Tech review PR: Tech review in progress labels Jul 18, 2023
Copy link
Collaborator

@kolchfa-aws kolchfa-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for some comments.

_ml-commons-plugin/cluster-settings.md Outdated Show resolved Hide resolved
_ml-commons-plugin/cluster-settings.md Outdated Show resolved Hide resolved
_ml-commons-plugin/connectors.md Outdated Show resolved Hide resolved
_ml-commons-plugin/connectors.md Outdated Show resolved Hide resolved
_ml-commons-plugin/connectors.md Outdated Show resolved Hide resolved
_ml-commons-plugin/connectors.md Outdated Show resolved Hide resolved
_ml-commons-plugin/connectors.md Outdated Show resolved Hide resolved
_ml-commons-plugin/connectors.md Outdated Show resolved Hide resolved
_ml-commons-plugin/connectors.md Outdated Show resolved Hide resolved
_ml-commons-plugin/ml-dashboard.md Outdated Show resolved Hide resolved
_ml-commons-plugin/connectors.md Outdated Show resolved Hide resolved
_ml-commons-plugin/connectors.md Outdated Show resolved Hide resolved
STYLE_GUIDE.md Outdated Show resolved Hide resolved
Signed-off-by: Naarcha-AWS <[email protected]>
Copy link
Collaborator

@natebower natebower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Naarcha-AWS A few more changes. Let me know if I reviewed what you needed or if there's other content that needs my review.

_ml-commons-plugin/connectors.md Outdated Show resolved Hide resolved
_ml-commons-plugin/connectors.md Outdated Show resolved Hide resolved
_ml-commons-plugin/connectors.md Outdated Show resolved Hide resolved
_ml-commons-plugin/connectors.md Outdated Show resolved Hide resolved
Naarcha-AWS and others added 5 commits July 19, 2023 15:48
Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
@Naarcha-AWS Naarcha-AWS merged commit 95d117f into main Jul 19, 2023
@Naarcha-AWS Naarcha-AWS deleted the ml-connectors branch July 19, 2023 23:35

### Adding trusted endpoints

To configure connectors in OpenSearch, add the trusted endpoints to your cluster settings using the `plugins.ml_commons.trusted_connector_endpoints_regex` setting, which supports Java regex expressions, as shown in the following example:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this to settings page

"^https://runtime\\.sagemaker\\..*\\.amazonaws\\.com/.*$",
"^https://api\\.openai\\.com/.*$",
"^https://api\\.cohere\\.ai/.*$",
"^https://bedrock\\..*\\.amazonaws.com/.*$"
Copy link
Contributor

@ylwu-amzn ylwu-amzn Jul 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this?


### Enabling ML nodes

Most connectors require the use of dedicated ML nodes. To make sure you have ML nodes enabled, update the following cluster settings:
Copy link
Contributor

@ylwu-amzn ylwu-amzn Jul 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most connectors require the use of dedicated ML nodes -> By default, connectors require the use of dedicated ML nodes. Actually this is the default setting: "plugins.ml_commons.only_run_on_ml_node": true

For remote connector, it consumes much less resource. So should be ok if user prefer to run on data node. If they don't have dedicate ML node, and prefer to run on data node, they can set

PUT /_cluster/settings
{
    "persistent": {
        "plugins.ml_commons.only_run_on_ml_node": false
    }
}

| `description` | String | A description of the connector. |
| `version` | Integer | The version of the connector. |
| `protocol` | String | The protocol for the connection. For AWS services such as Amazon SageMaker and Amazon Bedrock, use `aws_sigv4`. For all other services, use `http`. |
| `parameter` | JSON array | The default connector parameters, including `endpoint` and `model`.
Copy link
Contributor

@ylwu-amzn ylwu-amzn Jul 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parameter -> parameters

Type is : Map<String, ?>

including endpointandmodel. -> for example endpointandmodel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From security team , we should call out all parameters in this block will be overridable in predict request. User can provide parameter with same in predict request to override the default parameter value defined in connector.

| `version` | Integer | The version of the connector. |
| `protocol` | String | The protocol for the connection. For AWS services such as Amazon SageMaker and Amazon Bedrock, use `aws_sigv4`. For all other services, use `http`. |
| `parameter` | JSON array | The default connector parameters, including `endpoint` and `model`.
| `credential` | String | Defines any credential variables required to connect to your chosen endpoint. ML Commons uses **AES/GCM/NoPadding** symmetric encryption with a key length of 32 bytes. When a connection cluster first starts, the key persists in OpenSearch. Therefore, you do not need to manually encrypt the key.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type: Map<string, string>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ML Commons uses **AES/GCM/NoPadding** symmetric encryption with a key length of 32 bytes. When a connection cluster first starts, the key persists in OpenSearch. Therefore, you do not need to manually encrypt the key.
->
ML Commons uses **AES/GCM/NoPadding** symmetric encryption to encrypt your credential. When a connection cluster first starts, ml-commons will create a random 32 bytes key and persist in OpenSearch system index. Therefore, you do not need to manually set the encryption key.

| `protocol` | String | The protocol for the connection. For AWS services such as Amazon SageMaker and Amazon Bedrock, use `aws_sigv4`. For all other services, use `http`. |
| `parameter` | JSON array | The default connector parameters, including `endpoint` and `model`.
| `credential` | String | Defines any credential variables required to connect to your chosen endpoint. ML Commons uses **AES/GCM/NoPadding** symmetric encryption with a key length of 32 bytes. When a connection cluster first starts, the key persists in OpenSearch. Therefore, you do not need to manually encrypt the key.
| `action` | JSON array | Tells the connector what actions to run after a connection to ML Commons has been established.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tells the connector what actions to run after a connection to ML Commons has been established.
->
Define what actions can run within this connector.

`action_type` | String | Required. Sets the ML Commons API operation to use upon connection. As of OpenSearch 2.9, only `predict` is supported.
`method` | String | Required. Defines the HTTP method for the API call. Supports `POST` and `GET`.
`url` | String | Required. Sets the connection endpoint at which the action takes place. This must match the regex expression for the connection used when [adding trusted endpoints](#adding-trusted-endpoints).
`headers` | String | Sets the headers used inside the request or response body. Default is `application/json`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type: Map<String, String>

Copy link
Contributor

@ylwu-amzn ylwu-amzn Jul 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default is application/json
->
Default "Content-Type" is "application/json"


### Standalone connector

The connector creation API, `/_plugins/_ml/connectors/_create`, creates connections to third-party ML tools. Using the `endpoint` parameter, you can connect ML Commons to any supported ML tool using its specific API endpoint. For example, to connect to a ChatGPT completion model, you can connect using the `api.openai.com`, as shown in the following example:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, to connect to a ChatGPT completion model,

->

For example, to connect to a ChatGPT chat model,

```
{% include copy-curl.html %}

If successful, the connector API responds with a `connector_id` and `status` for the connection:
Copy link
Contributor

@ylwu-amzn ylwu-amzn Jul 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remote this

and status

- You can use `model_group_id` to register a model version to an existing model group.
- If you do not use `model_group_id`, ML Commons creates a model with a new model group.

The following example registers a model named `openAI-GPT-3.5 completions`:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following example registers a model named openAI-GPT-3.5 completions:

->

The following example registers a model named openAI-gpt-3.5-turbo:

{
"name": "openAI-gpt-3.5-turbo",
"function_name": "remote",
"model_group_id": "wlcnb4kBJ1eYAeTMHlV6",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest add the example for creating this model group

POST /_plugins/_ml/model_groups/_register
{
  "name": "remote_model_group",
  "description": "This is an example description"
}

Sample response
{
  "model_group_id": "wlcnb4kBJ1eYAeTMHlV6",
  "status": "CREATED"
}

POST /_plugins/_ml/models/cleMb4kBJ1eYAeTMFFg4/_predict
{
"parameters": {
"model": "gpt-3.5-turbo",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this line, as model already defined in connector


```json
POST /_plugins/_ml/models/_register
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not chat model example. Let's keep consistent with the standalone connector.

}
```

After creating the connector, you can retrieve the `task_id`, deploy the model, and use the Predict API, similar to a standalone connector.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After creating the connector, you can retrieve the task_id, deploy the model, and use the Predict API, similar to a standalone connector.

->

After creating the connector, you can use the connector id to register model, deploy and predict.

The `paramaters` section requires the following options when using `aws-sigv4` authentication:

- `region`: The AWS Region in which the AWS instance is located.
- `service_name`: The name of the AWS service for the connector.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As all items defined in parameters will be overridable and visible (user can see parameters with get connector API). If user don't want this, they can move these two reserved parameters for aws_sigv4 to credential.

- `secret_key`: Required. Provides the secret key for the AWS instance.
- `session_token`: Optional. Provides a temporary set of credentials for the AWS instance.

The `paramaters` section requires the following options when using `aws-sigv4` authentication:
Copy link
Contributor

@ylwu-amzn ylwu-amzn Jul 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aws-sigv4
->
aws_sigv4

Please also check all other places

@@ -0,0 +1,461 @@
---
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add some security warning for remote connector. As user need to configure credential in connector, for example AWS credential, openAI API key, user should always use security enabled cluster. Otherwise their credentials will not be protected which is risky.

@dylan-tong-aws
Copy link

@natebower any mention of "SageMaker" should be "Amazon SageMaker". I saw some instances of "AWS SageMaker" that needs to be corrected. The same for ChatGPT. I believe we our standard is to use the full name, "OpenAI ChatGPT".

@natebower
Copy link
Collaborator

@natebower any mention of "SageMaker" should be "Amazon SageMaker". I saw some instances of "AWS SageMaker" that needs to be corrected. The same for ChatGPT. I believe we our standard is to use the full name, "OpenAI ChatGPT".

@dylan-tong-aws Re: "Amazon SageMaker", that's correct, thanks for the callout. @Naarcha-AWS Can you please global find and replace? Re: ChatGPT, we don't actually have a standard for this, but given that OpenAI simply uses "ChatGPT", I'd prefer that we follow suit. Thanks!

@ylwu-amzn ylwu-amzn mentioned this pull request Jul 25, 2023
1 task
harshavamsi pushed a commit to harshavamsi/documentation-website that referenced this pull request Oct 31, 2023
* Add Connectors and ML updates for 2.9

Signed-off-by: Naarcha-AWS <[email protected]>

* Fix code block

Signed-off-by: Naarcha-AWS <[email protected]>

* Add Connectors and ML updates for 2.9

Signed-off-by: Naarcha-AWS <[email protected]>

* Fix code block

Signed-off-by: Naarcha-AWS <[email protected]>

* Add connector settings and examples

Signed-off-by: Naarcha-AWS <[email protected]>

* Add GA warning

Signed-off-by: Naarcha-AWS <[email protected]>

* Add final experimental warning

Signed-off-by: Naarcha-AWS <[email protected]>

* Address tech review. Fix typos

Signed-off-by: Naarcha-AWS <[email protected]>

* Fix bad link. Add next steps section

Signed-off-by: Naarcha-AWS <[email protected]>

* Fix typo

Signed-off-by: Naarcha-AWS <[email protected]>

* Update cluster-settings.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: kolchfa-aws <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Update _ml-commons-plugin/connectors.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Change cluster values for boolean. Fix typo.

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Fix cluser settings

Signed-off-by: Naarcha-AWS <[email protected]>

* Add missing config options. More technical feedback.

Signed-off-by: Naarcha-AWS <[email protected]>

* Adjust cluster setting description.

Signed-off-by: Naarcha-AWS <[email protected]>

* Add updated ChatGPT example

Signed-off-by: Naarcha-AWS <[email protected]>

* Add info and example for internal connector.

Signed-off-by: Naarcha-AWS <[email protected]>

* One last adjustment.

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Fix dead link

Signed-off-by: Naarcha-AWS <[email protected]>

* Fix one last comment.

Signed-off-by: Naarcha-AWS <[email protected]>

* change ordered list to numbered.

Signed-off-by: Naarcha-AWS <[email protected]>

---------

Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
vagimeli pushed a commit that referenced this pull request Dec 21, 2023
* Add Connectors and ML updates for 2.9

Signed-off-by: Naarcha-AWS <[email protected]>

* Fix code block

Signed-off-by: Naarcha-AWS <[email protected]>

* Add Connectors and ML updates for 2.9

Signed-off-by: Naarcha-AWS <[email protected]>

* Fix code block

Signed-off-by: Naarcha-AWS <[email protected]>

* Add connector settings and examples

Signed-off-by: Naarcha-AWS <[email protected]>

* Add GA warning

Signed-off-by: Naarcha-AWS <[email protected]>

* Add final experimental warning

Signed-off-by: Naarcha-AWS <[email protected]>

* Address tech review. Fix typos

Signed-off-by: Naarcha-AWS <[email protected]>

* Fix bad link. Add next steps section

Signed-off-by: Naarcha-AWS <[email protected]>

* Fix typo

Signed-off-by: Naarcha-AWS <[email protected]>

* Update cluster-settings.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: kolchfa-aws <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Update _ml-commons-plugin/connectors.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Change cluster values for boolean. Fix typo.

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Fix cluser settings

Signed-off-by: Naarcha-AWS <[email protected]>

* Add missing config options. More technical feedback.

Signed-off-by: Naarcha-AWS <[email protected]>

* Adjust cluster setting description.

Signed-off-by: Naarcha-AWS <[email protected]>

* Add updated ChatGPT example

Signed-off-by: Naarcha-AWS <[email protected]>

* Add info and example for internal connector.

Signed-off-by: Naarcha-AWS <[email protected]>

* One last adjustment.

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Fix dead link

Signed-off-by: Naarcha-AWS <[email protected]>

* Fix one last comment.

Signed-off-by: Naarcha-AWS <[email protected]>

* change ordered list to numbered.

Signed-off-by: Naarcha-AWS <[email protected]>

---------

Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - Doc review PR: Doc review in progress release-notes PR: Include this PR in the automated release notes v2.9.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DOC] ML Extensibility Phase1: ML Inference Endpoint Connectors
8 participants