
Guidance on how to run HELM benchmarks against LLM models deployed on SageMaker Endpoints #1713

Closed
sermolin opened this issue Jul 6, 2023 · 5 comments

sermolin commented Jul 6, 2023

Can the authors provide prescriptive guidance on how to run HELM benchmarks against LLMs deployed on SageMaker? There seem to be two obvious approaches:
(a) Front the SageMaker Endpoint with an AWS API Gateway and provide an API_KEY. What's not clear is where/how to provide the API_URL. This approach is more scalable because it allows a SageMaker Endpoint to be invoked from outside the AWS account where the endpoint was deployed.
(b) Clone the git repo onto a machine with AWS account credentials and hack the code to invoke the SageMaker Endpoint directly instead of going the API_KEY route.

Any suggestions/experience with either of these approaches?

yifanmai commented Jul 6, 2023

AWS SageMaker support is something we definitely want to have in HELM eventually.

(a) and (b) are both reasonable approaches. I would suggest doing a fork and doing (b) if you have an urgent use case. For (a), you're welcome to open a PR to contribute a more general way of doing this.

I imagine that it would work similarly to how the HuggingFace Hub integration works with the --enable-huggingface-models flag where we specify an API endpoint URL through a flag passed to helm-run. The trickiness is that you'd have to also specify the model name and model length using this flag (which can't be auto-inferred from the URL, as far as I can tell).

I also think it would be good to move these things to a config file instead of flags eventually, but we don't have plans for that yet either.
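
To make the missing pieces concrete, here is a rough sketch (purely illustrative field names, not an agreed-on interface) of the per-endpoint information helm-run would need if it followed the --enable-huggingface-models pattern:

```python
from dataclasses import dataclass

@dataclass
class SageMakerEndpointSpec:
    """Illustrative only: what the user would have to supply per endpoint.

    The model name and maximum sequence length cannot be inferred from the
    endpoint URL, so the user has to provide them alongside it.
    """
    model_name: str           # identifier of the deployed model
    endpoint_url: str         # API Gateway URL fronting the SageMaker endpoint
    max_sequence_length: int  # the model's context window
```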

yifanmai added the user question, p2 Priority 2 (Good to have for release), models, and framework labels on Jul 6, 2023

sermolin commented Jul 17, 2023

I decided to follow option (a) as it is the more scalable approach. I deployed an AI21 Jurassic-J2-Jumbo model on a SageMaker endpoint and fronted it with an API Gateway + Lambda. The API_URL and API_KEY can then be used to invoke the model, e.g.:
curl -X POST -H "Content-Type: application/json" -H "X-API-Key: test" -d '{"prompt": "Who was Albert Einstein?"}' https://mkyfzfz3si.execute-api.us-west-2.amazonaws.com/completion

  • What is the format to use for API_KEY in credentials.conf?
  • How should I provide API_URL?

At least for the first pass of the integration, we aren't going to make all models in SageMaker JumpStart available through public APIs (it would be too costly to host them).
I would rather have a user create an LLM endpoint, front it with an API Gateway, and provide an API_URL and (optionally) an API_KEY.
The user would then specify these two parameters in a HELM config file (or as command-line parameters) and run the applicable benchmarks. The assumption is that the user knows the LLM's capabilities and would therefore select appropriate HELM benchmarks to run.
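
For illustration, a minimal sketch of what a HELM-side client could do with those two parameters, mirroring the curl call above (the function name and response shape are assumptions; the actual payload depends on the Lambda behind the gateway):

```python
import requests

# These values would come from the HELM config file or command-line parameters.
API_URL = "https://mkyfzfz3si.execute-api.us-west-2.amazonaws.com/completion"
API_KEY = "test"

def complete(prompt: str, timeout: float = 30.0) -> dict:
    """POST a prompt to the API Gateway-fronted SageMaker endpoint."""
    response = requests.post(
        API_URL,
        headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
        json={"prompt": prompt},
        timeout=timeout,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    print(complete("Who was Albert Einstein?"))
```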

@sermolin

The Amazon API Gateway URL is different for every newly deployed API; specifically, the alphanumeric string following https:// (e.g. https://mkyfzfz3si) as well as the region (e.g. us-west-2). Each user would deploy a new API Gateway, so it would not be possible to infer the URL from a model name.
Therefore, if we follow a CLI flag proposal similar to --enable-huggingface-models, we would need to specify the API URL on the command line, which would make it too long. Any thoughts on having a CLI flag (e.g. --enable-amazon-sagemaker) that would read the contents of a sagemaker.conf file? The SageMaker API_URL and API_KEY would be stored in sagemaker.conf.

@yifanmai

The conf file option sounds reasonable to me:

[
  {
    "name": "mymodel",
    "url": "https://mymodelurl/",
    "api_key": "my_key"
  },
  {
    "name": "mymodel2",
    "url": "https://mymodelurl2/",
    "api_key": "my_key"
  }
]
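
As a rough sketch of how helm-run could consume such a file (the function and class names here are only illustrative, not an agreed-on interface):

```python
import json
from dataclasses import dataclass
from typing import Dict

@dataclass
class EndpointConfig:
    name: str
    url: str
    api_key: str

def load_endpoint_configs(path: str = "sagemaker.conf") -> Dict[str, EndpointConfig]:
    """Parse the JSON list above into a name -> config mapping."""
    with open(path) as f:
        entries = json.load(f)
    return {entry["name"]: EndpointConfig(**entry) for entry in entries}

# e.g. load_endpoint_configs()["mymodel"].url would give "https://mymodelurl/"
```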


yifanmai commented Aug 5, 2024

We now support Titan on Bedrock using BedrockClient: see #2165.

For other model providers on Bedrock, we should use their first-party Python library integrations.
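
For context, a minimal sketch of invoking a Titan text model on Bedrock with boto3, roughly what a Bedrock-backed client has to do under the hood (the request/response fields follow the Titan text format and differ for other providers; the model ID and region are placeholders):

```python
import json
import boto3

# Assumes AWS credentials and Bedrock model access are already configured.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.invoke_model(
    modelId="amazon.titan-text-express-v1",
    body=json.dumps({
        "inputText": "Who was Albert Einstein?",
        "textGenerationConfig": {"maxTokenCount": 256, "temperature": 0.7},
    }),
)
result = json.loads(response["body"].read())
print(result["results"][0]["outputText"])
```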
