RFC 431: SageMaker Model Hosting L2 Constructs #433

petermeansrock · 2022-05-04T18:45:46Z

Co-authored-by: Matt McClean [email protected]
Co-authored-by: Long Yao [email protected]
Co-authored-by: Drew Jetter [email protected]
Co-authored-by: Murali Ganesh [email protected]
Co-authored-by: Abilash Rangoju [email protected]

This is a request for comments about SageMaker model hosting L2 constructs. See #431 for
additional details.

APIs are signed off by @kaizencc.

Rendered Version

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache-2.0 license


comcalvi

First quick pass, will take another look later.

comcalvi · 2022-05-11T17:20:20Z

text/0431-sagemaker-l2-endpoint.md

+An inference pipeline is an Amazon SageMaker model that is composed of a linear sequence of two to
+five containers that process requests for inferences on data. You use an inference pipeline to


Why two to five containers? The docs indicate a maximum of 15 containers: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-sagemaker-model.html#cfn-sagemaker-model-containers

Good catch. As described in the drawback section, the initial draft of this RFC was based on a Q3 2019 snapshot of the SageMaker feature set; SageMaker has since increased the number of supported containers. As part of authoring the second revision of this RFC, I'll audit the SageMaker CloudFormation resources to bring the RFC in line with the current supported attributes/features.

comcalvi · 2022-05-11T17:32:57Z

text/0431-sagemaker-l2-endpoint.md

+  container: {
+    image: image1, modelData: modelData1
+  },
+  extraContainers: [


Why don't we make a single containers property? With one containers property, we can decide whether to create an inference pipeline model or a single-container model based on the number of containers the user specifies (e.g., one container yields a single-container model, and more than one yields an inference pipeline model).

I agree. An api with container and extraContainers seems clumsy.

As at least one container is required, following Adam's previous recommendation to introduce productionVariant + extraProductionVariants, my co-authors and I used the same naming convention for the containers.

Once there's consensus on this thread, I plan to apply the same naming convention to both the containers and production variants (i.e., I just want to be doubly sure that the CDK team wants me to undo Adam's recommendation).
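A minimal sketch of the single `containers` idea, assuming simplified, hypothetical types (`ContainerSketch`, `RenderedContainers`, `renderContainers`) that are not part of the RFC. It assumes the rendered fields loosely mirror the `PrimaryContainer` and `Containers` properties of the CloudFormation resource, and it takes the 15-container limit from the docs linked above:

```typescript
// Hypothetical, simplified shapes for illustration only; not the RFC's API.
interface ContainerSketch {
  readonly image: string;
  readonly modelData?: string;
}

interface RenderedContainers {
  readonly primaryContainer?: ContainerSketch;
  readonly containers?: ContainerSketch[];
}

// Upper bound taken from the CloudFormation docs cited in this thread.
const MAX_CONTAINERS = 15;

function renderContainers(containers: ContainerSketch[]): RenderedContainers {
  if (containers.length < 1) {
    throw new Error('At least one container must be specified');
  }
  if (containers.length > MAX_CONTAINERS) {
    throw new Error(`At most ${MAX_CONTAINERS} containers are supported, got ${containers.length}`);
  }
  // One container: single-container model. More than one: inference pipeline.
  return containers.length === 1
    ? { primaryContainer: containers[0] }
    : { containers: [...containers] };
}
```

With this shape the user never chooses between two properties; the count alone decides which form is rendered.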

rix0rrr

Haven't read everything yet, comments so far

rix0rrr · 2022-05-11T17:11:58Z

text/0431-sagemaker-l2-endpoint.md

+
+## Model
+
+By creating a model, you tell Amazon SageMaker where it can find the model components. This includes


I don't know yet what a model is, and did you mean to say "by creating a Model"? As in, refer to a class here?

Most of the current verbiage in the README was taken from SageMaker's public documentation (link relevant to this section), which uses "model" to describe both the SageMaker resource and its associated abstract ML concept.

Which of the following are you looking for?

1. Use/paraphrase a one-liner from the AWS glossary, as in the following:

   > In machine learning (ML), a mathematical model that generates predictions by finding patterns in data.

2. A more substantial write-up where I describe ML concepts to non-ML customers

3. Simply use Model in place of the first use of "model"?

[This applies to all documentation in this readme section]

The SageMaker public documentation is a good starting point, but I caution against copying it over verbatim. We can always link to the documentation with something like "For more information on Amazon SageMaker, see SageMaker docs".

To answer this specific question, I'd like to see something like this:

## Model

To create a machine learning model with Amazon SageMaker, use the `Model` construct. This construct includes properties that can be configured to define...

FYI: As I had already expanded the wording a bit more (to provide a bit more ML related context), the version in the next revision will be slightly longer.

rix0rrr · 2022-05-11T17:12:28Z

text/0431-sagemaker-l2-endpoint.md

+the S3 path where the model artifacts are stored and the Docker registry path for the image that
+contains the inference code. The `ContainerDefinition` interface encapsulates both the specification


What's the difference between artifacts and inference code? Why don't they both live in the Docker image? (All to say: I would appreciate one sentence of levelsetting here).

In the next revision, I'll emphasize that a model's code usually changes at a slower rate than a model's artifacts (which will likely change every time the model is re-trained, while the code remains static) making their separation natural from a decoupling standpoint.

I think we (the customers) deserve to see an example of a Model after the blurb about what it is and how you can configure it. It's okay if the additional configuration sections come after it; I want to see like the most basic use case front and center.

In rev2, I've moved up the Model examples prior to diving into container image and model data assets.

rix0rrr · 2022-05-11T17:14:08Z

text/0431-sagemaker-l2-endpoint.md

+const image = sagemaker.ContainerImage.fromEcrRepository(repository, 'tag');
+```
+
+#### `AssetImage`


People don't actually see the class AssetImage themselves, so the section title might be confusing.

We generally recommend people use assets. Can we move this paragraph up?

I'll generalize the EcrImage and AssetImage headers and move the assets section above the ECR one.

rix0rrr · 2022-05-11T17:14:29Z

text/0431-sagemaker-l2-endpoint.md

+
+### Model Artifacts
+
+Models are often associated with model artifacts, which are specified via the `modelData` property


Why would I use them? One sentence of context?

I'll also emphasize the point about decoupling trained artifacts from the inference code here in the next revision.

Agreed: in general, every section should start with a sentence about what it is and why one would use it.

rix0rrr · 2022-05-11T17:21:17Z

text/0431-sagemaker-l2-endpoint.md

+
+### AutoScaling
+
+The `autoScaleInstanceCount` method on the `IEndpointProductionVariant` interface can be used to


Too much detail on the interface here I think :).

Do you mean you'd like me to remove "on the IEndpointProductionVariant interface" from this sentence?

Simply "To enable autoscaling on the production variant, use the autoScaleInstanceCount method:"

In rev2, I've adjusted both references to IEndpointProductionVariant in the README accordingly.

kaizencc · 2022-05-11T18:51:04Z

text/0431-sagemaker-l2-endpoint.md

+import * as sagemaker from '@aws-cdk/aws-sagemaker';
+
+const endpointConfig = new sagemaker.EndpointConfig(this, 'EndpointConfig', {
+  productionVariant: {


this should be productionVariants: ProductionVariant[]

agreement. In general, throughout this RFC, I'd prefer fooBars[] over foobar and extraFooBars[]

In the earliest PR for these constructs, @skinny85 commented:

So here's an interesting pattern that you can use here.

Since at least one ProductionVariant is required, have in props:

productionVariant: ProductionVariant; // required
extraProductionVariants?: ProductionVariant[]; // optional

This way, you communicate to the clients of this class, at compile time, that at least one variant has to be provided.

Given that this new feedback contradicts Adam's original guidance, in light of this context, I just wanted to double-check that I should indeed recombine these attributes back into a single array.

CC-ing @rix0rrr as he gave the same feedback in another comment

Want to hear from others as well but I feel like a synth-time check for at least one production variant would suffice.

I think it's okay to go forward with the consensus that we need just one property productionVariants and one property containers. These will be required properties, so it stands to reason that at least one variant and one container must be provided. If we have to go back on this decision I will eat crow :)
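To make the trade-off in this thread concrete, here is a rough sketch of both options using hypothetical, stripped-down types (`VariantSketch`, `SplitProps`, `ArrayProps`). The `validateVariants` helper and its error message are assumptions for illustration, not the RFC's implementation:

```typescript
// Hypothetical, minimal variant shape for illustration only.
interface VariantSketch {
  readonly variantName: string;
}

// Option 1 (the earlier recommendation): the compiler enforces "at least one".
interface SplitProps {
  readonly productionVariant: VariantSketch;
  readonly extraProductionVariants?: VariantSketch[];
}

// Option 2 (the consensus here): one required array, checked at synthesis time.
interface ArrayProps {
  readonly productionVariants: VariantSketch[];
}

// With option 1, the construct still has to merge both properties internally.
function collectVariants(props: SplitProps): VariantSketch[] {
  return [props.productionVariant, ...(props.extraProductionVariants ?? [])];
}

// With option 2, an empty array compiles, so a validation step reports it.
function validateVariants(props: ArrayProps): string[] {
  return props.productionVariants.length < 1
    ? ['Must configure at least 1 production variant']
    : [];
}
```

Option 2 trades a compile-time guarantee for a simpler property surface; the empty-array case is only caught when validation runs.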

kaizencc · 2022-05-11T18:59:20Z

text/0431-sagemaker-l2-endpoint.md

+  container: {
+    image: image1, modelData: modelData1
+  },
+  extraContainers: [


I agree. An api with container and extraContainers seems clumsy.

kaizencc · 2022-05-11T19:03:53Z

text/0431-sagemaker-l2-endpoint.md

+
+The `Model` construct associates container images with their optional model data.
+
+#### Single Container Model


nit: I think this should be the start of the README. The Container Images and Model Artifacts sections are important, but they represent familiar APIs to CDK users and do not demonstrate the main L2 that this RFC is proposing. The first example we should see should be the Single Container Model, without fixture:

import * as path from 'path';
import * as sagemaker from '@aws-cdk/aws-sagemaker';

const image = sagemaker.ContainerImage.fromAsset(this, 'Image', {
  directory: path.join('path', 'to', 'Dockerfile', 'directory'),
});
const modelData = sagemaker.ModelData.fromAsset(this, 'ModelData',
  path.join('path', 'to', 'artifact', 'file.tar.gz'));

const model = new sagemaker.Model(this, 'PrimaryContainerModel', {
  container: {
    image: image,
    modelData: modelData,
  },
});

Sounds good, I'll move the model content up in the next revision.

TheRealAmazonKendra · 2022-05-11T19:44:25Z

text/0431-sagemaker-l2-endpoint.md

+
+### What are we launching today?
+
+We are launching the first set of L2 constructs for an existing module (`@aws-cdk/aws-sagemaker`),


This syntax is only valid for V1. In text like this it will not translate properly to V2.

Is the following paraphrasing of your comment correct?

In CDK v2, stable modules are published under the aws-cdk-lib package while experimental modules are published under module-specific packages like @aws-cdk/aws-sagemaker-alpha. As a result, usage of @aws-cdk/aws-sagemaker does not align with the stable nor experimental package naming conventions for v2.

Assuming that's correct, would the following reword be okay?

We are launching the first set of L2 constructs for the SageMaker module.

The above is all correct, and the reword is good. We currently do some magic to turn the import * as blah from '@aws-cdk/aws-blah' into import * as blah from 'aws-cdk-lib/aws-blah', but that only happens in the examples, I believe.

Yeah, I was still new and too dumb to know this.
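For readers unfamiliar with the import rewrite mentioned above, it can be pictured as a plain string transformation. The helper below is a hypothetical sketch (`rewriteV1ImportToV2` is an invented name) and does not claim to match how the CDK tooling actually implements it; it only handles quoted `@aws-cdk/aws-*` specifiers and ignores alpha packages:

```typescript
// Hypothetical helper; the real CDK tooling may work differently.
function rewriteV1ImportToV2(line: string): string {
  // Match a quoted '@aws-cdk/aws-xyz' specifier, capturing the quote
  // character and the 'aws-xyz' module name, then re-emit it under aws-cdk-lib.
  return line.replace(/(['"])@aws-cdk\/(aws-[a-z0-9-]+)\1/g, '$1aws-cdk-lib/$2$1');
}

const v1Line = "import * as sagemaker from '@aws-cdk/aws-sagemaker';";
const v2Line = rewriteV1ImportToV2(v1Line);
// v2Line is "import * as sagemaker from 'aws-cdk-lib/aws-sagemaker';"
```

Because a rewrite like this targets import statements in examples, a module name written in running prose would be left untouched, which is why the version-agnostic rewording is the safer choice.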

TheRealAmazonKendra · 2022-05-11T19:47:38Z

text/0431-sagemaker-l2-endpoint.md

+###### Container Image
+
+The following interface and abstract class provide mechanisms for configuring a container image.
+These closely mirror [analogous entities from the `@aws-cdk/ecs` module][ecs-image] but, rather than


Same issue as the v1/v2 comment above. Though, actually, the module is named incorrectly regardless. Granted, this only matters if you intend these sections to go in the README when it's written. So, take this comment for what you will.

I'll make the adjustments in the next revision to avoid confusion. Will just "ECS module" be appropriately version agnostic?

TheRealAmazonKendra · 2022-05-11T19:53:44Z

text/0431-sagemaker-l2-endpoint.md

+   ECR as an image source while ECS was capable of sourcing images from either ECR or a
+   customer-owned private repository. Given the fact that these two products' supported images
+   sources may yet again diverge in the future, maybe it would be best to keep their
+   `ContainerImage` APIs separate within their respective modules.


I'm not sure I agree with this point. In general, we should avoid duplicate code. Being unsure of where the place is isn't a compelling reason to have basically two copies of the same thing. A base class/interface that can be extended/implemented as appropriate is better design.

I agree that reuse is better, but in this particular case reusing from ecs doesn't sound great. We might at some point support a more generic ContainerImage--for now this is okay.

> Being unsure of where the place is isn't a compelling reason to have basically two copies of the same thing.

That's fair.

> I agree that reuse is better, but in this particular case reusing from ecs doesn't sound great. We might at some point support a more generic ContainerImage--for now this is okay.

Just to double-check: does this mean that it's alright to keep the new, duplicated-ish ContainerImage APIs on this RFC and record potential future work to converge the APIs into a generic solution within the "Are there any open issues that need to be addressed later" section?

Combining ContainerImage is out of scope for this RFC. You're good to go ahead with SageMaker specific ContainerImage API. Creating a generic ContainerImage is our problem to solve, you don't really need to worry about it :).

(Unless you have ideas, in which case feel free to open a GitHub issue after this L2 is merged!)

In rev2, I've reworded this bit and moved it into the "Are there any open issues that need to be addressed later?" section.

TheRealAmazonKendra · 2022-05-11T20:02:38Z

text/0431-sagemaker-l2-endpoint.md

+This RFC and its [original associated implementation PR][original-pr] were based on a Q3 2019
+feature set of SageMaker real-time inference endpoints. Since that point in time, SageMaker has
+launched the following features which would require further additions to the L2 API contracts:
+


If these are part of SageMaker's core feature set, I think they need to be taken into account in this RFC. We may be creating one-way doors with contracts set here that will make the user experience more complicated if these are added later.

At least I'd like a good story for how the API would evolve.

> If these are part of SageMaker's core feature set, I think they need to be taken into account in this RFC. We may be creating one-way doors with contracts set here that will make the user experience more complicated if these are added later.

That's fair. I'll begin familiarizing myself with the last few years of SageMaker features and revise the RFC accordingly.

Quick heads-up: this is going to require a fair bit of experimentation (i.e., I'm actually going to, at least in a rudimentary way, implement the construct changes and test these features to make sure I understand how they are configured and how they function). With upcoming OOTO plans, this means it may be a month or more before I'm done with the next revision. Please let me know if you all have concerns with that timing.

Just to be clear. We won't actually be creating any one-way doors because sagemaker will start out as an alpha module. I think we should be thinking about how these constructs can be extended to support other SageMaker resources in the future, but it's not necessary to include them in the current RFC.

@kaizencc I've familiarized myself with asynchronous inference and serverless inference (the two feature launches with the biggest impact on the proposed APIs in this RFC) by deploying a few sample endpoints via CloudFormation, and I found a number of feature disparities. I was hoping to get some design guidance on questions related to compile-time, synthesis-time, and deploy-time detection of issues. In particular, strong compile-time contracts seem like a good idea in principle, but in reality, I would hope that some of these feature gaps disappear. To keep that future evolution from preventing convergence of the API contracts, synthesis-time validation seems ideal, but then again, I have no guarantee that these products will actually converge. Anyway, to go into more detail:

Lack of feature parity across inference products: The serverless inference documentation states that the following features are absent from SageMaker's serverless feature:

Some of the features currently available for SageMaker Real-time Inference are not supported for Serverless Inference, including GPUs, AWS marketplace model packages, private Docker registries, Multi-Model Endpoints, VPC configuration, network isolation, data capture, multiple production variants, Model Monitor, and inference pipelines.

Although I haven't found corresponding documentation, asynchronous inference has similar limitations: you cannot use multiple production variants nor can the single supported variant be serverless (it must use instance-based hosting). This led me down two separate paths:

Asynchronous Endpoints: Whether or not an endpoint should be treated as sync vs async is controlled cross-variant at the level of the EndpointConfig resource. When an EndpointConfig resource is configured as async, it is currently only capable of supporting a single production variant. Should I use a compile-time solution (e.g. two different compile-time *Props interfaces to differentiate between the supported features of sync vs async) or instead simply use synthesis-time validation to identify a misconfigured resource (e.g., an async endpoint config with multiple variants)?

Serverless Production Variants: Similarly, even though the instance-vs-serverless specification is per variant, the presence of one serverless production variant prevents customers from using additional variants (i.e., an aspect of one variant influences the behavior of the entire EndpointConfig resource). Would a synthesis-time check be sufficient here to error out on the incompatible variants or would a compile-time contract guarding this scenario be preferred?

Feature-specific metrics: Asynchronous endpoints have unique metrics like ApproximateBacklogSize on the endpoint name dimension whereas serverless variants have unique metrics like ModelSetupTime on the endpoint name + production variant dimension. When adding metric* helper APIs for these scenarios, should they be implemented in such a way to guarantee that the metric is available for the specific use-case from a compile-time perspective (i.e., each disparate use-case has an appropriate interface with only its suitable methods) or is it sufficient to use synthesis-time validation in this case?

@kaizencc

The clarification has given me a change of heart, and now I'm interested in the following idea: two constructs, one generic EndpointConfig, and one "L2.5" AsyncEndpointConfig that provides further restrictions on the productionVariant property.

L2.5 AsyncEndpointConfig construct

For such an AsyncEndpointConfig, which of the following would apply?

AsyncEndpointConfig extends Resource and instantiates an EndpointConfig construct: This is the classic L2 strategy, but would (I believe) introduce an intermediate async construct in the hierarchy, meaning customers couldn't replace an AsyncEndpointConfig with an EndpointConfig without changing their CloudFormation logical IDs.

AsyncEndpointConfig extends EndpointConfig: I assume this would retain the identical construct hierarchy but would violate the following documented CDK design guideline:

As a rule of thumb, most constructs should directly extend the Construct or Resource instead of another construct. Prefer representing polymorphic behavior through interfaces and not through inheritance.

I'm assuming there could be another scenario where AsyncEndpointConfig instantiates EndpointConfig with a customer-provided scope and ID without extending Construct/Resource to avoid impacting the hierarchy, but I'm not familiar with this strategy being employed in the CDK codebase.

Serverless Production Variants

The other limitation called out above, for synchronous endpoints, is that the presence of a single serverless variant prevents other variants from being configured. Would a synthesis-time approach be appropriate here (i.e., taking ProductionVariant[] on the EndpointConfigProps but erroring out if multiple are present alongside a serverless variant) or would you prefer a model more similar to your serverless-vs-instance specific properties?

Thanks for reading the design guidelines! You are one of the few :).

We don't have to implement AsyncEndpointConfig day 1 of the library. Let's add it to an "extensions" section and then turn those extensions into GitHub issues later.

I think this may be a special case where we are okay with extending EndpointConfig. We generally don't want to support that many L2.5s in the library, which is why we document that we want to extend the resource, but this could be a good candidate. I could be wrong, and be forced to change my mind later, but I'm looking at NodejsFunction and other L2.5s in the main repo and they extend another construct. At any rate, this doesn't need to be implemented along with EndpointConfig, so we don't need to finalize how to instantiate it just yet.

How about mutually exclusive properties serverlessProductionVariant: ServerlessProductionVariant and productionVariants: ProductionVariant[]? That seems like a good mix of property differentiation and synth-time validation (that only one of the two props is set).

@kaizencc

> We don't have to implement AsyncEndpointConfig day 1 of the library. Let's add it to an "extensions" section and then turn those extensions into GitHub issues later.

Sounds good! I assume the serverless pieces can be added to the same section.

> How about mutually exclusive properties serverlessProductionVariant: ServerlessProductionVariant and productionVariants: ProductionVariant[]? That seems like a good mix of property differentiation and synth-time validation (that only one of the two props is set).

Two quick follow-up questions:

1. As both types of variants share a variant name & weight (and ideally could be used as siblings if the SageMaker folks release support for multiple heterogeneous variants per endpoint config), to adjust your proposal slightly, would it make sense to define a shared interface like ProductionVariant and then have two interfaces extending it: ServerlessProductionVariant and InstanceProductionVariant? We could simply not export ProductionVariant yet as we don't have a need for polymorphic representation (e.g., ProductionVariant[]) and that way, both exported interfaces are specifically targeting their use-case (e.g., serverless vs instance-based).

2. Challenging one statement I made above: if the SageMaker folks later do support heterogeneous variants, would it make sense to collapse both variant specifications into a single ProductionVariant[] on the EndpointConfigProps (i.e., is that polymorphic API jsii-compatible)? If so (and assuming we can foresee SageMaker's product evolution), would it be unusual to present two separate variant specifications now if we know they will eventually be "doomed" APIs (i.e., ideal for removal during a major version bump)?

Thanks so much for the help, Kaizen; this conversation has been immensely helpful! I'll start incorporating your recommended changes as my final pair of questions are mostly focused on minor remaining details that can be adjusted rather easily.

Answer for 1 is yes.

Answer for 2 is hmmm. On the one hand, predicting what SageMaker is going to do in the future is future-proofing and we want to avoid doing that. We have no guarantee that that will ever happen. If it happens when SageMaker is in alpha, then we can make that breaking change. If it happens after SageMaker is stabilized, then we can deprecate serverlessProductionVariant and instanceProductionVariants and introduce a new productionVariant property. Or, we can keep both APIs and just remove/modify the synth-time validation -- they're not necessarily doomed and it might be fine to specify different production variants in different properties. I think if that happens it's okay, we're not stuck, and any changes we have to make won't really hurt the API too much.

Feature-specific metrics: I think the RFC should not worry about specific XxxMetric APIs and rather just expose the generic metric API and allow the user to specify exactly what metrics they want to use. This item can be a fast-follow item but isn't required for an alpha release, and there's already a lot of ideas to implement in this RFC. To answer your question though, I think synth-time validation makes sense / is the best we can do.
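The variant design discussed in this thread (a shared base interface that is not exported, two use-case-specific interfaces, and a synth-time exclusivity check) could look roughly like the sketch below. Only `serverlessProductionVariant` and `instanceProductionVariants` come from the discussion; the type names ending in `Sketch`, every field other than the variant name and weight, and the error strings are assumptions made for illustration:

```typescript
// Shared fields; per the discussion, this base would not be exported yet.
interface ProductionVariantBaseSketch {
  readonly variantName: string;
  readonly initialVariantWeight?: number;
}

// Instance-backed variant; instanceType is a hypothetical field.
interface InstanceProductionVariantSketch extends ProductionVariantBaseSketch {
  readonly instanceType: string;
}

// Serverless variant; both fields below are hypothetical.
interface ServerlessProductionVariantSketch extends ProductionVariantBaseSketch {
  readonly memorySizeInMB: number;
  readonly maxConcurrency: number;
}

interface EndpointConfigPropsSketch {
  readonly instanceProductionVariants?: InstanceProductionVariantSketch[];
  readonly serverlessProductionVariant?: ServerlessProductionVariantSketch;
}

// Synth-time style check: exactly one of the two properties may be used.
function validateEndpointConfig(props: EndpointConfigPropsSketch): string[] {
  const hasInstance = (props.instanceProductionVariants?.length ?? 0) > 0;
  const hasServerless = props.serverlessProductionVariant !== undefined;
  if (hasInstance && hasServerless) {
    return ['Specify either instanceProductionVariants or serverlessProductionVariant, not both'];
  }
  if (!hasInstance && !hasServerless) {
    return ['Specify at least one production variant'];
  }
  return [];
}
```

If SageMaker later allows mixing variant types, only `validateEndpointConfig` would need to relax, which matches the "remove/modify the synth-time validation" path described above.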

@kaizencc

As I just published rev2, I wanted to highlight one bit: I kept the previously proposed metric* APIs on the newly renamed IEndpointInstanceProductionVariant as they had already been implemented and would be easy enough to remove if you feel strongly that they shouldn't be included.

TheRealAmazonKendra · 2022-05-11T20:07:05Z

text/0431-sagemaker-l2-endpoint.md

+   CDK integration test perspective, specifying `--no-clean` will allow the generation of a snapshot
+   regardless of whether stack deletion will succeed or fail but may hinder snapshot re-generation
+   by subsequent CDK contributors. For this reason, it may be helpful to exclude VPC specification
+   from the endpoint integration test at this time.


Rather than excluding testing, we should add retries to the tests that take into account these delays.

I agree. This is out of scope for the current RFC, but it may give @corymhall some inspiration 😜

> Rather than excluding testing, we should add retries to the tests that take into account these delays.

> I agree. This is out of scope for the current RFC, but it may give @corymhall some inspiration 😜

So, hypothetically, if this RFC were bar raised & signed off in 30 days, does this mean that the implementation PR merge would be blocked on adding CloudFormation stack teardown retries to the associated integ test code?

petermeansrock · 2022-05-11T21:34:38Z

Thanks for all the feedback so far! I'll be OOTO for the remainder of the week, so I'll begin fielding comments early next week.

rix0rrr · 2022-05-12T08:55:00Z

text/0431-sagemaker-l2-endpoint.md

+import * as sagemaker from '@aws-cdk/aws-sagemaker';
+
+const endpoint = new sagemaker.Endpoint(this, 'Endpoint', { endpointConfig });
+const productionVariant = endpoint.findProductionVariant('variantName');


How do I make a production variant? Is it created when I call this? Is it pre-created? How do I know its name?

The customer currently specifies the variantName attribute of a production variant when defining an EndpointConfig resource.

rix0rrr · 2022-05-12T08:58:06Z

text/0431-sagemaker-l2-endpoint.md

+- `IModel` -- interface for defined and imported models
+
+  ```ts
+  export interface IModel extends cdk.IResource, iam.IGrantable, ec2.IConnectable {


I see a model is Grantable and Connectable. Does a Model correspond 1:1 with compute?

Assuming I'm understanding your question correctly: not exactly. A single model (to which VPC configuration is associated) can be shared by all of the following resources (each with its own compute):

- Multiple endpoints

- Multiple production variants of a single endpoint

- An asynchronous, batch transform job started by the CreateTransformJob SageMaker API

Does your question imply that we may be misusing IConnectable?

rix0rrr · 2022-05-12T08:58:40Z

text/0431-sagemaker-l2-endpoint.md

+     * The VPC to deploy the endpoint to.
+     *
+     * @default none


If this is about the endpoint, then doesn't this need to be configured on the Endpoint?

I'll adjust the comment as the VPC is not endpoint-specific. For example, without ever creating an endpoint, a customer can create an asynchronous, batch transform job via the CreateTransformJob referencing their model. SageMaker will instantiate temporary EC2 instances within the customer's VPC for the lifetime of the transform job.

rix0rrr · 2022-05-12T09:01:04Z

text/0431-sagemaker-l2-endpoint.md

+     * @param bucket The S3 bucket within which the model artifacts are stored
+     * @param objectKey The S3 object key at which the model artifacts are stored
+     */
+    public static fromBucket(bucket: s3.IBucket, objectKey: string): ModelData { ... }


Do we want to deal with object versions here? Can we?

I'll add objectVersion?: string as a parameter to this method (and double-check that SageMaker supports versioned objects).

As far as I can tell (from reading documentation & experimenting with SageMaker operations), SageMaker does not support versioned objects as a source of model data. I've also posted a question to repost.aws to confirm, and the one response I've gotten agrees (at "~90%" confidence) with my conclusion.

rix0rrr · 2022-05-12T11:19:43Z

text/0431-sagemaker-l2-endpoint.md

+     *
+     * @default average over 5 minutes
+     */
+    metricGPUUtilization(props?: cloudwatch.MetricOptions): cloudwatch.Metric;


No uppercase abbreviations. metricGpuUtilization, metricCpuUtilization, etc.

rix0rrr · 2022-05-12T11:22:13Z

text/0431-sagemaker-l2-endpoint.md

+   ECR as an image source while ECS was capable of sourcing images from either ECR or a
+   customer-owned private repository. Given the fact that these two products' supported images
+   sources may yet again diverge in the future, maybe it would be best to keep their
+   `ContainerImage` APIs separate within their respective modules.


I agree that reuse is better, but in this particular case reusing from ecs doesn't sound great. We might at some point support a more generic ContainerImage--for now this is okay.

rix0rrr · 2022-05-12T11:23:01Z

text/0431-sagemaker-l2-endpoint.md

+   prevents customers from reusing configuration across endpoints. For this reason, an explicit
+   L2 construct for endpoint configuration was incorporated into this RFC.


Thanks for getting into this. What would be a typical use case to use the same config for multiple endpoints?

I'm afraid that I can't say what is typical or atypical for SageMaker customers, but I could imagine the following scenario:

1. Producer A exposes ten endpoints, each unique to a different consumer (let's label these B thru K).

2. Each of these endpoints could use one of, say, three endpoint configs (let's label these 1 thru 3) based on the features needed by each consumer.

3. Consumer B's endpoint is currently associated with endpoint config 1.

4. At some later point, consumer B wants to leverage a new feature, so in collaboration with the consumer, producer A updates B's endpoint to reference endpoint config 3.

5. As a result, without switching endpoints, consumer B was able to begin using the features enabled via the pre-built, shared endpoint config 3.

Again though, since this is a completely made up example, I'm not sure if it represents a normal SageMaker use-case. Is it possible for your team to reach out to an internal subject matter expert in the SageMaker product space that can speak more to endpoint config reuse?

Responded on a different comment but this makes sense, just please upstream it to the EndpointConfig readme section (just a simple "By using the EndpointConfig construct, you can define one endpoint configuration and reuse it across multiple Endpoint constructs").

In rev2, I now mention reuse of EndpointConfig across Endpoint resources in the README.
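The made-up reuse scenario from this thread can be modeled in a few lines. Everything here is hypothetical (`EndpointConfigSketch`, `EndpointSketch`, and the feature labels are invented for illustration and are not CDK or SageMaker APIs); the point is only that several endpoints can reference one shared config and that an endpoint can be repointed without being replaced:

```typescript
// Hypothetical stand-ins for illustration; not CDK constructs.
interface EndpointConfigSketch {
  readonly name: string;
  readonly features: readonly string[];
}

class EndpointSketch {
  readonly name: string;
  private config: EndpointConfigSketch;

  constructor(name: string, config: EndpointConfigSketch) {
    this.name = name;
    this.config = config;
  }

  // Repoint this endpoint at a different, pre-built config.
  useConfig(config: EndpointConfigSketch): void {
    this.config = config;
  }

  get configName(): string {
    return this.config.name;
  }

  supports(feature: string): boolean {
    return this.config.features.includes(feature);
  }
}

// Two shared configs; the feature labels are invented.
const config1: EndpointConfigSketch = { name: 'config-1', features: ['baseline'] };
const config3: EndpointConfigSketch = { name: 'config-3', features: ['baseline', 'new-feature'] };

const endpointB = new EndpointSketch('consumer-b', config1);
const endpointC = new EndpointSketch('consumer-c', config3);

// Consumer B opts into the new feature by switching to the shared config 3.
endpointB.useConfig(config3);
```

After the switch, `endpointB` and `endpointC` reference the same config object while keeping their own names, which is the reuse the README sentence describes.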

rix0rrr · 2022-05-12T11:23:54Z

text/0431-sagemaker-l2-endpoint.md

+This RFC and its [original associated implementation PR][original-pr] were based on a Q3 2019
+feature set of SageMaker real-time inference endpoints. Since that point in time, SageMaker has
+launched the following features which would require further additions to the L2 API contracts:
+


At least I'd like a good story for how the API would evolve.

rix0rrr · 2022-05-12T11:25:49Z

text/0431-sagemaker-l2-endpoint.md

+   CDK integration test perspective, specifying `--no-clean` will allow the generation of a snapshot
+   regardless of whether stack deletion will succeed or fail but may hinder snapshot re-generation
+   by subsequent CDK contributors. For this reason, it may be helpful to exclude VPC specification
+   from the endpoint integration test at this time.


I agree. This is out of scope for the current RFC, but it may give @corymhall some inspiration 😜

comcalvi · 2022-05-12T01:30:47Z

text/0431-sagemaker-l2-endpoint.md

+import * as sagemaker from '@aws-cdk/aws-sagemaker';
+
+const endpointConfig = new sagemaker.EndpointConfig(this, 'EndpointConfig', {
+  productionVariant: {


agreement. In general, throughout this RFC, I'd prefer fooBars[] over foobar and extraFooBars[]

comcalvi · 2022-05-12T01:33:33Z

text/0431-sagemaker-l2-endpoint.md

+    /**
+     * Name of the SageMaker Model.
+     *
+     * @default AWS CloudFormation generates a unique physical ID and uses that ID for the model's


nit: all @defaults that have a text description of the value (like this one) need to be @default -. @defaults that do not have a text description, and only a value, should not have a -.
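For illustration, here is a minimal sketch of the two `@default` styles described above. `ExamplePropsSketch` and `resolveInitialInstanceCount` are hypothetical names made up for this example, not part of the RFC; the two properties are borrowed from the diff snippets quoted in this review.

```typescript
// Hypothetical props interface (not the RFC's actual API) showing both @default styles.
interface ExamplePropsSketch {
  /**
   * Name of the SageMaker Model.
   *
   * A default described in prose takes a leading dash:
   *
   * @default - AWS CloudFormation generates a unique physical ID.
   */
  readonly modelName?: string;

  /**
   * Number of instances to launch initially.
   *
   * A default that is only a value takes no dash:
   *
   * @default 1
   */
  readonly initialInstanceCount?: number;
}

// Applies the documented value default when the property is omitted.
function resolveInitialInstanceCount(props: ExamplePropsSketch): number {
  return props.initialInstanceCount !== undefined ? props.initialInstanceCount : 1;
}
```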

petermeansrock

Responding to remaining comments on the first draft.

petermeansrock · 2022-05-24T22:03:45Z

text/0431-sagemaker-l2-endpoint.md

+the S3 path where the model artifacts are stored and the Docker registry path for the image that
+contains the inference code. The `ContainerDefinition` interface encapsulates both the specification


In the next revision, I'll emphasize that a model's code usually changes at a slower rate than a model's artifacts (which will likely change every time the model is re-trained, while the code remains static) making their separation natural from a decoupling standpoint.

petermeansrock · 2022-05-24T22:06:56Z

text/0431-sagemaker-l2-endpoint.md

+const image = sagemaker.ContainerImage.fromEcrRepository(repository, 'tag');
+```
+
+#### `AssetImage`


I'll generalize the EcrImage and AssetImage headers and move the assets section above the ECR one.

petermeansrock · 2022-05-24T22:08:13Z

text/0431-sagemaker-l2-endpoint.md

+
+### Model Artifacts
+
+Models are often associated with model artifacts, which are specified via the `modelData` property


I'll also emphasize the point about decoupling trained artifacts from the inference code here in the next revision.


petermeansrock · 2022-05-24T22:28:13Z

text/0431-sagemaker-l2-endpoint.md

+
+The `Model` construct associates container images with their optional model data.
+
+#### Single Container Model


Sounds good, I'll move the model content up in the next revision.

petermeansrock · 2022-05-25T00:45:43Z

text/0431-sagemaker-l2-endpoint.md

+   prevents customers from reusing configuration across endpoints. For this reason, an explicit
+   L2 construct for endpoint configuration was incorporated into this RFC.


I'm afraid that I can't say what is typical or atypical for SageMaker customers, but I could imagine the following scenario:

Producer A exposes ten endpoints, each unique to a different consumer (let's label these B thru K).

Each of these endpoints could use one of, say, three endpoint configs (let's label these 1 thru 3) based on the features needed by each consumer.

Consumer B's endpoint is currently associated with endpoint config 1.

At some later point, consumer B wants to leverage a new feature, so in collaboration with the consumer, producer A updates B's endpoint to reference endpoint config 3.

As a result, without switching endpoints, consumer B was able to begin using the features enabled via the pre-built, shared endpoint config 3.

Again though, since this is a completely made up example, I'm not sure if it represents a normal SageMaker use-case. Is it possible for your team to reach out to an internal subject matter expert in the SageMaker product space that can speak more to endpoint config reuse?
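To make the made-up scenario above a bit more concrete, here is a rough sketch using plain TypeScript stand-ins. `EndpointConfigSketch` and `EndpointSketch` are hypothetical types invented for this illustration; they are not the proposed CDK constructs and only model the "many endpoints, few shared configs" relationship.

```typescript
// Hypothetical stand-ins; these are NOT the proposed CDK constructs.
interface EndpointConfigSketch {
  readonly configName: string;
  readonly features: string[];
}

class EndpointSketch {
  constructor(public readonly consumer: string, public endpointConfig: EndpointConfigSketch) {}

  // Point the existing endpoint at a different shared config.
  updateConfig(next: EndpointConfigSketch): void {
    this.endpointConfig = next;
  }
}

// Producer A pre-builds a small set of shared configs.
const config1: EndpointConfigSketch = { configName: 'config-1', features: ['baseline'] };
const config3: EndpointConfigSketch = { configName: 'config-3', features: ['baseline', 'new-feature'] };

// Consumers B and C each get their own endpoint, initially sharing config 1.
const endpointB = new EndpointSketch('B', config1);
const endpointC = new EndpointSketch('C', config1);

// Later, consumer B moves to config 3 without switching endpoints.
endpointB.updateConfig(config3);
```

Consumer C's endpoint is untouched by the change, which is the reuse property the scenario is meant to show.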

petermeansrock · 2022-05-25T00:51:24Z

text/0431-sagemaker-l2-endpoint.md

+This RFC and its [original associated implementation PR][original-pr] were based on a Q3 2019
+feature set of SageMaker real-time inference endpoints. Since that point in time, SageMaker has
+launched the following features which would require further additions to the L2 API contracts:
+


If these are part of SageMaker's core feature set, I think they need to be taken into account in this RFC. We may be creating one-way doors with contracts set here that will make the user experience more complicated if these are added later.

That's fair. I'll begin familiarizing myself with the last few years of SageMaker features and revise the RFC accordingly.

Quick heads-up: this is going to require a fair bit of experimentation (i.e., I'm actually going to, at least in a rudimentary way, implement the construct changes and test these features to make sure I understand how they are configured and how they function). With upcoming OOTO plans, this means it may be a month or more before I'm done with the next revision. Please let me know if you all have concerns with that timing.


petermeansrock · 2022-05-25T00:58:56Z

text/0431-sagemaker-l2-endpoint.md

+   CDK integration test perspective, specifying `--no-clean` will allow the generation of a snapshot
+   regardless of whether stack deletion will succeed or fail but may hinder snapshot re-generation
+   by subsequent CDK contributors. For this reason, it may be helpful to exclude VPC specification
+   from the endpoint integration test at this time.


Rather than excluding testing, we should add retries to the tests that take into account these delays.

I agree. This is out of scope for the current RFC, but it may give (at)corymhall some inspiration 😜

So, hypothetically, if this RFC were bar raised & signed off in 30 days, does this mean that the implementation PR merge would be blocked on the configuration of CloudFormation stack tear down retries to the associated integ test code?

kaizencc

Hi @petermeansrock! Just doing some housekeeping and responding to some of your inquiries. Will be waiting for the next revision before I take another look. In general, let's focus mostly on the README, and far less on the "technical solution" FAQ. That question is meant to be more high-level anyway, you're in a unique position where you have actually implemented most of what we're designing.

kaizencc · 2022-07-22T14:53:13Z

text/0431-sagemaker-l2-endpoint.md

+
+## Model
+
+By creating a model, you tell Amazon SageMaker where it can find the model components. This includes


[This applies to all documentation in this readme section]

The SageMaker public documentation is a good starting point, but I caution against copying it over verbatim. We can always link to the documentation with something like "For more information on Amazon SageMaker, see SageMaker docs"

To answer this specific question, I'd like to see something like this:

## Model

To create a machine learning model with Amazon SageMaker, use the `Model` construct. This construct includes properties that can be configured to define...

kaizencc · 2022-07-22T14:54:53Z

text/0431-sagemaker-l2-endpoint.md

+the S3 path where the model artifacts are stored and the Docker registry path for the image that
+contains the inference code. The `ContainerDefinition` interface encapsulates both the specification


I think we (the customers) deserve to see an example of a Model after the blurb about what it is and how you can configure it. It's okay if the additional configuration sections come after it; I want to see like the most basic use case front and center.

kaizencc · 2022-07-22T14:58:17Z

text/0431-sagemaker-l2-endpoint.md

+const image = sagemaker.ContainerImage.fromAsset(this, 'Image', {
+  directory: path.join('path', 'to', 'Dockerfile', 'directory')
+});


An asset isn't a resource, so I don't think it needs the scope and construct Id. Instead, the API should look like:

sagemaker.ContainerImage.fromAsset(directory: string) OR, if you have additional options,
sagemaker.ContainerImage.fromAsset(assetOptions: {})

In rev2, I've pivoted to having the new proposed APIs take an Asset from '@aws-cdk/aws-s3-assets' (for ModelData) and DockerImageAsset from '@aws-cdk/aws-ecr-assets' (for ContainerImage) to avoid adding any assets into the hierarchy myself.

Commented elsewhere, but I think the correct API here is to create the asset for the user (and let them supply only a path to a directory, plus optional options).

kaizencc · 2022-07-22T14:59:03Z

text/0431-sagemaker-l2-endpoint.md

+
+### Model Artifacts
+
+Models are often associated with model artifacts, which are specified via the `modelData` property


Agreeing that in general, every section title should start with a sentence about what it is and why would one use it.


kaizencc · 2022-07-22T15:23:57Z

text/0431-sagemaker-l2-endpoint.md

+   prevents customers from reusing configuration across endpoints. For this reason, an explicit
+   L2 construct for endpoint configuration was incorporated into this RFC.


Responded on a different comment, but this makes sense; just please upstream it to the EndpointConfig README section (just a simple "By using the EndpointConfig construct, you can define one endpoint configuration and reuse it across multiple Endpoint constructs")

kaizencc · 2022-07-22T15:29:37Z

text/0431-sagemaker-l2-endpoint.md

+This RFC and its [original associated implementation PR][original-pr] were based on a Q3 2019
+feature set of SageMaker real-time inference endpoints. Since that point in time, SageMaker has
+launched the following features which would require further additions to the L2 API contracts:
+


Just to be clear. We won't actually be creating any one-way doors because sagemaker will start out as an alpha module. I think we should be thinking about how these constructs can be extended to support other SageMaker resources in the future, but it's not necessary to include them in the current RFC.

kaizencc · 2022-07-22T15:32:37Z

text/0431-sagemaker-l2-endpoint.md

+import * as sagemaker from '@aws-cdk/aws-sagemaker';
+
+const endpointConfig = new sagemaker.EndpointConfig(this, 'EndpointConfig', {
+  productionVariant: {


I think it's okay to go forward with the consensus that we need just one property productionVariants and one property containers. These will be required properties, so it stands to reason that at least one variant and one container must be provided. If we have to go back on this decision I will eat crow :)
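As a rough sketch of how a single required `containers` property could work: one container yields a single-container model, several yield an inference pipeline, and an empty list is rejected. The names `ContainerDefinitionSketch`, `RenderedModelSketch`, and `renderContainers` are hypothetical, and the split into `primaryContainer` versus `containers` is an assumption modeled on the `PrimaryContainer` and `Containers` properties of `AWS::SageMaker::Model`, not the actual implementation.

```typescript
// Hypothetical shapes for illustration; not the RFC's actual interfaces.
interface ContainerDefinitionSketch {
  readonly image: string;
  readonly modelData?: string;
}

interface RenderedModelSketch {
  readonly primaryContainer?: ContainerDefinitionSketch;
  readonly containers?: ContainerDefinitionSketch[];
}

// Decide between a single-container model and an inference pipeline
// based purely on how many containers the user supplied.
function renderContainers(containers: ContainerDefinitionSketch[]): RenderedModelSketch {
  if (containers.length === 0) {
    throw new Error('At least one container must be provided');
  }
  if (containers.length === 1) {
    return { primaryContainer: containers[0] };
  }
  return { containers: containers };
}

const singleContainerModel = renderContainers([{ image: 'image-1', modelData: 'model-data-1' }]);
const pipelineModel = renderContainers([{ image: 'image-1' }, { image: 'image-2' }]);
```

The same validation pattern would apply to `productionVariants`, where at least one variant is required.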

kaizencc · 2022-07-22T15:34:07Z

text/0431-sagemaker-l2-endpoint.md

+
+### AutoScaling
+
+The `autoScaleInstanceCount` method on the `IEndpointProductionVariant` interface can be used to


Simply "To enable autoscaling on the production variant, use the autoScaleInstanceCount method:"

kaizencc · 2022-07-22T15:34:40Z

text/0431-sagemaker-l2-endpoint.md

+- `IModel` -- interface for defined and imported models
+
+  ```ts
+  export interface IModel extends cdk.IResource, iam.IGrantable, ec2.IConnectable {


petermeansrock

Responding to a few comments about the direction that was taken in authoring revision 2.

petermeansrock · 2022-09-15T22:29:30Z

text/0431-sagemaker-l2-endpoint.md

+
+## Model
+
+By creating a model, you tell Amazon SageMaker where it can find the model components. This includes


FYI: As I had already expanded the wording a bit more (to provide a bit more ML related context), the version in the next revision will be slightly longer.

petermeansrock · 2022-09-15T22:30:22Z

text/0431-sagemaker-l2-endpoint.md

+the S3 path where the model artifacts are stored and the Docker registry path for the image that
+contains the inference code. The `ContainerDefinition` interface encapsulates both the specification


In rev2, I've moved up the Model examples prior to diving into container image and model data assets.

petermeansrock · 2022-09-15T22:33:06Z

text/0431-sagemaker-l2-endpoint.md

+const image = sagemaker.ContainerImage.fromAsset(this, 'Image', {
+  directory: path.join('path', 'to', 'Dockerfile', 'directory')
+});


In rev2, I've pivoted to having the new proposed APIs take an Asset from '@aws-cdk/aws-s3-assets' (for ModelData) and DockerImageAsset from '@aws-cdk/aws-ecr-assets' (for ContainerImage) to avoid adding any assets into the hierarchy myself.


petermeansrock · 2022-09-15T22:34:50Z

text/0431-sagemaker-l2-endpoint.md

+
+### AutoScaling
+
+The `autoScaleInstanceCount` method on the `IEndpointProductionVariant` interface can be used to


In rev2, I've adjusted both references to IEndpointProductionVariant in the README accordingly.

petermeansrock · 2022-09-15T22:36:13Z

text/0431-sagemaker-l2-endpoint.md

+     * @param id The id to assign to the image asset
+     * @param props The properties of a Docker image asset
+     */
+    public static fromAsset(scope: Construct, id: string, props: assets.DockerImageAssetProps): ContainerImage { ... }


In rev2, I've decided to pivot to having the new proposed APIs take an Asset from '@aws-cdk/aws-s3-assets' (for ModelData) and DockerImageAsset from '@aws-cdk/aws-ecr-assets' (for ContainerImage) to avoid adding any assets into the hierarchy myself.

petermeansrock · 2022-09-15T22:36:42Z

text/0431-sagemaker-l2-endpoint.md

+     * @param id The id to associate with the new asset
+     * @param path The local path to a model artifact file as a gzipped tar file
+     */
+    public static fromAsset(scope: Construct, id: string, path: string): ModelData { ... }


Noted above but copying here as well:

In rev2, I've pivoted to having the new proposed APIs take an Asset from '@aws-cdk/aws-s3-assets' (for ModelData) and DockerImageAsset from '@aws-cdk/aws-ecr-assets' (for ContainerImage) to avoid adding any assets into the hierarchy myself.

petermeansrock · 2022-09-15T22:38:05Z

text/0431-sagemaker-l2-endpoint.md

+     *
+     * @default 1
+     */
+    readonly initialInstanceCount?: number;


In rev2, I've called these features out explicitly in the "Are there any open issues that need to be addressed later?" section.

petermeansrock · 2022-09-15T22:39:02Z

text/0431-sagemaker-l2-endpoint.md

+   ECR as an image source while ECS was capable of sourcing images from either ECR or a
+   customer-owned private repository. Given the fact that these two products' supported images
+   sources may yet again diverge in the future, maybe it would be best to keep their
+   `ContainerImage` APIs separate within their respective modules.


In rev2, I've reworded this bit and moved it into the "Are there any open issues that need to be addressed later?" section.

petermeansrock · 2022-09-15T22:39:32Z

text/0431-sagemaker-l2-endpoint.md

+   prevents customers from reusing configuration across endpoints. For this reason, an explicit
+   L2 construct for endpoint configuration was incorporated into this RFC.


In rev2, I now mention reuse of EndpointConfig across Endpoint resources in the README.

Pull request has been modified.

kaizencc

Hi @petermeansrock, I'm pretty happy with this RFC modulo some small comments. And we may get further involvement from the SageMaker team as well.

I also didn't really look at the code in this RFC. I'll review that on the actual PR when the time comes, but I'm much less worried about the actual implementation. I focused primarily on the README and the other parts of the FAQ.


kaizencc · 2022-10-03T21:17:06Z

text/0431-sagemaker-l2-endpoint.md

+const image = sagemaker.ContainerImage.fromAsset(this, 'Image', {
+  directory: path.join('path', 'to', 'Dockerfile', 'directory')
+});


Commented elsewhere, but I think the correct API here is to create the asset for the user (and let them supply only a path to a directory, plus optional options).
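One way to picture that API shape is sketched below: the caller passes only a directory (and optional options), and the asset is created later, when the consuming construct binds the image to its own scope. Everything here is a hypothetical stand-in (`ScopeSketch`, `AssetOptionsSketch`, `ContainerImageSketch`, `AssetImageSketch`, and the `bind` method and its return value); it only models the deferred-creation idea and is not the RFC's implementation.

```typescript
// Hypothetical stand-in for a construct scope; it just records created assets.
class ScopeSketch {
  readonly assets: string[] = [];
}

// Hypothetical options bag for the asset.
interface AssetOptionsSketch {
  readonly buildArgs?: { [key: string]: string };
}

abstract class ContainerImageSketch {
  // Callers supply only a directory and optional options: no scope, no construct id.
  static fromAsset(directory: string, options: AssetOptionsSketch = {}): ContainerImageSketch {
    return new AssetImageSketch(directory, options);
  }

  // Called by the consuming construct, which supplies its own scope.
  abstract bind(scope: ScopeSketch): string;
}

class AssetImageSketch extends ContainerImageSketch {
  constructor(public readonly directory: string, public readonly options: AssetOptionsSketch) {
    super();
  }

  bind(scope: ScopeSketch): string {
    // The asset is created here, on the user's behalf, under the consumer's scope.
    scope.assets.push(this.directory);
    return 'asset-image:' + this.directory;
  }
}

// Usage: the user never touches scope or id when describing the image.
const dockerImage = ContainerImageSketch.fromAsset('path/to/Dockerfile/directory');
const modelScope = new ScopeSketch();
const boundImageUri = dockerImage.bind(modelScope);
```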

kaizencc

Yay @petermeansrock! We're calling this approved :).

petermeansrock and others added 2 commits May 4, 2022 11:37

Update author list to mirror issue/PR

75dbf23

petermeansrock mentioned this pull request May 4, 2022

SageMaker Model Hosting L2 Constructs #431

Closed

11 tasks

comcalvi reviewed May 11, 2022

rix0rrr reviewed May 11, 2022

kaizencc reviewed May 11, 2022

TheRealAmazonKendra reviewed May 11, 2022

rix0rrr reviewed May 12, 2022

comcalvi previously requested changes May 12, 2022

petermeansrock commented May 25, 2022

petermeansrock added 2 commits July 21, 2022 16:47

Reword & restructure README based on feedback

3a39cd4

Remove one CDKv1-specific module reference

ed5998b

kaizencc mentioned this pull request Jul 22, 2022

Add Sagemaker endpoint L2 construct #441

Closed

11 tasks

kaizencc previously requested changes Jul 22, 2022

petermeansrock added 15 commits August 16, 2022 14:07

Fix specification of defaults

73b6aae

Simplify autoscaling documentation

f29a716

Remove mention of endpoint from model VPC docs

a853734

Eliminate uppercase abbreviations

b4401e0

Remove second CDKv1-specific module reference

5fa4ddc

Simplify container/variant props

ce6445f

Remove container limit in README as it may change

7361761

Remove non-default Rosetta fixtures

40af32b

Document EndpointConfig reuse across Endpoints

20c3487

Reword ContainerImage unification as an open issue

61723f9

Drop scope/id specification in ContainerImage API

72c86a0

Drop scope/id specification in ModelData API

98e875e

Distinguish instance-based variants

3fbea7b

Document open issues based on API evolution

66a662f

Distinguish instance-based variants for Endpoints

7bdb8fb

petermeansrock added 3 commits September 15, 2022 14:08

Drop README mention of IEndpointProductionVariant

799d277

Sync technical solution API with implementation

034cd4a

Expand serverless evolution

619d462

petermeansrock commented Sep 15, 2022

kaizencc reviewed Oct 3, 2022

petermeansrock and others added 8 commits October 3, 2022 17:03

Create assets behind fromAsset APIs for image/data

80151f9

Trim README content in favor of links to AWS docs

8faf939

Link to SageMaker ENI CloudFormation issue

9723b5f

signing off as api bar raiser

b249090

Adjust EndpointProps to take IEndpointConfig

303b940

Fix my own entry on original authors

5584846

Add API bar raiser login to top to match sign-off

4342d77

Update 0431-sagemaker-l2-endpoint.md

8e4a48c

kaizencc approved these changes Oct 17, 2022

kaizencc merged commit 470c005 into aws:master Oct 17, 2022

petermeansrock mentioned this pull request Oct 18, 2022

feat(sagemaker): add model hosting L2 constructs aws/aws-cdk#20113

Closed

4 tasks

petermeansrock deleted the sagemaker-l2-endpoint branch October 18, 2022 23:23

This was referenced Nov 29, 2022

sagemaker: Support serverless variants for endpoints aws/aws-cdk#23148

Open

sagemaker: Support asynchronous endpoints aws/aws-cdk#23149

Open




L2.5 AsyncEndpointConfig construct

Serverless Production Variants
