Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(stepfunctions-tasks): task constructs for creating and transforming SageMaker jobs #8391

Merged
merged 13 commits into from
Jun 9, 2020

Conversation

shivlaks
Copy link
Contributor

@shivlaks shivlaks commented Jun 5, 2020

replacement for the current implementation of SageMaker service
integration and state level properties are merged and represented
as a construct.

The previous implementation that implemented IStepFunctionsTask has
been removed. The previously existing classes were directly converted to constructs
as these were marked experimental and still require further iterations
as they are not backed by a SageMaker L2 (does not exist yet).

In the interest of pragmatism, I decided to move them to leverage the newer pattern
so we can deprecate the Task construct.

Note that I have left the unit and integration tests verbatim. The integration test
requires some additional steps as there are pre-requisites to running a training job
such as creating and configuring input data that are not currently included.

BREAKING CHANGE: constructs for SageMakerCreateTrainingJob and
SageMakerCreateTransformJob replace previous implementation that
implemented IStepFunctionsTask.

  • stepfunctions-tasks: volumeSizeInGB property in ResourceConfig for
    SageMaker tasks are now type core.Size
  • stepfunctions-tasks: maxPayload property in SagemakerTransformProps
    is now type core.Size
  • stepfunctions-tasks: volumeKmsKeyId property in SageMakerCreateTrainingJob is now volumeEncryptionKey

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@shivlaks shivlaks requested review from rix0rrr, nija-at and a team June 5, 2020 07:32
@mergify mergify bot added the contribution/core This is a PR that came from AWS. label Jun 5, 2020
@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-qxepHUsryhcu
  • Commit ID: b9c9923
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@nija-at nija-at added the pr/do-not-merge This PR should not be merged at this time. label Jun 5, 2020
Comment on lines 122 to +130
private securityGroup?: ec2.ISecurityGroup;
private readonly securityGroups: ec2.ISecurityGroup[] = [];
private readonly subnets?: string[];
private readonly integrationPattern: sfn.ServiceIntegrationPattern;
private readonly integrationPattern: sfn.IntegrationPattern;
private _role?: iam.IRole;
private _grantPrincipal?: iam.IPrincipal;

constructor(private readonly props: SagemakerTrainTaskProps) {
this.integrationPattern = props.integrationPattern || sfn.ServiceIntegrationPattern.FIRE_AND_FORGET;
constructor(scope: Construct, id: string, private readonly props: SageMakerCreateTrainingJobProps) {
super(scope, id, props);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I'd prefer that we either store props as a private member or a bunch of separate values like securityGroup, subnets, vpc, etc.

It's weird that we evaluate an intermediate value based on props in the constructor, store them as instance variables state while still storing the original props as an instance variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your objection here - my preference is to store props as a private member. I'll work this feedback on a second pass. It does look strange to use both.

Comment on lines +230 to +231
...(spec.trainingImage ? { TrainingImage: spec.trainingImage.bind(this).imageUri } : {}),
...(spec.algorithmName ? { AlgorithmName: spec.algorithmName } : {}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I find this style hard to read. I'd rather create a local variable called trainingImage and then return it in this object.

Of course, this is everywhere in this file and that's not the intent of this PR, so I'll let you decide.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to restructure this entire class. Noting your feedback and will take a second pass to do some cleanup after this PR.

Comment on lines +13 to +15
* Subscribe to demo Algorithm vended by Amazon (free):
* https://aws.amazon.com/marketplace/ai/procurement?productId=cc5186a0-b8d6-4750-a9bb-1dcdf10e787a
* FIXME - create Input data pertinent for the training model and insert into S3 location specified in inputDataConfig.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a input data available somewhere that one can just use, like a sagemaker demo data or something?
Would be nice to provide a couple of CLI commands to quickly do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spent a little bit of time, there's no turn-key usage available for training data... I'll iterate on this to improve on it. I think a simple set of training data is something we could do with a few CLI calls

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-qxepHUsryhcu
  • Commit ID: 8c4f309
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@shivlaks shivlaks removed the pr/do-not-merge This PR should not be merged at this time. label Jun 9, 2020
@mergify
Copy link
Contributor

mergify bot commented Jun 9, 2020

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify
Copy link
Contributor

mergify bot commented Jun 9, 2020

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit 480d4c0 into master Jun 9, 2020
@mergify mergify bot deleted the shivlaks/sfn-sagemaker-tasks branch June 9, 2020 05:58
@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-qxepHUsryhcu
  • Commit ID: bb10404
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants