Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(batch): managed compute environment now properly works with compute resources and instanceRole has correct docstring and type definition #6549

Merged
merged 10 commits into from
Mar 6, 2020

Conversation

andrestone
Copy link
Contributor

@andrestone andrestone commented Mar 2, 2020

Commit Message

fix(batch): managed compute environment now properly works with compute resources and instanceRole has correct docstring and type definition (#6549)

"Managed" vs "Unmanaged" compute environments

Current implementation was built based on a misinterpretation of the "Managed Compute Environment". The ComputeEnvironment class was originally designed assuming that a "managed" compute environment wouldn't need ComputeResource specifications, since it was a "managed" environment.

However, as described in the documentation, a "managed" compute environment requires the ComputeResources specification in order to automatically provision your host instances, according to your AllocationStrategy. The "unmanaged" compute environment, in turn, won't deal with the computeResources property and launching the necessary compute instances inside the linked ECS cluster is a manual task.

Tests were passing because of inverted logic on this line.

ComputeResource instanceRole property

In the current implementation, the instanceRole property is of type iam.Role but the service actually requires an Instance Profile (in this case "instance" refers to the underlying EC2 resources) associated with a Role that has the AmazonEC2ContainerServiceforEC2Role managed policy attached. The property is optional because the default behaviour is to create one. Moreover, a default Role and an Instance Profile are created at first console experience on the region, with the name ecsInstanceRole. This PR allows you to easily refer to this existing instance profile (or any other that has the same policy), if you don't want to create new Instance Profile and a new Instance Role, by simply passing the profile name.

BREAKING CHANGE: the allocationStrategy property was moved from ComputeEnvironmentProps to the ComputeResources interface, which is where it semantically belongs.

End Commit Message


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-qxepHUsryhcu
  • Commit ID: d9c7f42
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@andrestone andrestone changed the title fix(batch): fixing 'managed' compute environment and some other bugs; adding launch template support fix(batch): fixing 'managed' compute environment and some other bugs; adding launch template support (WIP) Mar 2, 2020
@andrestone andrestone changed the title fix(batch): fixing 'managed' compute environment and some other bugs; adding launch template support (WIP) fix(batch): fixing 'managed' compute environment and some other bugs; adding launch template support Mar 3, 2020
@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-qxepHUsryhcu
  • Commit ID: cea5eb1
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-qxepHUsryhcu
  • Commit ID: 2574965
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@iliapolo
Copy link
Contributor

iliapolo commented Mar 5, 2020

Hi @andrestone, thanks so much for submitting this PR!

You mentioned:

Current implementation was built based on a misinterpretation of the "Managed Compute Environment"

Can you provide some details as to what exactly was misinterpreted? And what is the behavioral change you are proposing?

It's not that I disagree, I'm just looking for some more context to ease the review process and also provide more clarify to those seeing this PR that were not part of earlier discussions. It's great to references other data sources, but we also strive for each PR to contain just enough content so it can be reviewed on its own.

Also, since our PR titles eventually end up in our release notes, we try making these title briefly explain what the expected result will be. Basically, instead of saying:

fixing 'managed' compute environment,

try formulating this as:

'managed' compute environment will now...,

or something of that nature.

You also noted that this PR adds launch template support and fixes some bugs. I would probably leave the 'bugs' out of the title and simply mention what was fixed in the PR description. If you want to be super diligent, you can open separate issues on those bugs and add fixes (#{issue-number}) to this PR description. This will also appear in the release notes and be more visible to users looking for those bugs to be fixed.

In the meantime i will start digging into this PR :)

Thanks again!

@andrestone
Copy link
Contributor Author

Sure thing, @iliapolo! I'll rewrite the PR description and title ASAP, since this is a fundamental flaw in the implementation.

To be honest, these tests deserve a full review, but I opted to simply tweaking them into passing in order to quickly fix it.

What do you think about fixing the tests in another issue / PR?

Thanks for reviewing.

PS.: there's more to it, but basically what the current behaviour does is to require compute resources specification when it shouldn't and deny specifying the compute resources when they're needed. Tests were passing because the L0 construct was receiving the opposite boolean value for the "managed" property in the compute environment.

@iliapolo
Copy link
Contributor

iliapolo commented Mar 5, 2020

Hey @andrestone

To be honest, these tests deserve a full review, but I opted to simply tweaking them into passing in order to quickly fix it.

What do you think about fixing the tests in another issue / PR?

Do you have specific things you think are missing from the tests? Or is it just a general need for review? I don't mind waiting to improve the tests, but i wouldn't want us to ship this code without a critical test.

PS.: there's more to it, but basically what the current behaviour does is to require compute resources specification when it shouldn't and deny specifying the compute resources when they're needed. Tests were passing because the L0 construct was receiving the opposite boolean value for the "managed" property in the compute environment.

Got it, thanks 👍

One other thing, after going over the code, i feel it won't be a major effort to separate the launch template support to its own PR. What do you think? This will make things much clearer because its actually a new feature in the system and we want it to appear under New Features in the release notes.

@iliapolo
Copy link
Contributor

iliapolo commented Mar 5, 2020

@andrestone We should also mention in the description that this PR fixes #6590

@andrestone
Copy link
Contributor Author

andrestone commented Mar 5, 2020

Hello @iliapolo,

Do you have specific things you think are missing from the tests? Or is it just a general need for review? I don't mind waiting to improve the tests, but i wouldn't want us to ship this code without a critical test.

I didn't have the time to take a closer look at the tests, but they felt a bit loose while I was tweaking them. The integration test felt ok tho.

One other thing, after going over the code, i feel it won't be a major effort to separate the launch template support to its own PR. What do you think? This will make things much clearer because its actually a new feature in the system and we want it to appear under New Features in the release notes.

Ok, I'll do it.

Thanks again for reviewing.

PS.: Lunch time! I'll go over the other questions when I get back.

@iliapolo
Copy link
Contributor

iliapolo commented Mar 5, 2020

@andrestone Enjoy your lunch :)

@andrestone andrestone changed the title fix(batch): fixing 'managed' compute environment and some other bugs; adding launch template support fix(batch): managed compute environment now properly works with compute resources Mar 5, 2020
@andrestone andrestone changed the title fix(batch): managed compute environment now properly works with compute resources fix(batch): managed compute environment now properly works with compute resources and instanceRole has correct docstring and type definition Mar 5, 2020
@mergify mergify bot dismissed iliapolo’s stale review March 5, 2020 16:04

Pull request has been modified.

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-qxepHUsryhcu
  • Commit ID: f9d5938
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-qxepHUsryhcu
  • Commit ID: 69b0d7f
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@andrestone
Copy link
Contributor Author

andrestone commented Mar 5, 2020

Hey, @iliapolo!

Integration tests are failing on master because of the bugs addressed here, so the launch template PR will fail to build until this one is merged. Should I wait to avoid confusion or should I submit the failing PR anyway?

I'm rebuilding and committing the expected.json for this one.

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-qxepHUsryhcu
  • Commit ID: c8514ac
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@Dzhuneyt
Copy link
Contributor

Dzhuneyt commented Mar 6, 2020

We just encountered the same issue while building an MVP for our new product. Thanks for working on it!

@iliapolo
Copy link
Contributor

iliapolo commented Mar 6, 2020

@andrestone looks great! Also appreciate you splitting the launch template PR 👍

Ok, just one final thing regarding the PR description. To ensure that the final commit message will contain the necessary details, we either require a squash commit, or, which i personally like better, use special markdown headers in the PR description.

Basically, the PR description should include the following:

Commit Message

fix(batch): managed compute environment now properly works with compute resources and instanceRole has correct docstring and type definition (#6549)

{optional-extended-commit-message}

BREAKING CHANGE: the allocationStrategy property was moved from ComputeEnvironmentProps to the ComputeResources interface, which is where it semantically belongs.

End Commit Message

The content between these two headers is then picked up by the mergify bot who does our merges for us.

As an example, have a look at this PR, which generated this commit message

Thanks for bearing with me! :)

Copy link
Contributor

@iliapolo iliapolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the comment about the PR description

@andrestone
Copy link
Contributor Author

Hello! I tried to edit as you said.

Thanks again for reviewing.

@mergify
Copy link
Contributor

mergify bot commented Mar 6, 2020

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-qxepHUsryhcu
  • Commit ID: 472053d
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify
Copy link
Contributor

mergify bot commented Mar 6, 2020

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@ayush987goyal
Copy link
Contributor

This PR breaks the Batch integration tests because of #6615

@andrestone
Copy link
Contributor Author

This PR breaks the Batch integration tests because of #6615

Integration tests are actually passing. It's the usage that isn't covered.

Waiting for further discussions on this to proceed with a PR for a fix.

@iliapolo
Copy link
Contributor

iliapolo commented Mar 7, 2020

@andrestone see my #6615 (comment).

@ayush987goyal how did you discover this? just your usage? I'm wondering why you said the integration tests are failing, to make sure we didn't miss anything.

@ayush987goyal
Copy link
Contributor

@iliapolo I am creating a sfn-task to run a batch job #6396
I saw the integration test (integ.batch.ts) and found that the case of MANAGED computeEnvironment without the computeResource props is not covered (not even in unit tests). Maybe the test won't actually fail since this use case is not covered (sorry I thought since the integ.batch.expected.json is used in yarn tests, the tests would pass and the next time someone runs yarn integ inte.batch,js the tests would fail but conversely since the case is not even there, they tests are passing)

@iliapolo
Copy link
Contributor

iliapolo commented Mar 7, 2020

@ayush987goyal Got it, thanks. This use-case will now be covered after the PR for #6396

eladb pushed a commit that referenced this pull request Mar 9, 2020
…te resources and instanceRole has correct docstring and type definition (#6549)
horsmand pushed a commit to horsmand/aws-cdk that referenced this pull request Mar 9, 2020
…te resources and instanceRole has correct docstring and type definition (aws#6549)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants