Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Per-job Instance Types and AMIs #2390

Merged
merged 42 commits into from
Aug 29, 2024
Merged

Per-job Instance Types and AMIs #2390

merged 42 commits into from
Aug 29, 2024

Conversation

AndrewPlayer3
Copy link
Contributor

@AndrewPlayer3 AndrewPlayer3 commented Aug 26, 2024

No description provided.

@AndrewPlayer3 AndrewPlayer3 marked this pull request as ready for review August 27, 2024 17:38
@AndrewPlayer3 AndrewPlayer3 requested a review from a team as a code owner August 27, 2024 17:38
apps/main-cf.yml.j2 Outdated Show resolved Hide resolved
job_spec/SRG_GSLC.yml Outdated Show resolved Hide resolved
@jhkennedy
Copy link
Contributor

which specifies that there should be a "deployment-specified gpu ami and instance type fleet" and a "deployment-specified cpu ami and instance type fleet". This is probably more flexible so perhaps better than what was written for the acceptance criteria, but we should get buy-in from at least @forrestfwilliams on this approach.

Looking ahead, we'll be moving ARIA S1 GUNW (INSAR_ISCE) jobs into production HyP3 soon, which are currently running on the C or M instance family (we found them to be the most cost-optimal). So, I think this approach is better overall as we'll have broader needs than just GPU vs CPU.

This will have implications for our current cost control strategy.

Right now, we do cost control by limiting the max vcpus, which works well with one compute environment. Multiple compute environments will definitely reduce the effectiveness of this strategy. For the LAVAs deployment, I don't think we'll be using the EDC cost controls, so won't matter there. We can side-step the cost control worry in EDC by keeping it a single compute environment there for now, but we'll need to tackle it when we bring in GUNWs if we want to use the C/M instance families.

I'm not sure how forward-looking we want to be overall here, but we definitely don't want to put ourselves in a corner later.

Copy link
Contributor

@jtherrmann jtherrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like these are just extra blank lines, not a trailing newline.

job_spec/WATER_MAP_EQ.yml Outdated Show resolved Hide resolved
job_spec/WATER_MAP.yml Outdated Show resolved Hide resolved
job_spec/SRG_GSLC.yml Outdated Show resolved Hide resolved
job_spec/INSAR_ISCE.yml Outdated Show resolved Hide resolved
job_spec/ARIA_RAIDER.yml Outdated Show resolved Hide resolved
apps/workflow-cf.yml.j2 Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
Copy link
Contributor

@jtherrmann jtherrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the code changes look good, but I still want to render the templates locally and review the changes.

Copy link
Contributor

@jtherrmann jtherrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also confirmed that none of our existing CF resources have been renamed.

Copy link
Contributor

@jtherrmann jtherrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've confirmed that the templates render locally as expected.

@jtherrmann
Copy link
Contributor

The snyk failure is unrelated to this PR.

@AndrewPlayer3 AndrewPlayer3 merged commit 3905646 into develop Aug 29, 2024
10 of 11 checks passed
@AndrewPlayer3 AndrewPlayer3 deleted the multi_instance_type branch August 29, 2024 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants