Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate unique SageMaker training job name based on pipeline and ste… #1505

Merged
merged 7 commits into from
May 2, 2023
Merged

Generate unique SageMaker training job name based on pipeline and ste… #1505

merged 7 commits into from
May 2, 2023

Conversation

christianversloot
Copy link
Contributor

…p name

Describe changes

Attempts fixing #1502.

Pre-requisites

Please ensure you have done the following:

  • I have read the CONTRIBUTING.md document.
  • If my change requires a change to docs, I have updated the documentation accordingly.
  • If I have added an integration, I have updated the integrations table and the corresponding website section.
  • I have added tests to cover my changes.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Other (add details above)

@christianversloot
Copy link
Contributor Author

On Slack, we agreed to the following approach:

Suggested approach:
Compose name composed_name as {pipeline_name}-{step-name}, cut it off to 63-5 = 58 chars if necessary.
Then add the suffix as {composed_name}-{random_suffix}, where random suffic is a length-4 random suffix.

...with the slight change that I am using 60 characters as a safety margin; let me know if this must be omitted.

Using the StepRunInfo object available to the step operator, and the PipelineConfiguration and StepConfiguration objects available in there, this PR first attempts to construct the {pipeline_name}-{step_name} composition as discussed above.

In doing so, I need to use Client() (I borrowed that idea from get_image(...) in the StepRunInfo object), but I am not yet sure if that's the right approach. Feel free to make other suggestions.

Then, as the suffix plus the - character will be 5 characters, the composed name is cut off to the first 55 characters. To that, the generated (pseudo)random suffix is appended as well as the - separator. This is sanitized (as was done previously) and subsequently used to fit the estimator.

Looking forward to your thoughts!

Copy link
Contributor

@strickvl strickvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but I had one query about whether to take some of the string construction logic outside the main launch method..

@strickvl strickvl added the bug Something isn't working label Apr 25, 2023
@strickvl strickvl requested a review from schustmi April 25, 2023 14:49
Copy link
Contributor

@strickvl strickvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasonable to use the Client in there, as well, IMO.

A bit hard separate the PR from the various formatting changes. Assuming you branched off develop, unsure why it wants to make all those changes since running the format script on my end doesn't make any such changes on develop.

@christianversloot
Copy link
Contributor Author

Agreed, I don't know either. Without it the build, lint tests etc would break.

Copy link
Contributor

@strickvl strickvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I think this looks good now. I would like to fix the formatting issue, however, before merging this, and will let the CI / tests run through as well.

@christianversloot
Copy link
Contributor Author

christianversloot commented May 2, 2023

Just saw your previous comment. I have reverted the mass formatting commit in af51e83 and ran ruff and black manually on the src/zenml/integrations/aws/step_operators containing the only changes in this PR in 5daed8b. This should mean that the Files changed section is ok again while lint checks etc should pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants