Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix spot recovery without cloud specified #1077

Merged
merged 5 commits into from
Aug 15, 2022
Merged

Conversation

Michaelvll
Copy link
Collaborator

@Michaelvll Michaelvll commented Aug 15, 2022

This fixes the problem caused by #1014, which disables launching the cluster with region specified but without cloud specified. That makes the spot recovery strategy fails to recover a spot task without cloud specified.

launched_region = handle.launched_resources.region
new_resources = resources.copy(region=launched_region)
new_resources = resources.copy(cloud=launched_cloud,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: why would specifying region only not work here? It should work in a new task.

Maybe I'm not understanding this:

This fixes the problem caused by #1014, which disables launching the cluster with region specified but without cloud specified.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: why would specifying region only not work here? It should work in a new task.

In #1014, we change the behavior in a new task by enforcing the user to specify the cloud when the region is specified to be consistent with the zone.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Is there a reason that has to be the case? That seems to introduce frictions:

  • We can easily deduce the cloud from a region/zone
  • Imagine a single-cloud user - it's clear to that user what the cloud is

This is an orthogonal issue that shouldn't block this PR. cc @infwinston

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are great points! I think the reason is mainly for potential conflicts among zones in different clouds. @infwinston can say more about this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason was mainly to avoid potential confusion and future region/zone name conflicts.

For example, users may run into this typo between
us-west-1a and us-west1-a where the former is a zone in AWS, the latter one is in GCP.
If we automatically derive cloud then we might unintentionally launch a wrong one for users.
Also, it avoids potential region/zone name conflicts with other clouds.

Since specifying region/zone can be considered an advanced usage, I think it'd be better to ask more clarity from users. Does that makes sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These make sense, but overall I still think deducing could be better:

  • The typo cases would be caught by confirmation
  • Conflicts can be resolved when they arise.
  • However, single-cloud users are most prevalent now and their UX problem (requiring them to type something redundant) doesn't have a workaround.

@Michaelvll Michaelvll merged commit 83d2936 into master Aug 15, 2022
@Michaelvll Michaelvll deleted the fix-spot-recovery branch August 15, 2022 19:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants