[Spot] Fix spot failure reason when cloud is specified #1714
Thanks for the fix, @Michaelvll! It looks like this PR catches three errors.
I'm wondering how we can make sure we don't miss other errors and keep track of new ones in the future.
Lines 393 to 397 in 71697f7
Or maybe, beyond this PR, should we try to categorize errors into a fixed, smaller set of categories and include the details in e.reason? For example, maybe ClusterOwnerIdentityMismatchError and CloudUserIdentityError can be grouped together? Correct me if I'm wrong, as I'm not super familiar with the potential errors and how we handle them.
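To make the grouping idea concrete, here is a minimal sketch. Everything in it is hypothetical: `FailureCategory`, `LaunchError`, `IdentityError` and `describe` are illustrative names, and the two identity classes are stand-ins that only borrow the names mentioned above. It is not the project's actual exception hierarchy.

```python
import enum


class FailureCategory(enum.Enum):
    """Hypothetical fixed set of failure categories (an assumption)."""
    IDENTITY = 'identity'
    RESOURCES = 'resources'
    UNKNOWN = 'unknown'


class LaunchError(Exception):
    """Hypothetical base error: a coarse category plus a detailed reason."""
    category = FailureCategory.UNKNOWN

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


class IdentityError(LaunchError):
    """Groups identity-related failures under one category."""
    category = FailureCategory.IDENTITY


# Stand-ins named after the two errors mentioned in the comment above.
class ClusterOwnerIdentityMismatchError(IdentityError):
    pass


class CloudUserIdentityError(IdentityError):
    pass


def describe(e: Exception) -> str:
    """Render a failure as '<category>: <details>'."""
    if isinstance(e, LaunchError):
        return f'{e.category.value}: {e.reason}'
    # Anything outside the hierarchy falls into the catch-all category.
    return f'{FailureCategory.UNKNOWN.value}: {e}'
```

With this shape, a caller only needs to catch `IdentityError` to handle both identity cases, while `e.reason` still carries the specific details.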
Actually, let us only catch the
I was trying to find a tool for Python that allows Java-style error signatures for functions, but failed to find one.
Good point! Added to the docstr.
Yes, we need to make the exceptions simpler and easier to track in our backend, but we also need to make sure we don't overuse an exception like
Sounds good! Let's raise an issue for further improvement.
…1714)
* Fix spot failure reason when cloud is specified
* format
* fix
* format
* Update exceptions in docstr
* fix docstr
* Add error
* format
Patches #1655
When the cloud is specified, we don't raise the ResourcesUnavailableError but the actual precheck errors. We need to catch those exceptions in the recovery_strategy.
Tested (run the relevant ones):
sky spot launch -n spot-maskgit-minerl-maskgit-interp --cloud gcp echo hi
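The behavior described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this PR: `PrecheckError`, `PRECHECK_ERRORS` and `try_launch` are hypothetical names, and `ResourcesUnavailableError` is redefined here as a local stand-in so the snippet runs on its own.

```python
from typing import Callable, Optional, Tuple, Type


class ResourcesUnavailableError(Exception):
    """Local stand-in for the error named in the description."""


class PrecheckError(Exception):
    """Hypothetical precheck error raised directly when a cloud is pinned."""


# Hypothetical: the concrete precheck exception types are not listed here.
PRECHECK_ERRORS: Tuple[Type[Exception], ...] = (PrecheckError,)


def try_launch(launch: Callable[[], None]) -> Optional[str]:
    """Return None on success, otherwise a human-readable failure reason."""
    try:
        launch()
    except ResourcesUnavailableError as e:
        # Path taken when no cloud is specified and failover is exhausted.
        return f'Resources unavailable: {e}'
    except PRECHECK_ERRORS as e:
        # Path taken when a cloud is specified: the precheck error surfaces
        # as-is, so it has to be caught explicitly to record the reason.
        return f'Precheck failed ({type(e).__name__}): {e}'
    return None
```

Keeping the caught types in one tuple means a newly discovered precheck error only needs to be added in a single place.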