-
Notifications
You must be signed in to change notification settings - Fork 309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: Made Compute Engine Metadata service exceptions more informative #1637
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -202,22 +202,24 @@ def get( | |
|
||
backoff = ExponentialBackoff(total_attempts=retry_count) | ||
|
||
last_exception = None | ||
for attempt in backoff: | ||
try: | ||
response = request(url=url, method="GET", headers=headers_to_use) | ||
if response.status in transport.DEFAULT_RETRYABLE_STATUS_CODES: | ||
_LOGGER.warning( | ||
"Compute Engine Metadata server unavailable on " | ||
"attempt %s of %s. Response status: %s", | ||
attempt, | ||
retry_count, | ||
response.status, | ||
raise exceptions.TransportError( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the retryable status code is expected in this case so I'm not sure a conversion to exception is the right thing and it makes the code a bit harder to understand. (1) Can the customer set log level to an appropriate level to see these attempts? This is the best solution because it preserves the full history of retry attempts. (2) If the code gets here, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please take another look at the code execution flow. It's likely that the code change is not changing the behavior the way you think it does. There are no new exceptions thrown to the user. And the logging still happens the same way as before. It's just that now in one of the cases instead of having duplicated logging code there the logging branch is triggered by an exception. The initial solution was simple and did not change this area. However, @viacheslav-rostovtsev noticed an issue when the retriable error limit is exhausted. In that case the information about the service responses (retriable errors) is lost. A simplest way to capture that information in the final exception is to wrap it in TransportError and let it get assigned to |
||
"Compute Engine Metadata server unavailable. " | ||
"Response status: {}; Response:\n{}".format( | ||
response.status, response.data | ||
), | ||
response, | ||
retryable=True, | ||
) | ||
continue | ||
else: | ||
break | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. clear |
||
|
||
except exceptions.TransportError as e: | ||
last_exception = e | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a good idea. Clear |
||
_LOGGER.warning( | ||
"Compute Engine Metadata server unavailable on " | ||
"attempt %s of %s. Reason: %s", | ||
|
@@ -229,7 +231,7 @@ def get( | |
raise exceptions.TransportError( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here we should consult (1) Non-retryable error encountered: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, this was the first iteration of the idea. It seems to me that converting response to an exception works just as well and simplifies the logic since only the last exception needs to be tracked. The logging etc does not change. Do you want us to change this? |
||
"Failed to retrieve {} from the Google Compute Engine " | ||
"metadata service. Compute Engine Metadata server unavailable".format(url) | ||
) | ||
) from last_exception | ||
|
||
content = _helpers.from_bytes(response.data) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clear
last_exception
here if you're following my suggestions.