Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Made Compute Engine Metadata service exceptions more informative #1637

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 10 additions & 8 deletions google/auth/compute_engine/_metadata.py
Original file line number Diff line number Diff line change
Expand Up @@ -202,22 +202,24 @@ def get(

backoff = ExponentialBackoff(total_attempts=retry_count)

last_exception = None
for attempt in backoff:
try:
response = request(url=url, method="GET", headers=headers_to_use)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clear last_exception here if you're following my suggestions.

if response.status in transport.DEFAULT_RETRYABLE_STATUS_CODES:
_LOGGER.warning(
"Compute Engine Metadata server unavailable on "
"attempt %s of %s. Response status: %s",
attempt,
retry_count,
response.status,
raise exceptions.TransportError(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the retryable status code is expected in this case so I'm not sure a conversion to exception is the right thing and it makes the code a bit harder to understand.

(1) Can the customer set log level to an appropriate level to see these attempts? This is the best solution because it preserves the full history of retry attempts.

(2) If the code gets here, response is set and you can just use that to raise the error below. This would also catch the non-retryable error case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take another look at the code execution flow. It's likely that the code change is not changing the behavior the way you think it does. There are no new exceptions thrown to the user. And the logging still happens the same way as before. It's just that now in one of the cases instead of having duplicated logging code there the logging branch is triggered by an exception.

The initial solution was simple and did not change this area. However, @viacheslav-rostovtsev noticed an issue when the retriable error limit is exhausted. In that case the information about the service responses (retriable errors) is lost. A simplest way to capture that information in the final exception is to wrap it in TransportError and let it get assigned to last_exception.

"Compute Engine Metadata server unavailable. "
"Response status: {}; Response:\n{}".format(
response.status, response.data
),
response,
retryable=True,
)
continue
else:
break

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clear last_exception here if you follow my suggestions.


except exceptions.TransportError as e:
last_exception = e

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea. Clear response here if you're following my suggestions.

_LOGGER.warning(
"Compute Engine Metadata server unavailable on "
"attempt %s of %s. Reason: %s",
Expand All @@ -229,7 +231,7 @@ def get(
raise exceptions.TransportError(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we should consult last_exception and response to construct the right error:

(1) Non-retryable error encountered: response.status
(2) Ran out of retries: last_exception
(3) Ran out of retries: response.status

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was the first iteration of the idea. It seems to me that converting response to an exception works just as well and simplifies the logic since only the last exception needs to be tracked. The logging etc does not change. Do you want us to change this?

"Failed to retrieve {} from the Google Compute Engine "
"metadata service. Compute Engine Metadata server unavailable".format(url)
)
) from last_exception

content = _helpers.from_bytes(response.data)

Expand Down