mgr/cephadm: retry after JSONDecodeError in wait_for_mgr_restart() #40203
This piece of code has a history:
See also:
But I think this is a good workaround for Octopus.
src/cephadm/cephadm
for sleep_secs in [1, 4, 25]:
    try:
        try:
            cmd = ['mgr', 'stat']
            out = cli(cmd)
        except Exception:
            cmd = ['mgr', 'dump']
            out = cli(cmd)
        j = json.loads(out)
        break
    except json.decoder.JSONDecodeError:
        cmd_str = 'ceph ' + ' '.join(cmd)
        logger.info('`%s` failed transiently. Retrying. waiting %s seconds...'
                    % (cmd_str, sleep_secs))
        time.sleep(sleep_secs)
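If every attempt in a loop like the one above raises JSONDecodeError, the loop simply ends and `j` is never assigned, so the failure would surface later as an unrelated UnboundLocalError. A minimal sketch of one way to make exhaustion explicit, using Python's `for`/`else` clause (the `else` branch runs only when the loop finishes without `break`). `fetch_mgr_json` is a hypothetical name, `cli` is a stand-in for cephadm's helper, and the `sleep` parameter is added only so the example can run instantly:

```python
import json
import time
from typing import Callable, List


def fetch_mgr_json(cli: Callable[[List[str]], str],
                   delays=(1, 4, 25),
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """Run `mgr stat` (falling back to `mgr dump`) until its output parses."""
    for sleep_secs in delays:
        try:
            try:
                cmd = ['mgr', 'stat']
                out = cli(cmd)
            except Exception:
                cmd = ['mgr', 'dump']
                out = cli(cmd)
            j = json.loads(out)
            break
        except json.decoder.JSONDecodeError:
            # Transient garbage on stdout: wait, then rerun the command.
            sleep(sleep_secs)
    else:
        # Reached only if no attempt hit `break`, i.e. every parse failed.
        raise RuntimeError('ceph %s never returned valid JSON' % ' '.join(cmd))
    return j
```

Raising RuntimeError here is an illustrative choice; re-raising the last JSONDecodeError would keep the caller-visible error type unchanged.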
For backporting, I'd suggest limiting the amount of change to existing code:
def retry(func: Callable):
    for sleep_secs in [1, 4, 4]:
        try:
            return func()
        except Exception:
            time.sleep(sleep_secs)
    return func()
(and sleeping 25 seconds is too long)
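A self-contained sketch of the suggested `retry()` helper, filled out under stated assumptions: the delays are the suggested 1, 4 and 4 seconds (9 seconds of waiting in the worst case, against 30 for the 1/4/25 schedule), and one final unguarded call after the last sleep lets any error propagate unchanged. The `delays`, `exc` and `sleep` parameters are hypothetical additions for illustration, not part of the review suggestion:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar('T')


def retry(func: Callable[[], T],
          delays: Tuple[float, ...] = (1, 4, 4),
          exc: Type[BaseException] = Exception,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func(); on `exc`, sleep and try again. The last call is unguarded."""
    for sleep_secs in delays:
        try:
            return func()
        except exc:
            sleep(sleep_secs)
    # Final attempt: if this fails too, the caller sees the original exception.
    return func()
```

With three delays the helper makes at most four calls. Passing a narrow `exc` (for example `json.JSONDecodeError`) keeps unrelated failures from being retried.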
@@ -3926,16 +3926,25 @@ def command_bootstrap(ctx):
    # first get latest mgrmap epoch from the mon. try newer 'mgr
    # stat' command first, then fall back to 'mgr dump' if
    # necessary
    try:
        out = cli(['mgr', 'stat'])
Suggested change
- out = cli(['mgr', 'stat'])
+ out = retry(lambda: cli(['mgr', 'stat']))
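The comment in the hunk above describes a two-step lookup of the mgrmap epoch. A minimal sketch of that fallback, assuming a hypothetical `cli` callable that takes the ceph subcommand as a list, returns its stdout, and raises on failure; the assumption that both outputs carry an `epoch` key is also labeled in the code. Note that in this shape, wrapping only the `cli(...)` call in a retry does not cover a JSONDecodeError raised by `json.loads()`:

```python
import json
from typing import Callable, List


def get_mgrmap_epoch(cli: Callable[[List[str]], str]) -> int:
    """Return the current mgrmap epoch, preferring `mgr stat` over `mgr dump`."""
    try:
        # Newer command: a small summary of the mgr map.
        out = cli(['mgr', 'stat'])
    except Exception:
        # Older clusters without `mgr stat`: fall back to the full mgrmap dump.
        out = cli(['mgr', 'dump'])
    # Assumption: both outputs are JSON objects with an 'epoch' key.
    # A malformed payload raises json.JSONDecodeError here, outside the try.
    return json.loads(out)['epoch']
```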
Thanks for the code review; please find the updated patch.
As you suggested, I added a new retry() function to minimize changes to the existing wait_for_mgr_restart() function. Because the unhandled exception is raised by json.loads(), not by cli(), and cli() has to be called again whenever its output fails to parse, retry() calls json.loads() on the output of cli() itself.
except json.decoder.JSONDecodeError:
    logger.debug('Invalid JSON. Retrying in %s seconds...' % sleep_secs)
    time.sleep(sleep_secs)
return json.loads(cli_func())
Should I add the above at line 3931, so that we make one more call after the retries? That way, if the exception is still thrown, the error is at least identical to the current one and we don't introduce any new behavior.
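A minimal sketch of the JSON-aware variant discussed above, assuming a hypothetical `cli_func` callable that reruns the ceph command on every call. The loop retries only on JSONDecodeError and re-invokes `cli_func()` each time (re-parsing the same bad string could never succeed), and the final `return json.loads(cli_func())` outside the loop means that, once retries are exhausted, callers get the same JSONDecodeError as before the patch. The `sleep` parameter exists only to make the example testable, and logging uses lazy %-style arguments:

```python
import json
import logging
import time
from typing import Any, Callable

logger = logging.getLogger(__name__)


def retry(cli_func: Callable[[], str],
          sleep: Callable[[float], None] = time.sleep) -> Any:
    """Run cli_func() and parse its output, retrying on invalid JSON."""
    for sleep_secs in [1, 4, 4]:
        try:
            return json.loads(cli_func())
        except json.decoder.JSONDecodeError:
            logger.debug('Invalid JSON. Retrying in %s seconds...', sleep_secs)
            sleep(sleep_secs)
    # One last call outside the loop: a persistent failure raises the same
    # JSONDecodeError as the unpatched code, so no new behavior is introduced.
    return json.loads(cli_func())
```

Usage in the hunk under review would then look like `j = retry(lambda: cli(['mgr', 'stat']))`, with the parse happening inside the retried call.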
Thanks, I should have pep8'd it before submitting. Fixed now.
'ceph mgr dump' does not always return valid JSON, so cephadm
sometimes throws an exception when applying a spec, as described
in the issue this PR closes. Add a try/except to catch a possible
JSONDecodeError and retry after sleeping.

Fixes: https://tracker.ceph.com/issues/49870
Signed-off-by: John Fulton <[email protected]>