mgr/cephadm: retry after JSONDecodeError in wait_for_mgr_restart() #40203

Merged Mar 22, 2021 (1 commit)


fultonj
Contributor

@fultonj fultonj commented Mar 17, 2021

'ceph mgr dump' does not always return valid JSON so cephadm
will throw an exception sometimes when applying a spec as per
the issue this PR closes. Add a try/except to catch a possible
JSONDecodeError and retry after sleeping.

Fixes: https://tracker.ceph.com/issues/49870
Signed-off-by: John Fulton [email protected]

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@fultonj fultonj requested a review from a team as a code owner March 17, 2021 22:06
@sebastian-philipp
Contributor

Comment on lines 3932 to 3946
for sleep_secs in [1, 4, 25]:
    try:
        try:
            cmd = ['mgr', 'stat']
            out = cli(cmd)
        except Exception:
            cmd = ['mgr', 'dump']
            out = cli(cmd)
        j = json.loads(out)
        break
    except json.decoder.JSONDecodeError:
        cmd_str = 'ceph ' + ' '.join(cmd)
        logger.info('`%s` failed transiently. Retrying. waiting %s seconds...' \
                    % (cmd_str, sleep_secs))
        time.sleep(sleep_secs)

@sebastian-philipp sebastian-philipp Mar 18, 2021


For backporting, I'd suggest limiting the changes to existing code:

def retry(func: Callable): 
    for sleep_secs in [1, 4, 4]:
        try:    
            return func() 

(and sleeping 25 seconds is too long)
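The snippet above is only an outline, so here is one way it could be completed. This is a minimal sketch, not the merged code: the `delays` and `sleep` parameters, the catch-all `except Exception`, and the `RuntimeError` raised on exhaustion are all assumptions added for illustration.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar('T')


def retry(func: Callable[[], T],
          delays: Sequence[float] = (1, 4, 4),
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func() once per entry in delays, sleeping after each failure.

    Hypothetical helper: treats any Exception as transient.
    """
    last_exc: Optional[Exception] = None
    for sleep_secs in delays:
        try:
            return func()
        except Exception as e:
            last_exc = e
            sleep(sleep_secs)
    # Every attempt failed: report that, keeping the last error as the cause.
    raise RuntimeError('giving up after %d attempts' % len(delays)) from last_exc
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; a caller would wrap the existing call as `retry(lambda: cli(['mgr', 'stat']))`.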

@@ -3926,16 +3926,25 @@ def command_bootstrap(ctx):
        # first get latest mgrmap epoch from the mon. try newer 'mgr
        # stat' command first, then fall back to 'mgr dump' if
        # necessary
        try:
            out = cli(['mgr', 'stat'])

Suggested change
out = cli(['mgr', 'stat'])
out = retry(lambda: cli(['mgr', 'stat']))

Contributor Author


Thanks for the code review and please find this updated patch.

As per your suggestion, I added a new retry() function to minimize changes to the existing wait_for_mgr_restart() function. The unhandled exception is thrown by json.loads(), not by cli(), but cli() has to be called again whenever the JSON fails to parse, so I had retry() call json.loads() itself.

        except:
            logger.debug('Invalid JSON. Retrying in %s seconds...' % sleep_secs)
            time.sleep(sleep_secs)

Contributor Author


       return json.loads(cli_func())

Should I add the above on line 3931, so that we make one final call and, if the exception is thrown again, the error is identical to the current one and no new behavior is introduced?
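To make the question concrete, the idea can be sketched as follows. This is an illustrative approximation, not the patch as merged: the name `retry_json` and the `delays` and `sleep` parameters are hypothetical, and `cli_func` stands in for a zero-argument wrapper around `cli()`.

```python
import json
import time
from typing import Any, Callable, Sequence


def retry_json(cli_func: Callable[[], str],
               delays: Sequence[float] = (1, 4, 4),
               sleep: Callable[[float], None] = time.sleep) -> Any:
    """Re-run cli_func() and re-parse its output until it is valid JSON."""
    for sleep_secs in delays:
        try:
            # Both steps are repeated: a fresh CLI call, then a fresh parse.
            return json.loads(cli_func())
        except json.JSONDecodeError:
            sleep(sleep_secs)
    # Final, unguarded attempt: if the output is still invalid, the caller
    # sees the same JSONDecodeError it would have seen without any retrying.
    return json.loads(cli_func())
```

With the final call outside the loop, the only behavioral change on persistent failure is the added delay; the exception type and message stay the same.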

@liewegas
Member

cephadm:3928:13: E722 do not use bare 'except'
cephadm:3929:73: E222 multiple spaces after operator
1     E222 multiple spaces after operator
1     E722 do not use bare 'except'

@fultonj
Contributor Author

fultonj commented Mar 21, 2021

cephadm:3928:13: E722 do not use bare 'except'
cephadm:3929:73: E222 multiple spaces after operator
1     E222 multiple spaces after operator
1     E722 do not use bare 'except'

Thanks, I should have pep8'd it before submitting. Fixed now.
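For context on the E722 warning: a bare `except:` catches every exception, including `KeyboardInterrupt` and `SystemExit`, so it can hide unrelated bugs. The short sketch below illustrates the difference; `parse_bare` and `parse_narrow` are hypothetical names, not functions from cephadm.

```python
import json


def parse_bare(text):
    """Bare except: swallows every error, not just invalid JSON."""
    try:
        return json.loads(text)
    except:  # noqa: E722 (deliberately bare, for illustration only)
        return None


def parse_narrow(text):
    """Narrow except: only invalid JSON is treated as recoverable."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

Passing `None` instead of a string is a programming error: `parse_bare(None)` silently returns `None`, while `parse_narrow(None)` lets the `TypeError` from `json.loads()` propagate.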

@liewegas liewegas merged commit 110c7a6 into ceph:master Mar 22, 2021