Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-15651 control: fail fast on invalid rank for integrate #14165

Closed
wants to merge 1 commit into from

Conversation

cdavis28
Copy link
Contributor

Add logic to not sleep and retry on a bad rank provided for a pool reintegrate.

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follows the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket.
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

Copy link

github-actions bot commented Apr 16, 2024

Ticket title is 'dmg pool reintegrate with an invalid rank will timeout'
Status is 'Open'
https://daosio.atlassian.net/browse/DAOS-15651

@cdavis28 cdavis28 changed the title DAOS-15651 fail fast on invalid rank for integrate DAOS-15651 mgmt: fail fast on invalid rank for integrate Apr 16, 2024
@cdavis28 cdavis28 requested a review from mjmac April 16, 2024 15:35
@mjmac mjmac changed the title DAOS-15651 mgmt: fail fast on invalid rank for integrate DAOS-15651 control: fail fast on invalid rank for integrate Apr 16, 2024
@cdavis28 cdavis28 force-pushed the DAOS-15651 branch 2 times, most recently from f61a133 to 3fbb455 Compare April 17, 2024 00:22
check rank against membership before issuing rank update
for integrate.
Also fix a minor unit test bug related to getting syslog

Signed-off-by: Chris Davis <[email protected]>
@cdavis28 cdavis28 marked this pull request as ready for review April 17, 2024 07:14
@cdavis28 cdavis28 requested review from a team as code owners April 17, 2024 07:14
Copy link
Contributor

@tanabarr tanabarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks good.

@tanabarr
Copy link
Contributor

Will the syslog changes to restrict since one minutes have any wider ranging effects?

@mjmac
Copy link
Contributor

mjmac commented Apr 17, 2024

Will the syslog changes to restrict since one minutes have any wider ranging effects?

The test just uses that command to figure out if the user running the tests has permissions to access the journal. The output isn't used. In our development environments, the command can run for a long time and cause a test timeout.

Copy link
Contributor

@mjmac mjmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I was going to suggest that a test case be added to the PoolReintegrate unit tests, but those don't exist! I guess they weren't added when the handler was added. I've created DAOS-15669 to address that.

@cdavis28
Copy link
Contributor Author

LGTM. I was going to suggest that a test case be added to the PoolReintegrate unit tests, but those don't exist! I guess they weren't added when the handler was added. I've created DAOS-15669 to address that.

Yeah I was looking for that test as well. Thanks for filing that.

@jolivier23
Copy link
Contributor

To land this, you (or someone else) will need to create a PR from a branch in the main repo rather than a fork. This is an unfortunately limitaiton of Intel CI. Since you are in DAOS org, probably better for your to do yourself.

Recommendation is something like cdavis28/DAOS-15651 for branch name to avoid conflicts with others

@cdavis28
Copy link
Contributor Author

Moved to #14213

@mjmac
Copy link
Contributor

mjmac commented Apr 22, 2024

To land this, you (or someone else) will need to create a PR from a branch in the main repo rather than a fork. This is an unfortunately limitaiton of Intel CI. Since you are in DAOS org, probably better for your to do yourself.

Recommendation is something like cdavis28/DAOS-15651 for branch name to avoid conflicts with others

D'oh. Sorry @cdavis28, didn't even notice this.

@brianjmurrell: Does this limitation of not running tests from forks still exist with the Use-GHA: true pragma?

@brianjmurrell
Copy link
Contributor

@brianjmurrell: Does this limitation of not running tests from forks still exist with the Use-GHA: true pragma?

I haven't tried in a while TBH but it should work (mostly at least, if not entirely). There could be limitations of some actions we use working correctly on forks but I am not recalling anything specific and I am trying to keep my eye out for such things.

I'd say it's well worth a try and report if there are any issues with it. This is the direction we want to go in ultimately. I really should be trying to work from my own fork when I am working on the GHA workflows just to keep it honest.

@cdavis28
Copy link
Contributor Author

cdavis28 commented Apr 22, 2024

@brianjmurrell: Does this limitation of not running tests from forks still exist with the Use-GHA: true pragma?

how is this used? update to the commit message

Yes.

$ git commit --allow-empty -m "Build with GitHub Actions

Run-GHA: true"

should do the trick.

or a label?

That's an interesting thought, but no, not currently.

Another common workflow I have seen in other GitHub repos/Actions is to add a comment with a command in it to trigger something, so that's another (future) possibility. But for now, the only supported method is the above commit message.

In the future, you can include the Run-GHA: true in any commit message -- it doesn't have to be an empty commit. But it MUST be in the latest commit message on the PR for it to work. So that is, you probably want to add it to every commit message.

Ultimately, I would like to use a more automatic heuristic to decide if the build and test should run in GHA or not but unfortunately for just about every heuristic that I can come up with, there are exceptions that break it.

@cdavis28 cdavis28 closed this May 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

6 participants