Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-15120 crt: Misc d_rand-related fixes #13738

Merged
merged 1 commit into from
Feb 7, 2024
Merged

DAOS-15120 crt: Misc d_rand-related fixes #13738

merged 1 commit into from
Feb 7, 2024

Conversation

liw
Copy link
Contributor

@liw liw commented Feb 5, 2024

  • Remove duplicate d_srand calls in crt.

  • Use tv_nsec instead of tv_nsec / 1e3 when calling d_srand, for this reduces the chance of seed collisions among daos_engine processes.

  • Replace rand calls in libgurt and libpool with d_rand.

  • Remove d_randn (introduced by this author), since lrand48_r doesn't return the lowest bits from Xi, and the % method is therefore sufficient for our purposes.

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follows the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket.
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

  - Remove duplicate d_srand calls in crt.

  - Use tv_nsec instead of tv_nsec / 1e3 when calling d_srand, for this
    reduces the chance of seed collisions among daos_engine processes.

  - Replace rand calls in libgurt and libpool with d_rand.

  - Remove d_randn (introduced by this author), since lrand48_r doesn't
    return the lowest bits from Xi, and the % method is therefore
    sufficient for our purposes.

Signed-off-by: Li Wei <[email protected]>
Required-githooks: true
Copy link

github-actions bot commented Feb 5, 2024

Bug-tracker data:
Ticket title is 'Raft split votes due to bad random numbers'
Status is 'Open'
Errors are Unknown component
https://daosio.atlassian.net/browse/DAOS-15120

@liw liw marked this pull request as ready for review February 5, 2024 23:37
@liw liw requested review from a team as code owners February 5, 2024 23:37
@liw liw requested review from frostedcmos and kccain February 5, 2024 23:37
d_srand(d_timeus_secdiff(0) + getpid());
rc = d_gettime(&now);
D_ASSERTF(rc == 0, "d_gettime: " DF_RC "\n", DP_RC(rc));
d_srand(now.tv_sec * 1000 * 1000 * 1000 + now.tv_nsec + getpid());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: i feel now.tv_nsec + getpid() is sufficient for seed. wont now.tv_sec * 1000 * 1000 * 1000 cause overflows? coverity might complain if so:)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't make this seeding change until I noticed a couple of identical seeds (printed by a temporary fprintf that I've deleted from this PR) being used during a few dmg system start commands of 6 engines. The nanosecond-based seeding seemed to be absent of such collisions. (Actually I would prefer calling getrandom to obtain truly random bytes, but let's see how the current approach works first.)

I think overflows in this case is okay, for d_srand uses the lower 32 bits only. Didn't think of Coverity though; if Coverity complains, that's my bad and feel free to assign such issue to me. :)

@liw liw requested a review from a team February 6, 2024 23:32
@mchaarawi mchaarawi merged commit 927cc47 into master Feb 7, 2024
46 of 48 checks passed
@mchaarawi mchaarawi deleted the liw/fix-rand branch February 7, 2024 17:56
jolivier23 pushed a commit that referenced this pull request May 16, 2024
Pick from all possible ranks for protocol query handling

Backports the following patches, last one fixes the issue but
prior two are benign and make the backport work without conflict

DAOS-13717 client: use d_rand() to generate random number (#12418)
DAOS-15120 crt: Misc d_rand-related fixes (#13738)
DAOS-15476 client: do not use rsvc for proto query rank selection (#14131)

Signed-off-by: Mohamad Chaarawi <[email protected]>
jolivier23 pushed a commit that referenced this pull request May 16, 2024
Pick from all possible ranks for protocol query handling

Backports the following patches, last one fixes the issue but
prior two are benign and make the backport work without conflict

DAOS-13717 client: use d_rand() to generate random number (#12418)
DAOS-15120 crt: Misc d_rand-related fixes (#13738)
DAOS-15476 client: do not use rsvc for proto query rank selection (#14131)

Required-githooks: true

Change-Id: Icd83ae9307a2dfc928ca524af2954131b161157d
Signed-off-by: Mohamad Chaarawi <[email protected]>
Signed-off-by: Wang Shilong <[email protected]>
Signed-off-by: Li Wei <[email protected]>
Signed-off-by: Jeff Olivier <[email protected]>
jolivier23 added a commit that referenced this pull request May 20, 2024
Pick from all possible ranks for protocol query handling

Backports the following patches, last one fixes the issue but prior two are benign and make the backport work without conflict

DAOS-13717 client: use d_rand() to generate random number (#12418)
DAOS-15120 crt: Misc d_rand-related fixes (#13738)
DAOS-15476 client: do not use rsvc for proto query rank selection (#14131)
DAOS-15540 cart: Change proto query default timeout (#14382)

Signed-off-by: Mohamad Chaarawi <[email protected]>
Signed-off-by: Wang Shilong <[email protected]>
Signed-off-by: Li Wei <[email protected]>
Signed-off-by: Jeff Olivier <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants