-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-16653 pool: Batch crt events (#15230) #15302
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Ticket title is 'PS to wait for group stabilization before handling events' |
787d26f
to
c8a191e
Compare
c8a191e
to
d7ebf20
Compare
* DAOS-16653 pool: Batch crt events When multiple engines become unavailable around the same time, if a pool cannot tolerate the unavailability of those engines, it is sometimes desired that the pool would not exclude any of the engines. Hence, this patch introduces a CaRT event delay, tunable via the server-side environment variable, CRT_EVENT_DELAY, so that the events signaling the unavailability of those engines will be handled in hopefully one batch, giving pool_svc_update_map_internal a chance to reject the pool map update based on the RF check. When the RF check rejects a pool map change, we should revisit the corresponding events later, rather than simply throwing them away. This patch improves this case by returning the events back to the event queue, and pause the queue handling until next new event or pool map update. - Introduce event sets: pool_svc_event_set. Now the event queue can be simplified to just one event set. - Add the ability to pause and resume the event handling: pse_paused. - Track the time when the latest event was queued: pse_time. Skip-nlt: true Signed-off-by: Li Wei <[email protected]> Required-githooks: true
d7ebf20
to
b4d115d
Compare
Skipped NLT because build 1 to 3 were all blocked by the NLT warning on |
liuxuezhao
approved these changes
Oct 18, 2024
gnailzenh
approved these changes
Oct 18, 2024
kccain
pushed a commit
that referenced
this pull request
Oct 18, 2024
* DAOS-16653 pool: Batch crt events When multiple engines become unavailable around the same time, if a pool cannot tolerate the unavailability of those engines, it is sometimes desired that the pool would not exclude any of the engines. Hence, this patch introduces a CaRT event delay, tunable via the server-side environment variable, CRT_EVENT_DELAY, so that the events signaling the unavailability of those engines will be handled in hopefully one batch, giving pool_svc_update_map_internal a chance to reject the pool map update based on the RF check. When the RF check rejects a pool map change, we should revisit the corresponding events later, rather than simply throwing them away. This patch improves this case by returning the events back to the event queue, and pause the queue handling until next new event or pool map update. - Introduce event sets: pool_svc_event_set. Now the event queue can be simplified to just one event set. - Add the ability to pause and resume the event handling: pse_paused. - Track the time when the latest event was queued: pse_time. Signed-off-by: Li Wei <[email protected]>
mjean308
pushed a commit
that referenced
this pull request
Oct 23, 2024
* DAOS-16653 pool: Batch crt events When multiple engines become unavailable around the same time, if a pool cannot tolerate the unavailability of those engines, it is sometimes desired that the pool would not exclude any of the engines. Hence, this patch introduces a CaRT event delay, tunable via the server-side environment variable, CRT_EVENT_DELAY, so that the events signaling the unavailability of those engines will be handled in hopefully one batch, giving pool_svc_update_map_internal a chance to reject the pool map update based on the RF check. When the RF check rejects a pool map change, we should revisit the corresponding events later, rather than simply throwing them away. This patch improves this case by returning the events back to the event queue, and pause the queue handling until next new event or pool map update. - Introduce event sets: pool_svc_event_set. Now the event queue can be simplified to just one event set. - Add the ability to pause and resume the event handling: pse_paused. - Track the time when the latest event was queued: pse_time. Signed-off-by: Li Wei <[email protected]>
daltonbohning
pushed a commit
that referenced
this pull request
Oct 23, 2024
Skip-build: true When multiple engines become unavailable around the same time, if a pool cannot tolerate the unavailability of those engines, it is sometimes desired that the pool would not exclude any of the engines. Hence, this patch introduces a CaRT event delay, tunable via the server-side environment variable, CRT_EVENT_DELAY, so that the events signaling the unavailability of those engines will be handled in hopefully one batch, giving pool_svc_update_map_internal a chance to reject the pool map update based on the RF check. When the RF check rejects a pool map change, we should revisit the corresponding events later, rather than simply throwing them away. This patch improves this case by returning the events back to the event queue, and pause the queue handling until next new event or pool map update. - Introduce event sets: pool_svc_event_set. Now the event queue can be simplified to just one event set. - Add the ability to pause and resume the event handling: pse_paused. - Track the time when the latest event was queued: pse_time. Signed-off-by: Li Wei <[email protected]> Signed-off-by: Dalton Bohning <[email protected]>
daltonbohning
pushed a commit
that referenced
this pull request
Oct 23, 2024
* DAOS-16653 pool: Batch crt events When multiple engines become unavailable around the same time, if a pool cannot tolerate the unavailability of those engines, it is sometimes desired that the pool would not exclude any of the engines. Hence, this patch introduces a CaRT event delay, tunable via the server-side environment variable, CRT_EVENT_DELAY, so that the events signaling the unavailability of those engines will be handled in hopefully one batch, giving pool_svc_update_map_internal a chance to reject the pool map update based on the RF check. When the RF check rejects a pool map change, we should revisit the corresponding events later, rather than simply throwing them away. This patch improves this case by returning the events back to the event queue, and pause the queue handling until next new event or pool map update. - Introduce event sets: pool_svc_event_set. Now the event queue can be simplified to just one event set. - Add the ability to pause and resume the event handling: pse_paused. - Track the time when the latest event was queued: pse_time. Signed-off-by: Li Wei <[email protected]>
18 tasks
This was referenced Oct 28, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
approved-to-merge
PR has received release branch merge approval
clean-cherry-pick
Cherry-pick from another branch that did not require additional edits
priority
Ticket has high priority (automatically managed)
release-2.6.2
Targeted for release 2.6.2
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
When multiple engines become unavailable around the same time, if a pool cannot tolerate the unavailability of those engines, it is sometimes desired that the pool would not exclude any of the engines. Hence, this patch introduces a CaRT event delay, tunable via the server-side environment variable, CRT_EVENT_DELAY, so that the events signaling the unavailability of those engines will be handled in hopefully one batch, giving pool_svc_update_map_internal a chance to reject the pool map update based on the RF check.
When the RF check rejects a pool map change, we should revisit the corresponding events later, rather than simply throwing them away. This patch improves this case by returning the events back to the event queue, and pause the queue handling until next new event or pool map update.
Introduce event sets: pool_svc_event_set. Now the event queue can be simplified to just one event set.
Add the ability to pause and resume the event handling: pse_paused.
Track the time when the latest event was queued: pse_time.
Before requesting gatekeeper:
Features:
(orTest-tag*
) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.Gatekeeper: