DAOS-16653 pool: Batch crt events (#15230)

When multiple engines become unavailable around the same time, and a pool
cannot tolerate the unavailability of those engines, it is sometimes
desirable that the pool not exclude any of them. Hence, this
patch introduces a CaRT event delay, tunable via the server-side
environment variable CRT_EVENT_DELAY, so that the events signaling the
unavailability of those engines will hopefully be handled in one batch,
giving pool_svc_update_map_internal a chance to reject the pool map
update based on the RF check.
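
The effect of the delay can be illustrated with a minimal Python sketch. This
is not the DAOS implementation: `read_event_delay`, `ready_batch`, and
`rf_check` are hypothetical names, events are reduced to engine ranks, and the
RF check is simplified to "no more than `rf` engines unavailable at once". The
sketch also assumes the delay is measured from the most recently queued event.
Only the variable name CRT_EVENT_DELAY and its 10 s default come from this
patch.

```python
import os

DEFAULT_DELAY_S = 10  # default documented for CRT_EVENT_DELAY


def read_event_delay(environ=os.environ):
    """Return the event delay in seconds (hypothetical helper)."""
    return int(environ.get("CRT_EVENT_DELAY", DEFAULT_DELAY_S))


def ready_batch(pending, latest_queued_at, now, delay):
    """Return all pending events once `delay` seconds have passed since the
    latest one was queued; otherwise return an empty list (keep waiting)."""
    if pending and now - latest_queued_at >= delay:
        return list(pending)
    return []


def rf_check(batch, rf):
    """Simplified stand-in for the RF check: accept the map update only if
    the number of distinct unavailable engines does not exceed `rf`."""
    return len(set(batch)) <= rf
```

Under these assumptions, with `rf = 1` and two engines failing two seconds
apart, handling each event immediately would let the first exclusion pass the
check on its own. With the delay, both events land in one batch, the check
rejects the update, and neither engine is excluded.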

When the RF check rejects a pool map change, we should revisit the
corresponding events later, rather than simply throwing them away. This
patch improves this case by returning the events to the event
queue and pausing the queue handling until the next new event or pool map
update.

  - Introduce event sets: pool_svc_event_set. Now the event queue can be
    simplified to just one event set.

  - Add the ability to pause and resume the event handling: pse_paused.

  - Track the time when the latest event was queued: pse_time.

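The three changes above can be sketched together as a toy event set. This is
an illustration under stated assumptions, not the code in this patch: the
class `EventSet` and its methods are hypothetical, and only the field names
`pse_paused` and `pse_time` are taken from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class EventSet:
    """Toy analogue of pool_svc_event_set (hypothetical layout)."""
    events: list = field(default_factory=list)
    pse_paused: bool = False
    pse_time: float = 0.0

    def queue(self, event, now):
        """Add a new event, record when it was queued, and resume handling."""
        self.events.append(event)
        self.pse_time = now
        self.pse_paused = False

    def take(self, now, delay):
        """Hand over the whole set once it is due and handling is not paused."""
        if self.pse_paused or not self.events or now - self.pse_time < delay:
            return []
        batch, self.events = self.events, []
        return batch

    def put_back(self, batch):
        """Return events whose map update was rejected, then pause handling."""
        self.events = batch + self.events
        self.pse_paused = True

    def resume(self):
        """Resume handling, e.g. after a pool map update."""
        self.pse_paused = False
```

In this sketch a rejected batch is not lost: it stays in the set and is
reconsidered, together with anything newer, once a new event arrives or
`resume()` is called.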

Signed-off-by: Li Wei <[email protected]>
liw authored Oct 11, 2024
1 parent 032849e commit 6177922
Showing 3 changed files with 346 additions and 110 deletions.
1 change: 1 addition & 0 deletions docs/admin/env_variables.md
@@ -44,6 +44,7 @@ Environment variables in this section only apply to the server side.
|DAOS\_MD\_CAP |Size of a metadata pmem pool/file in MBs. INTEGER. Default to 128 MB.|
|DAOS\_START\_POOL\_SVC|Determines whether to start existing pool services when starting a daos\_server. BOOL. Default to true.|
|CRT\_DISABLE\_MEM\_PIN|Disable memory pinning workaround on a server side. BOOL. Default to 0.|
|CRT\_EVENT\_DELAY|Delay in seconds before handling each CaRT event. INTEGER. Default to 10 s. A longer delay enables batching of successive CaRT events, leading to fewer pool map changes when multiple engines become unavailable at around the same time.|
|DAOS\_SCHED\_PRIO\_DISABLED|Disable server ULT prioritizing. BOOL. Default to 0.|
|DAOS\_SCHED\_RELAX\_MODE|The mode of CPU relaxing on idle. "disabled":disable relaxing; "net":wait on network request for INTVL; "sleep":sleep for INTVL. STRING. Default to "net"|
|DAOS\_SCHED\_RELAX\_INTVL|CPU relax interval in milliseconds. INTEGER. Default to 1 ms.|