DAOS-10138 pool: Improve PS reconfigurations #10121

Merged: 7 commits merged into master from liw/ps-reconf on Sep 27, 2022
Conversation


@liw liw commented Aug 26, 2022

A PS currently performs reconfigurations (i.e., membership changes) only
upon pool map changes. If the PS leader crashes in the middle of a
series of reconfigurations, the new PS leader will not plan any
reconfiguration (or notify the MS of the latest list of replicas) until
the pool map changes for some other reason. This patch lets a PS leader
check if reconfigurations are required when it steps up.

To avoid blocking the step up and pool map change processes, this patch
performs each series of reconfigurations asynchronously in a ULT. The
change allows the reconfiguration process to wait for pending events,
retry upon certain errors (in the future), and wait for RPC timeouts
without directly impacting the normal PS operations. Hence, this patch
reverts the workaround (209ba92) that
skips destroying PS replicas.

Moreover, this patch adds a safety net that prevents an older rsvc leader
from removing an rsvc replica created by a newer rsvc leader. Although
it cannot resolve all problems in the area, the natural, term-based
approach requires no RDB layout change and is simple to implement. The
patch has to change rdb_test a bit to allow more than one test rsvc
instance, so that a quick rsvc test can be added.

Since select_svc_ranks avoids rank 0 but ds_pool_plan_svc_reconfs does
not, this patch modifies the former to remove the avoidance, so that
during a PoolCreate operation the MS observes a notification from the
new PS with the same ranks as the PoolCreate dRPC response. We have to
change a few tests as well as the MS to make this work:

  • Work around a race between mgmtSvc.PoolCreate and
    Database.handlePoolRepsUpdate. See the comment for the details.

  • Update svc.yaml to reflect the new PS replacement policy.

  • Fix the daos_obj.c assertion, since PS leaders can now be rank 0.

Signed-off-by: Li Wei [email protected]
Required-githooks: true

@github-actions

github-actions bot commented Aug 26, 2022

Bug-tracker data:
Ticket title is 'Restore Pool Service redundancy when enough engines are available'
Status is 'In Review'
Labels: 'Metadata'
https://daosio.atlassian.net/browse/DAOS-10138

@daosbuild1 daosbuild1 left a comment

src/pool/srv_pool.c (outdated; resolved)
src/pool/srv_pool.c (outdated; resolved)

@liw liw force-pushed the liw/ps-reconf branch 2 times, most recently from 96bec7b to 0cd6d58 on August 30, 2022 01:20
@daosbuild1 daosbuild1 dismissed their stale review August 30, 2022 01:22

Updated patch

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@daosbuild1

Test stage Functional on EL 8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-10121/2/testReport/(root)/

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@daosbuild1

Test stage Functional on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10121/3/execution/node/872/log

@daosbuild1

Test stage Functional Hardware Small completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10121/3/execution/node/1013/log

@daosbuild1

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10121/3/execution/node/1102/log

@liw liw changed the base branch from master to liw/pool-destroy-svc August 31, 2022 08:36
@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@daosbuild1

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10121/5/execution/node/320/log

@daosbuild1

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10121/5/execution/node/335/log

@daosbuild1

Test stage Build RPM on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10121/5/execution/node/323/log

@liw liw force-pushed the liw/pool-destroy-svc branch from 5ef83e5 to a1ea680 on August 31, 2022 08:46
@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@daosbuild1

Test stage Functional on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10121/6/execution/node/871/log

@liw liw force-pushed the liw/pool-destroy-svc branch from a1ea680 to 33ff5b0 on September 1, 2022 03:48
@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@daosbuild1

Test stage Functional on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10121/7/execution/node/873/log

@daosbuild1

Test stage Functional Hardware Small completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10121/7/execution/node/1014/log

@liw liw force-pushed the liw/pool-destroy-svc branch from 33ff5b0 to 521779e on September 2, 2022 07:28
liw added 3 commits September 9, 2022 07:16
@daosbuild1

Test stage NLT on EL 8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-10121/14/display/redirect

@liw liw removed the request for review from a team September 8, 2022 23:19
@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@liw liw commented Sep 8, 2022

Rebased to resolve conflicts caused by the automatic base change (i.e., the previous base branch was merged to master). No changes are made otherwise.

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

//
// The pool remains in Creating state after PoolCreate completes,
// leading to DER_AGAINs during PoolDestroy.
if p.State == system.PoolServiceStateReady && ps.State == system.PoolServiceStateCreating {
Contributor

Looks good to me, thanks. This is probably not even really a workaround now, but just the correct behavior. If we are in this situation, all of the information in the "update" is probably stale anyhow.

@liw liw commented Sep 11, 2022

I'm not 100% satisfied with this change because 1) the update may contain a valid svc rank refresh as in the example (where the svc ranks could in theory differ from those reported by ds_pool_svc_dist_create), and 2) I return nil in this case (because otherwise callers would need some specific error that they could recognize). The control plane deserves a better solution eventually. :)

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

kccain previously approved these changes Sep 22, 2022

@kccain kccain left a comment

mostly questions, looks very good

src/rsvc/srv.c (outdated):

	rc = rdb_ping(svc->s_db, caller_term);
	if (rc != 0) {
		if (rc != -DER_STALE)
			D_ERROR("%s: failed to ping local replica\n", svc->s_name);
@kccain kccain commented

(very minor) Only for developer insight when debugging problems, is it worth a D_DEBUG in the event rc == -DER_STALE to know that this replica was asked to be stopped by another "stale term" replica?

with ds_rsvc_start() it looks like a D_ERROR will be emitted there in case of -DER_STALE.

@liw liw commented

OK, let me add a DEBUG.

* reconfigurations or the last MS notification.
*/
svc->ps_reconf.psc_force_notify = true;
pool_svc_schedule_reconf(svc);
@kccain kccain commented

Upon any subsequent rc != 0 errors below this point, is it worth issuing pool_svc_cancel_and_wait_reconf()?
Or, consider moving this to be one of the last steps after everything else has succeeded?

@liw liw commented

Oops, this is definitely a defect. :( Thanks a lot. Fixed.


	if (rdb_get_ranks(svc->ps_rsvc.s_db, &new) == 0) {
		d_rank_list_sort(current);
		d_rank_list_sort(new);

-		if (!d_rank_list_identical(new, current)) {
+		if (reconf->psc_force_notify || !d_rank_list_identical(new, current)) {
@kccain kccain commented

I may be getting a little lost looking at all of the parts of this change, but a question: we don't expect many instances where the psc_force_notify will cause a RAS notification even when the lists are identical (no membership changes occur), is that right? Since most of the time if we are here, there is some membership change occurring (e.g., you would be stepping up as a leader because something happened to the previous leader)?

@liw liw commented

Right. I hope we won't see any besides those triggered by pool_svc_step_up_cb, which doesn't know whether we need to notify the MS or not. The ds_notify_pool_svc_update call seems to always succeed (except for ENOMEM or bugs), as it only passes the notification to the local daos_server, regardless of whether the latter is able to pass the notification on to the MS (which is an issue in itself, but I felt this PR is too big to accommodate any changes for it). Does it sound like I got your point, or maybe not? :)

@kccain kccain commented

I was thinking about a scenario where a new leader steps up but the membership has not changed - so I guess now looking closer maybe this could be the consequence of the previous leader experiencing some failure (e.g., in a step of updating the pool map) and calling rdb_resign() due to that error.

But I see your line of thinking here, that if for some very unexpected reason ds_notify_pool_svc_update() fails then reconf->psc_force_notify will remain true.

Probably no issue here, as we will want to see any leader transition reported.

liw added 2 commits September 23, 2022 16:02
@liw liw left a comment

Thanks, Ken.


@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@liw liw requested review from kccain and liuxuezhao September 23, 2022 13:06


@liw liw requested a review from a team September 26, 2022 23:53
@mjmac mjmac merged commit 67604a8 into master Sep 27, 2022
@mjmac mjmac deleted the liw/ps-reconf branch September 27, 2022 12:44
liw added a commit that referenced this pull request Sep 28, 2022
mjmac pushed a commit that referenced this pull request Oct 3, 2022
Labels: None yet

5 participants