DAOS-14976 object: properly select collective punch leader for resend #13602
Before resending the collective punch RPC, we need to check whether the original leader shard is still valid. The object layout may have shrunk after rebuild; in that case, select a new shard as the collective punch leader.

Signed-off-by: Fan Yong <[email protected]>
Bug-tracker data:
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13602/1/testReport/
PoolCreateTests.test_create_no_space_loop failed because of DAOS-14884; it is unrelated to this patch.
if (auxi->io_retry) {
	if (unlikely(spa->pa_auxi.shard >= obj->cob_shards_nr))
It looks to me like spa->pa_auxi.shard != obj->cob_shards_nr would be safer; might the shards also be extended after rebuild?
It does not matter if the object layout is extended: in that case the shard with index "spa->pa_auxi.shard" does not disappear, so it can still be used as the leader.
Indeed, thanks for the explanation.
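For anyone reading this thread later, here is a minimal, standalone sketch of the bounds check discussed above. It is not the DAOS code: `struct sketch_layout`, `sketch_leader_valid()` and `sketch_resend_leader()` are hypothetical names, and falling back to shard 0 is only a placeholder for whatever leader selection the patch actually performs. It only illustrates why `>=` against the current shard count covers the shrink case and leaves the extend case alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the object layout; not a DAOS structure. */
struct sketch_layout {
    uint32_t shards_nr; /* number of shards in the current layout */
};

/* A cached leader index is usable only if it still falls inside the layout. */
static bool
sketch_leader_valid(const struct sketch_layout *lo, uint32_t shard)
{
    return shard < lo->shards_nr;
}

/*
 * Choose the leader for a resend. If the layout shrank and the old index
 * is out of range, pick a new leader. Assumption: shard 0 is used here
 * purely as a placeholder for the real selection policy.
 */
static uint32_t
sketch_resend_leader(const struct sketch_layout *lo, uint32_t old_leader)
{
    assert(lo->shards_nr > 0);

    if (sketch_leader_valid(lo, old_leader))
        return old_leader; /* same or extended layout: keep the leader */

    return 0; /* shrunk layout: old index is gone, reselect */
}
```

With a layout shrunk to 4 shards, an old leader index of 6 is rejected and a new leader is chosen; with a layout extended from 4 to 8 shards, an old leader index of 3 is kept. A `!=` comparison against the shard count would not distinguish these two cases.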