DAOS-3114 vos: Consolidate timestamp cache APIs #3200
Conversation
Extents covered by a minor epoch punch should not be visible. This modifies the handling of prior punches so that the minor epoch is accounted for. Replay punches now use the max minor epoch - 1, and non-transactional updates use the max minor epoch. This way rebuild can replay the punch using only the major epoch and the existing flag. Signed-off-by: Jeff Olivier <[email protected]>
a punch/update to execute in non-transactional mode with separate minor epochs Signed-off-by: Jeff Olivier <[email protected]>
Simplify the timestamp cache some by consolidating APIs. There is now one API for checking for conflicts and one API for updating the read timestamps. This is preparation for adding more read timestamp updates for iteration and query APIs and for adding epoch uncertainty checks. More simplification is probably needed but this is a step in that direction. Signed-off-by: Jeff Olivier <[email protected]>
LGTM. No errors found by checkpatch.
Signed-off-by: Jeff Olivier <[email protected]>
Test stage Functional_Hardware_Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3200/1/execution/node/803/log
Signed-off-by: Jeff Olivier <[email protected]>
Signed-off-by: Jeff Olivier <[email protected]>
LGTM. No errors found by checkpatch.
src/vos/vos_ts.h
	/** Write level for the set */
	uint16_t	ts_wr_level;
	/** Max type */
	uint16_t	ts_max_type;
	/** Transaction that owns the set */
	uuid_t		ts_tx_id;
By the way, we'll need to change this field, as well as te_tx_rl and te_tx_rh, from uuid_t to dtx_id, because TXs opened by the same thread share the same UUID.
I don't think we do, because the HLC guarantees that two transactions from the same thread will have different epochs, no?
I'm afraid that'll not always be the case, at least not without adding more restrictions. E.g., imagine the application uses Argobots, like daos_io_server, and opens more than one TX from one pthread. One of these TXs may choose its epoch on server x, getting 100, and read value v at 100; another may choose its epoch on server y, in parallel with the former TX, also getting 100, and write value v at 100. The latter should be rejected.
Why would they use the same UUID? Is the UUID not chosen on the leader that chooses the epoch?
Their UUIDs are generated by the same client thread.
		if (ts_set->ts_max_type > VOS_TS_TYPE_OBJ)
			break;
	case VOS_TS_TYPE_DKEY:
		high_mask |= VOS_TS_READ_DKEY_CHILD;
The fall-through and the _CHILD are redundant, are they not? Just want to make sure it's intentional.
Yeah, they are, but if we go straight to DKEY, it needs to be set.
Test stage Build on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3200/21/execution/node/263/log
Test stage Build on CentOS 7 debug completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3200/21/execution/node/275/log
Test stage Build on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3200/21/execution/node/292/log
Test stage Build on CentOS 7 release completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3200/21/execution/node/295/log
Test stage Build on Leap 15 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3200/21/execution/node/265/log
LGTM. No errors found by checkpatch.
Test stage Functional_Hardware_Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3200/22/execution/node/844/log
LGTM. No errors found by checkpatch.
Add dedup to ignore list. Signed-off-by: Jeff Olivier <[email protected]>
toward making negative entries part of the corresponding positive entry rather than their own timestamp cache. Signed-off-by: Jeff Olivier <[email protected]>
LGTM. No errors found by checkpatch.
Some logic can be improved, but nothing fatal.
	if (!dtx_is_valid_handle(dth) || dth->dth_deferred == NULL)
		return 0;

	/** Reserve enough space for any deferred actions */
Somewhat confused: my understanding is that "dth_deferred[i]" will be used as an output parameter for pmemobj_defer_free(), and one defer_free() only needs one "pobj_action". Since we have already created N (N is 'dth_modification_cnt') "vos_rsrvd_scm" slots at DTX handle init, then for each slot we only need one "pobj_action", right? Do we really need such a large size to hold multiple "pobj_action"s? Or did I miss something?
	rsrvd_scm = dru->dru_scm;
	if (rsrvd_scm->rs_actv_at >= rsrvd_scm->rs_actv_cnt)
		continue; /* Can't really be > but keep it simple */
The check is redundant: each sub-modification will call defer_free() at most once, so "rs_actv_at" will always be zero.
If it modifies multiple akeys, it can happen more than once, no?
You are right; I overlooked that case.
On the other hand, if some SCM area is not allocated by the current DTX and we defer_free() it, will that cause an issue or not? I will assume it works, since I am not familiar with the semantics of defer_free().
It will not cause an issue to defer free something not allocated by (or reserved for) the transaction. However, this case only happens if the reservation has the same epoch and is in the same tx.
	if (dru->dru_scm == NULL)
		continue;
	for (i = 0; i < dth->dth_deferred_cnt; i++) {
		rsrvd_scm = dth->dth_deferred[i];
We do not need to search for the free slot one by one. Because we have already created N "vos_rsrvd_scm" slots when initializing the DTX handle, we can just use "rsrvd_scm = dth->dth_deferred[dth_op_seq];" directly; nobody else will share that slot with the current sub-modification.
Good point. I guess you can never have more undos than the number of sub-modifications.
Simplify the timestamp cache some by consolidating APIs. There is now one API for checking for conflicts and one API for updating the read timestamps. This is preparation for adding more read timestamp updates for iteration and query APIs and for adding epoch uncertainty checks. More simplification is probably needed but this is a step in that direction.

Update the mvcc test to enable same transaction tests. A few changes were required for this to work:
* Change the code so that it only calls vts_dtx_begin/end/commit/abort once per transaction and uses sequence numbers.
* Disable 5 cases that require the punch model change patch that is blocked by a rebuild bug.
* Fix a bug that was clearing the uuids in the timestamp cache.
* Disable cases where the first operation is a failing conditional update or punch in the same transaction. These cases require more work in VOS to support.
* Encapsulate vos_publish/cancel inside of vos_tx_end.
* Defer free for same transaction overwrites.
* Add full dtx_id to timestamps for comparisons.

Signed-off-by: Jeff Olivier <[email protected]>