Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-3114 vos: Consolidate timestamp cache APIs #3200

Merged
merged 47 commits into from
Aug 19, 2020
Merged

Conversation

jolivier23
Copy link
Contributor

@jolivier23 jolivier23 commented Aug 4, 2020

Simplify the timestamp cache some by consolidating APIs. There is now one
API for checking for conflicts and one API for updating the read timestamps.
This is preparation for adding more read timestamp updates for iteration
and query APIs and for adding epoch uncertainty checks. More simplification
is probably needed but this is a step in that direction.

Update the mvcc test to enable same transaction tests. A few changes were
required for this to work

  1. Change the code so that it only calls vts_dtx_begin/end/commit/abort once
    per transaction and uses sequence number.
  2. Disable 5 cases that require the punch model change patch that is
    blocked by a rebuild bug.
  3. Fixes a bug that was clearing the uuids in the timestamp cache
  4. Disable cases where the first operation is a failing conditional update
    or punch in the same transaction. These cases require more work in
    VOS to support
  5. Encapsulate vos_publish/cancel inside of vos_tx_end
  6. Defer free for same transaction overwrites
  7. Add full dtx_id to timestamps for comparisons

Signed-off-by: Jeff Olivier [email protected]

Extents covered by a minor epoch punch should not be
visible.  This modifies the handling of prior punches
so that the minor epoch is accounted for.

Modifies replay punches so that they use max minor epoch - 1
and non-transactional updates to use max minor epoch.  This
way rebuild can replay the punch using only the major
epoch and the existing flag.

Signed-off-by: Jeff Olivier <[email protected]>
Signed-off-by: Jeff Olivier <[email protected]>
Signed-off-by: Jeff Olivier <[email protected]>
a punch/update to execute in non-transactional mode with separate
minor epochs

Signed-off-by: Jeff Olivier <[email protected]>
Simplify the timestamp cache some by consolidating APIs.  There is now one
API for checking for conflicts and one API for updating the read timestamps.
This is preparation for adding more read timestamp updates for iteration
and query APIs and for adding epoch uncertainty checks.  More simplification
is probably needed but this is a step in that direction.

Signed-off-by: Jeff Olivier <[email protected]>
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

Signed-off-by: Jeff Olivier <[email protected]>
@jolivier23 jolivier23 requested a review from ryon-jensen August 5, 2020 14:03
@daosbuild1
Copy link
Collaborator

Test stage Functional_Hardware_Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3200/1/execution/node/803/log

Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

NiuYawei
NiuYawei previously approved these changes Aug 7, 2020
liw
liw previously approved these changes Aug 7, 2020
src/vos/vos_ts.h Outdated
/** Write level for the set */
uint16_t ts_wr_level;
/** Max type */
uint16_t ts_max_type;
/** Transaction that owns the set */
uuid_t ts_tx_id;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, we'll need to change this field, as well as te_tx_rl and te_tx_rh, from uuid_t to dtx_id, because TXs opened by the same thread share the same UUID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we do because the hlc guarantees that two transactions from the same thread will have different epoch, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid that'll not always be the case, at least not without adding more restrictions. E.g., imagine the application uses Argobots, like daos_io_server, and opens more than TXs from one pthread. One of these TXs may choose its epoch on server x, getting 100, and read value v with 100; another may choose its epoch on server y, in parallel with the former TX, getting also 100, and write value v with 100. The latter should be rejected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would they use the same uuid ? Is the uuid not chosen on the leader that chooses the epoch?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Their UUIDs are generated by the same client thread.

if (ts_set->ts_max_type > VOS_TS_TYPE_OBJ)
break;
case VOS_TS_TYPE_DKEY:
high_mask |= VOS_TS_READ_DKEY_CHILD;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fall-through and the _CHILD are redundant, are they not? Just want to make sure it's intentional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, they are but if we go straight to DKEY, it needs to be set

ryon-jensen
ryon-jensen previously approved these changes Aug 7, 2020
@jolivier23 jolivier23 dismissed stale reviews from ryon-jensen, liw, and NiuYawei via f6b02c6 August 7, 2020 14:00
@jolivier23 jolivier23 requested review from liw and NiuYawei August 7, 2020 14:00
@daosbuild1
Copy link
Collaborator

Test stage Build on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3200/21/execution/node/263/log

@daosbuild1
Copy link
Collaborator

Test stage Build on CentOS 7 debug completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3200/21/execution/node/275/log

@daosbuild1
Copy link
Collaborator

Test stage Build on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3200/21/execution/node/292/log

@daosbuild1
Copy link
Collaborator

Test stage Build on CentOS 7 release completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3200/21/execution/node/295/log

@daosbuild1
Copy link
Collaborator

Test stage Build on Leap 15 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3200/21/execution/node/265/log

Signed-off-by: Jeff Olivier <[email protected]>
@jolivier23 jolivier23 requested review from liw and Nasf-Fan August 18, 2020 14:27
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

Nasf-Fan
Nasf-Fan previously approved these changes Aug 18, 2020
ryon-jensen
ryon-jensen previously approved these changes Aug 18, 2020
@daosbuild1
Copy link
Collaborator

Test stage Functional_Hardware_Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3200/22/execution/node/844/log

Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

Add dedup to ignore list

Signed-off-by: Jeff Olivier <[email protected]>
toward making negative entries part of the corresponding positive entry
rather than their own timestamp cache.

Signed-off-by: Jeff Olivier <[email protected]>
@jolivier23 jolivier23 dismissed stale reviews from ryon-jensen and Nasf-Fan via 5b8cd98 August 18, 2020 21:59
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

Copy link
Contributor

@Nasf-Fan Nasf-Fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some logic can be improved, but not fatal.

if (!dtx_is_valid_handle(dth) || dth->dth_deferred == NULL)
return 0;

/** Reserve enough space for any deferred actions */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some confused, my understand is that the "dth_deferred[i]" will be used as output parameter for pmemobj_defer_free(), one defer_free() only need one "pobj_action". Since we have already created N (N is the 'dth_modification_cnt') "vos_rsrvd_scm" slots when DTX handle init, then for each slot, we only need to create one "pobj_action", right? Do we really need so large size to hold multiple "pobj_action"? Or I missed anything?


rsrvd_scm = dru->dru_scm;
if (rsrvd_scm->rs_actv_at >= rsrvd_scm->rs_actv_cnt)
continue; /* Can't really be > but keep it simple */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check is redundant, because each sub modification will call defer_free() at most once, "rs_actv_at" will be always zero.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it modifies multiple akeys, it can happen more than once, no?

Copy link
Contributor

@Nasf-Fan Nasf-Fan Aug 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I ignored such case.
On the other hand, if some SCM area is not allocated by current DTX, defer_free() it, whether will it cause issue or not? I will assume that it can work since I am not familiar with the semantics for defer_free().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will not cause an issue to defer free something not allocated by (or reserved for) the transaction. However, this case only happens if the reservation has the same epoch and is in the same tx.

if (dru->dru_scm == NULL)
continue;
for (i = 0; i < dth->dth_deferred_cnt; i++) {
rsrvd_scm = dth->dth_deferred[i];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need to find out the free slot one by one. Because we have already created N "vos_rsrvd_scm" slots when init the DTX handle, so just directly use "rsrvd_scm = dth->dth_deferred[dth_op_seq];", nobody will share using such slot with current sub-modification.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I guess you can never have more undo's than the number of submodifications.

@jolivier23 jolivier23 merged commit c645966 into master Aug 19, 2020
@jolivier23 jolivier23 deleted the jvolivie/consolidate branch August 19, 2020 19:45
SABollinger pushed a commit that referenced this pull request Nov 11, 2020
Simplify the timestamp cache some by consolidating APIs. There is now one
API for checking for conflicts and one API for updating the read timestamps.
This is preparation for adding more read timestamp updates for iteration
and query APIs and for adding epoch uncertainty checks. More simplification
is probably needed but this is a step in that direction.

Update the mvcc test to enable same transaction tests. A few changes were
required for this to work

*Change the code so that it only calls vts_dtx_begin/end/commit/abort once
per transaction and uses sequence number.
*Disable 5 cases that require the punch model change patch that is
blocked by a rebuild bug.
*Fixes a bug that was clearing the uuids in the timestamp cache
*Disable cases where the first operation is a failing conditional update
or punch in the same transaction. These cases require more work in
*VOS to support
*Encapsulate vos_publish/cancel inside of vos_tx_end
*Defer free for same transaction overwrites
*Add full dtx_id to timestamps for comparisons

Signed-off-by: Jeff Olivier <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

6 participants