Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CBG-3724 ensure import rollback works correctly #6657

Merged
merged 6 commits into from
Jan 30, 2024
Merged

Conversation

torcolvin
Copy link
Collaborator

@torcolvin torcolvin commented Jan 29, 2024

Major fix: pass a function to call cbgt.JANITOR_ROLLBACK_PINDEX when creating a pindex. This is necessary to ensure cbgt.Mgr will actually generate the index.

  • fix the TestImportRollback test which was giving a false pass before because the cbgt.Mgr was getting a kick op when the nodes were rebalanced. In order for rollback to work, the cbgt.Dest RollbackEx function has to force cgbt.Mgr to actually update the PIndex. A cbgt.JANITOR_ROLLBACK_PINDEX op will force the recreation of a PIndex. This test was broken before adding code to it.

  • Enhance the TestImportRollback to detect a rollback with invalid FailoverLogs. This simulates a situation where the checkpoints have been migrated from a separate bucket.

  • modify TestContinuousRollback to also test with bad vbuuid FailoverEntry. This test actually would pass, because Sync Gateway, unlike cbgt, did not pass DcpStreamAddFlagStrictVBUUID. DcpStreamAddFlagStrictVBUUID forces a rollback in the case where StartSeqNo=0 vbuuid=invalid whereas without the flag it is treated like StartSeqNo=0 vbuuid=0. Changed the code to match cbgt behavior adding DcpStreamAddFlagStrictVBUUID.

  • Added a separate flag in DCPClient to detect active vBuckets, which are the number of vBuckets that actually have a connection, and are not stuck in a connecting state. This differs from the existing activeVbuckets which is set from the time of NewDCPClient to when the streams end. In a continuous feed, the streams should never end.

Cleanup:

  • move RollbackEx function from DCPDest to DCPCommon. DCPCommon is an embedded struct in DCPDest.
  • panic if DCPCommon.Rollback is called. This should never happen given the design of cbgt.Dest, but if it did, the rollback would not work. This is different from the old behavior where it would not rollback and potentially enter a rollback loop if this function was called.
  • remove maxVbNo from NewDCPDest in effort to shorten the arguments since the flag can be inferred from base.Bucket.
  • Create some public structs to make tests easier:
    • ShardedImportDCPMetadata
    • CBGCfgIndexDefs, CBGTIndexNodeDefsKnown, CBGTIndexNodeDefsWanted, CBGTCfgPlanPIndexes

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Integration Tests

Major fix: pass a function to call cbgt.JANITOR_ROLLBACK_PINDEX when
creating a pindex. This is necessary to ensure cbgt.Mgr will actually
generate the index.

- fix the TestImportRollback test which was giving a false pass before
  because the cbgt.Mgr was getting a kick op when the nodes were
  rebalanced. In order for rollback to work, the cbgt.Dest RollbackEx
  function has to force cgbt.Mgr to actually update the PIndex. A
  cbgt.JANITOR_ROLLBACK_PINDEX op will force the recreation of a PIndex.
  This test was broken before adding code to it.
- Enhance the TestImportRollback to detect a rollback with invalid FailoverLogs.
  This simulates a situation where the checkpoints have been migrated
  from a separate bucket.

- modify TestContinuousRollback to also test with bad vbuuid FailoverEntry.
  This test actually would pass, because Sync Gateway, unlike cbgt, did
  not pass DcpStreamAddFlagStrictVBUUID. DcpStreamAddFlagStrictVBUUID
  forces a rollback in the case where StartSeqNo=0 vbuuid=invalid
  whereas without the flag it is treated like StartSeqNo=0 vbuuid=0.
  Changed the code to match cbgt behavior adding
  DcpStreamAddFlagStrictVBUUID.
- Added a separate flag in DCPClient to detect active vBuckets, which
  are the number of vBuckets that actually have a connection, and are
  not stuck in a connecting state. This differs from the existing
  activeVbuckets which is set from the time of NewDCPClient to when the
  streams end. In a continuous feed, the streams should never end.

Cleanup:

- move RollbackEx function from DCPDest to DCPCommon. DCPCommon is an
  embedded struct in DCPDest.
- panic if DCPCommon.Rollback is called. This should never happen given
  the design of cbgt.Dest, but if it did, the rollback would not work.
  This is different from the old behavior where it would not rollback
  and potentially enter a rollback loop if this function was called.
- remove maxVbNo from NewDCPDest in effort to shorten the arguments
  since the flag can be inferred from base.Bucket.
- Create some public structs to make tests easier:
	- ShardedImportDCPMetadata
	- CBGCfgIndexDefs, CBGTIndexNodeDefsKnown, CBGTIndexNodeDefsWanted, CBGTCfgPlanPIndexes
@@ -512,7 +509,7 @@ func (dc *DCPClient) openStreamRequest(vbID uint16) error {
}

op, openErr := dc.agent.OpenStream(vbID,
memd.DcpStreamAddFlagActiveOnly,
memd.DcpStreamAddFlagActiveOnly|memd.DcpStreamAddFlagStrictVBUUID,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually want this? It sounds like the behaviour when it's not set (ignore the vbuuid if startSeq is 0) is what we prefer. I don't know if we need to have our DCPClient match cbgt handling here?

base/dcp_dest.go Outdated

// TODO: Metadata store?
metadataStore := bucket.DefaultDataStore()
dcpCommon, err := NewDCPCommon(ctx, callback, bucket, metadataStore, maxVbNo, persistCheckpoints, dcpStats, feedID, checkpointPrefix, metaKeys)
dcpCommon, err := NewDCPCommon(ctx, callback, bucket, metadataStore, persistCheckpoints, dcpStats, feedID, checkpointPrefix, metaKeys, rollback)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we moving where maxVbNo is computed in this PR for any reason?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason was to make this call stack less complicated because we can and should infer it from Bucket. It's not necessary in this PR but I thought when I'm adding argument 11, I might want to see if they are necessary.

if err != nil {
base.ErrorfCtx(ctx, "Error creating NewImportDest during NewImportPIndexImpl: %v", err)
}
return nil, importDest, err
}
return newImportPIndexImpl
}
func OpenImportPIndexImpl(indexType, path string, restart func()) (cbgt.PIndexImpl, cbgt.Dest, error) {

// openImportPIndexImpl is required to have an implementation from cbgt.PIndexImplType.Open. When this function fails, PIndexImplType will fall back to using PIndexImplType.OpenUsing
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? (that Open falls back to OpenUsing). I don't see that happening in the current cbgt - I don't think cbgt uses Open at all any more (OpenPIndexImpl is only called by tests)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue this is a bug in cbgt, but it is required by https://github.com/couchbaselabs/cbgt/blob/master/manager_planner.go#L294

It should check OpenUsing instead of Open here.

@@ -58,10 +58,12 @@ type DCPCommon struct {
feedID string // Unique feed ID, used for logging
loggingCtx context.Context // Logging context, prefixes feedID
checkpointPrefix string // DCP checkpoint key prefix
janitorRollback func() // This function will trigger a janitor_pindex_rollback
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this should be stored on DCPDest, since it's a cbgt-specific artifact, but I guess DCPDest is actually the only usage of DCPCommon now that we've switched the other feeds over to DCPClient?

@adamcfraser adamcfraser assigned torcolvin and unassigned adamcfraser Jan 29, 2024
@adamcfraser adamcfraser merged commit 2226e72 into master Jan 30, 2024
30 checks passed
@adamcfraser adamcfraser deleted the CBG-3724 branch January 30, 2024 17:06
gregns1 pushed a commit that referenced this pull request Feb 9, 2024
* CBG-3724 ensure import rollback works correctly

Major fix: pass a function to call cbgt.JANITOR_ROLLBACK_PINDEX when
creating a pindex. This is necessary to ensure cbgt.Mgr will actually
generate the index.

- fix the TestImportRollback test which was giving a false pass before
  because the cbgt.Mgr was getting a kick op when the nodes were
  rebalanced. In order for rollback to work, the cbgt.Dest RollbackEx
  function has to force cgbt.Mgr to actually update the PIndex. A
  cbgt.JANITOR_ROLLBACK_PINDEX op will force the recreation of a PIndex.
  This test was broken before adding code to it.
- Enhance the TestImportRollback to detect a rollback with invalid FailoverLogs.
  This simulates a situation where the checkpoints have been migrated
  from a separate bucket.

- modify TestContinuousRollback to also test with bad vbuuid FailoverEntry.
  This test actually would pass, because Sync Gateway, unlike cbgt, did
  not pass DcpStreamAddFlagStrictVBUUID. DcpStreamAddFlagStrictVBUUID
  forces a rollback in the case where StartSeqNo=0 vbuuid=invalid
  whereas without the flag it is treated like StartSeqNo=0 vbuuid=0.
  Changed the code to match cbgt behavior adding
  DcpStreamAddFlagStrictVBUUID.
- Added a separate flag in DCPClient to detect active vBuckets, which
  are the number of vBuckets that actually have a connection, and are
  not stuck in a connecting state. This differs from the existing
  activeVbuckets which is set from the time of NewDCPClient to when the
  streams end. In a continuous feed, the streams should never end.

Cleanup:

- move RollbackEx function from DCPDest to DCPCommon. DCPCommon is an
  embedded struct in DCPDest.
- panic if DCPCommon.Rollback is called. This should never happen given
  the design of cbgt.Dest, but if it did, the rollback would not work.
  This is different from the old behavior where it would not rollback
  and potentially enter a rollback loop if this function was called.
- remove maxVbNo from NewDCPDest in effort to shorten the arguments
  since the flag can be inferred from base.Bucket.
- Create some public structs to make tests easier:
	- ShardedImportDCPMetadata
	- CBGCfgIndexDefs, CBGTIndexNodeDefsKnown, CBGTIndexNodeDefsWanted, CBGTCfgPlanPIndexes

* work around MB-60564

* Change message

* Make remove raceproof and improve comment

* Reduce scope of changes for easier backport

* Address comments to make code simpler
torcolvin added a commit that referenced this pull request Feb 9, 2024
…igration cases (#6676)

* CBG-3724 ensure import rollback works correctly (#6657)

* CBG-3724 ensure import rollback works correctly

Major fix: pass a function to call cbgt.JANITOR_ROLLBACK_PINDEX when
creating a pindex. This is necessary to ensure cbgt.Mgr will actually
generate the index.

- fix the TestImportRollback test which was giving a false pass before
  because the cbgt.Mgr was getting a kick op when the nodes were
  rebalanced. In order for rollback to work, the cbgt.Dest RollbackEx
  function has to force cgbt.Mgr to actually update the PIndex. A
  cbgt.JANITOR_ROLLBACK_PINDEX op will force the recreation of a PIndex.
  This test was broken before adding code to it.
- Enhance the TestImportRollback to detect a rollback with invalid FailoverLogs.
  This simulates a situation where the checkpoints have been migrated
  from a separate bucket.

- modify TestContinuousRollback to also test with bad vbuuid FailoverEntry.
  This test actually would pass, because Sync Gateway, unlike cbgt, did
  not pass DcpStreamAddFlagStrictVBUUID. DcpStreamAddFlagStrictVBUUID
  forces a rollback in the case where StartSeqNo=0 vbuuid=invalid
  whereas without the flag it is treated like StartSeqNo=0 vbuuid=0.
  Changed the code to match cbgt behavior adding
  DcpStreamAddFlagStrictVBUUID.
- Added a separate flag in DCPClient to detect active vBuckets, which
  are the number of vBuckets that actually have a connection, and are
  not stuck in a connecting state. This differs from the existing
  activeVbuckets which is set from the time of NewDCPClient to when the
  streams end. In a continuous feed, the streams should never end.

Cleanup:

- move RollbackEx function from DCPDest to DCPCommon. DCPCommon is an
  embedded struct in DCPDest.
- panic if DCPCommon.Rollback is called. This should never happen given
  the design of cbgt.Dest, but if it did, the rollback would not work.
  This is different from the old behavior where it would not rollback
  and potentially enter a rollback loop if this function was called.
- remove maxVbNo from NewDCPDest in effort to shorten the arguments
  since the flag can be inferred from base.Bucket.
- Create some public structs to make tests easier:
	- ShardedImportDCPMetadata
	- CBGCfgIndexDefs, CBGTIndexNodeDefsKnown, CBGTIndexNodeDefsWanted, CBGTCfgPlanPIndexes

* work around MB-60564

* Change message

* Make remove raceproof and improve comment

* Reduce scope of changes for easier backport

* Address comments to make code simpler

* updates to remove problematic code

* Update base/dcp_dest.go

Co-authored-by: Tor Colvin <[email protected]>

---------

Co-authored-by: Tor Colvin <[email protected]>
bbrks pushed a commit that referenced this pull request Mar 28, 2024
* CBG-3724 ensure import rollback works correctly

Major fix: pass a function to call cbgt.JANITOR_ROLLBACK_PINDEX when
creating a pindex. This is necessary to ensure cbgt.Mgr will actually
generate the index.

- fix the TestImportRollback test which was giving a false pass before
  because the cbgt.Mgr was getting a kick op when the nodes were
  rebalanced. In order for rollback to work, the cbgt.Dest RollbackEx
  function has to force cgbt.Mgr to actually update the PIndex. A
  cbgt.JANITOR_ROLLBACK_PINDEX op will force the recreation of a PIndex.
  This test was broken before adding code to it.
- Enhance the TestImportRollback to detect a rollback with invalid FailoverLogs.
  This simulates a situation where the checkpoints have been migrated
  from a separate bucket.

- modify TestContinuousRollback to also test with bad vbuuid FailoverEntry.
  This test actually would pass, because Sync Gateway, unlike cbgt, did
  not pass DcpStreamAddFlagStrictVBUUID. DcpStreamAddFlagStrictVBUUID
  forces a rollback in the case where StartSeqNo=0 vbuuid=invalid
  whereas without the flag it is treated like StartSeqNo=0 vbuuid=0.
  Changed the code to match cbgt behavior adding
  DcpStreamAddFlagStrictVBUUID.
- Added a separate flag in DCPClient to detect active vBuckets, which
  are the number of vBuckets that actually have a connection, and are
  not stuck in a connecting state. This differs from the existing
  activeVbuckets which is set from the time of NewDCPClient to when the
  streams end. In a continuous feed, the streams should never end.

Cleanup:

- move RollbackEx function from DCPDest to DCPCommon. DCPCommon is an
  embedded struct in DCPDest.
- panic if DCPCommon.Rollback is called. This should never happen given
  the design of cbgt.Dest, but if it did, the rollback would not work.
  This is different from the old behavior where it would not rollback
  and potentially enter a rollback loop if this function was called.
- remove maxVbNo from NewDCPDest in effort to shorten the arguments
  since the flag can be inferred from base.Bucket.
- Create some public structs to make tests easier:
	- ShardedImportDCPMetadata
	- CBGCfgIndexDefs, CBGTIndexNodeDefsKnown, CBGTIndexNodeDefsWanted, CBGTCfgPlanPIndexes

* work around MB-60564

* Change message

* Make remove raceproof and improve comment

* Reduce scope of changes for easier backport

* Address comments to make code simpler
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants