-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CBG-3724 ensure import rollback works correctly #6657
Conversation
Major fix: pass a function to call cbgt.JANITOR_ROLLBACK_PINDEX when creating a pindex. This is necessary to ensure cbgt.Mgr will actually generate the index. - fix the TestImportRollback test which was giving a false pass before because the cbgt.Mgr was getting a kick op when the nodes were rebalanced. In order for rollback to work, the cbgt.Dest RollbackEx function has to force cgbt.Mgr to actually update the PIndex. A cbgt.JANITOR_ROLLBACK_PINDEX op will force the recreation of a PIndex. This test was broken before adding code to it. - Enhance the TestImportRollback to detect a rollback with invalid FailoverLogs. This simulates a situation where the checkpoints have been migrated from a separate bucket. - modify TestContinuousRollback to also test with bad vbuuid FailoverEntry. This test actually would pass, because Sync Gateway, unlike cbgt, did not pass DcpStreamAddFlagStrictVBUUID. DcpStreamAddFlagStrictVBUUID forces a rollback in the case where StartSeqNo=0 vbuuid=invalid whereas without the flag it is treated like StartSeqNo=0 vbuuid=0. Changed the code to match cbgt behavior adding DcpStreamAddFlagStrictVBUUID. - Added a separate flag in DCPClient to detect active vBuckets, which are the number of vBuckets that actually have a connection, and are not stuck in a connecting state. This differs from the existing activeVbuckets which is set from the time of NewDCPClient to when the streams end. In a continuous feed, the streams should never end. Cleanup: - move RollbackEx function from DCPDest to DCPCommon. DCPCommon is an embedded struct in DCPDest. - panic if DCPCommon.Rollback is called. This should never happen given the design of cbgt.Dest, but if it did, the rollback would not work. This is different from the old behavior where it would not rollback and potentially enter a rollback loop if this function was called. - remove maxVbNo from NewDCPDest in effort to shorten the arguments since the flag can be inferred from base.Bucket. - Create some public structs to make tests easier: - ShardedImportDCPMetadata - CBGCfgIndexDefs, CBGTIndexNodeDefsKnown, CBGTIndexNodeDefsWanted, CBGTCfgPlanPIndexes
base/dcp_client.go
Outdated
@@ -512,7 +509,7 @@ func (dc *DCPClient) openStreamRequest(vbID uint16) error { | |||
} | |||
|
|||
op, openErr := dc.agent.OpenStream(vbID, | |||
memd.DcpStreamAddFlagActiveOnly, | |||
memd.DcpStreamAddFlagActiveOnly|memd.DcpStreamAddFlagStrictVBUUID, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually want this? It sounds like the behaviour when it's not set (ignore the vbuuid if startSeq is 0) is what we prefer. I don't know if we need to have our DCPClient match cbgt handling here?
base/dcp_dest.go
Outdated
|
||
// TODO: Metadata store? | ||
metadataStore := bucket.DefaultDataStore() | ||
dcpCommon, err := NewDCPCommon(ctx, callback, bucket, metadataStore, maxVbNo, persistCheckpoints, dcpStats, feedID, checkpointPrefix, metaKeys) | ||
dcpCommon, err := NewDCPCommon(ctx, callback, bucket, metadataStore, persistCheckpoints, dcpStats, feedID, checkpointPrefix, metaKeys, rollback) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we moving where maxVbNo is computed in this PR for any reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason was to make this call stack less complicated because we can and should infer it from Bucket
. It's not necessary in this PR but I thought when I'm adding argument 11, I might want to see if they are necessary.
db/import_pindex.go
Outdated
if err != nil { | ||
base.ErrorfCtx(ctx, "Error creating NewImportDest during NewImportPIndexImpl: %v", err) | ||
} | ||
return nil, importDest, err | ||
} | ||
return newImportPIndexImpl | ||
} | ||
func OpenImportPIndexImpl(indexType, path string, restart func()) (cbgt.PIndexImpl, cbgt.Dest, error) { | ||
|
||
// openImportPIndexImpl is required to have an implementation from cbgt.PIndexImplType.Open. When this function fails, PIndexImplType will fall back to using PIndexImplType.OpenUsing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? (that Open falls back to OpenUsing). I don't see that happening in the current cbgt - I don't think cbgt uses Open at all any more (OpenPIndexImpl is only called by tests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue this is a bug in cbgt, but it is required by https://github.com/couchbaselabs/cbgt/blob/master/manager_planner.go#L294
It should check OpenUsing
instead of Open
here.
base/dcp_common.go
Outdated
@@ -58,10 +58,12 @@ type DCPCommon struct { | |||
feedID string // Unique feed ID, used for logging | |||
loggingCtx context.Context // Logging context, prefixes feedID | |||
checkpointPrefix string // DCP checkpoint key prefix | |||
janitorRollback func() // This function will trigger a janitor_pindex_rollback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this should be stored on DCPDest, since it's a cbgt-specific artifact, but I guess DCPDest is actually the only usage of DCPCommon now that we've switched the other feeds over to DCPClient?
* CBG-3724 ensure import rollback works correctly Major fix: pass a function to call cbgt.JANITOR_ROLLBACK_PINDEX when creating a pindex. This is necessary to ensure cbgt.Mgr will actually generate the index. - fix the TestImportRollback test which was giving a false pass before because the cbgt.Mgr was getting a kick op when the nodes were rebalanced. In order for rollback to work, the cbgt.Dest RollbackEx function has to force cgbt.Mgr to actually update the PIndex. A cbgt.JANITOR_ROLLBACK_PINDEX op will force the recreation of a PIndex. This test was broken before adding code to it. - Enhance the TestImportRollback to detect a rollback with invalid FailoverLogs. This simulates a situation where the checkpoints have been migrated from a separate bucket. - modify TestContinuousRollback to also test with bad vbuuid FailoverEntry. This test actually would pass, because Sync Gateway, unlike cbgt, did not pass DcpStreamAddFlagStrictVBUUID. DcpStreamAddFlagStrictVBUUID forces a rollback in the case where StartSeqNo=0 vbuuid=invalid whereas without the flag it is treated like StartSeqNo=0 vbuuid=0. Changed the code to match cbgt behavior adding DcpStreamAddFlagStrictVBUUID. - Added a separate flag in DCPClient to detect active vBuckets, which are the number of vBuckets that actually have a connection, and are not stuck in a connecting state. This differs from the existing activeVbuckets which is set from the time of NewDCPClient to when the streams end. In a continuous feed, the streams should never end. Cleanup: - move RollbackEx function from DCPDest to DCPCommon. DCPCommon is an embedded struct in DCPDest. - panic if DCPCommon.Rollback is called. This should never happen given the design of cbgt.Dest, but if it did, the rollback would not work. This is different from the old behavior where it would not rollback and potentially enter a rollback loop if this function was called. - remove maxVbNo from NewDCPDest in effort to shorten the arguments since the flag can be inferred from base.Bucket. - Create some public structs to make tests easier: - ShardedImportDCPMetadata - CBGCfgIndexDefs, CBGTIndexNodeDefsKnown, CBGTIndexNodeDefsWanted, CBGTCfgPlanPIndexes * work around MB-60564 * Change message * Make remove raceproof and improve comment * Reduce scope of changes for easier backport * Address comments to make code simpler
…igration cases (#6676) * CBG-3724 ensure import rollback works correctly (#6657) * CBG-3724 ensure import rollback works correctly Major fix: pass a function to call cbgt.JANITOR_ROLLBACK_PINDEX when creating a pindex. This is necessary to ensure cbgt.Mgr will actually generate the index. - fix the TestImportRollback test which was giving a false pass before because the cbgt.Mgr was getting a kick op when the nodes were rebalanced. In order for rollback to work, the cbgt.Dest RollbackEx function has to force cgbt.Mgr to actually update the PIndex. A cbgt.JANITOR_ROLLBACK_PINDEX op will force the recreation of a PIndex. This test was broken before adding code to it. - Enhance the TestImportRollback to detect a rollback with invalid FailoverLogs. This simulates a situation where the checkpoints have been migrated from a separate bucket. - modify TestContinuousRollback to also test with bad vbuuid FailoverEntry. This test actually would pass, because Sync Gateway, unlike cbgt, did not pass DcpStreamAddFlagStrictVBUUID. DcpStreamAddFlagStrictVBUUID forces a rollback in the case where StartSeqNo=0 vbuuid=invalid whereas without the flag it is treated like StartSeqNo=0 vbuuid=0. Changed the code to match cbgt behavior adding DcpStreamAddFlagStrictVBUUID. - Added a separate flag in DCPClient to detect active vBuckets, which are the number of vBuckets that actually have a connection, and are not stuck in a connecting state. This differs from the existing activeVbuckets which is set from the time of NewDCPClient to when the streams end. In a continuous feed, the streams should never end. Cleanup: - move RollbackEx function from DCPDest to DCPCommon. DCPCommon is an embedded struct in DCPDest. - panic if DCPCommon.Rollback is called. This should never happen given the design of cbgt.Dest, but if it did, the rollback would not work. This is different from the old behavior where it would not rollback and potentially enter a rollback loop if this function was called. - remove maxVbNo from NewDCPDest in effort to shorten the arguments since the flag can be inferred from base.Bucket. - Create some public structs to make tests easier: - ShardedImportDCPMetadata - CBGCfgIndexDefs, CBGTIndexNodeDefsKnown, CBGTIndexNodeDefsWanted, CBGTCfgPlanPIndexes * work around MB-60564 * Change message * Make remove raceproof and improve comment * Reduce scope of changes for easier backport * Address comments to make code simpler * updates to remove problematic code * Update base/dcp_dest.go Co-authored-by: Tor Colvin <[email protected]> --------- Co-authored-by: Tor Colvin <[email protected]>
* CBG-3724 ensure import rollback works correctly Major fix: pass a function to call cbgt.JANITOR_ROLLBACK_PINDEX when creating a pindex. This is necessary to ensure cbgt.Mgr will actually generate the index. - fix the TestImportRollback test which was giving a false pass before because the cbgt.Mgr was getting a kick op when the nodes were rebalanced. In order for rollback to work, the cbgt.Dest RollbackEx function has to force cgbt.Mgr to actually update the PIndex. A cbgt.JANITOR_ROLLBACK_PINDEX op will force the recreation of a PIndex. This test was broken before adding code to it. - Enhance the TestImportRollback to detect a rollback with invalid FailoverLogs. This simulates a situation where the checkpoints have been migrated from a separate bucket. - modify TestContinuousRollback to also test with bad vbuuid FailoverEntry. This test actually would pass, because Sync Gateway, unlike cbgt, did not pass DcpStreamAddFlagStrictVBUUID. DcpStreamAddFlagStrictVBUUID forces a rollback in the case where StartSeqNo=0 vbuuid=invalid whereas without the flag it is treated like StartSeqNo=0 vbuuid=0. Changed the code to match cbgt behavior adding DcpStreamAddFlagStrictVBUUID. - Added a separate flag in DCPClient to detect active vBuckets, which are the number of vBuckets that actually have a connection, and are not stuck in a connecting state. This differs from the existing activeVbuckets which is set from the time of NewDCPClient to when the streams end. In a continuous feed, the streams should never end. Cleanup: - move RollbackEx function from DCPDest to DCPCommon. DCPCommon is an embedded struct in DCPDest. - panic if DCPCommon.Rollback is called. This should never happen given the design of cbgt.Dest, but if it did, the rollback would not work. This is different from the old behavior where it would not rollback and potentially enter a rollback loop if this function was called. - remove maxVbNo from NewDCPDest in effort to shorten the arguments since the flag can be inferred from base.Bucket. - Create some public structs to make tests easier: - ShardedImportDCPMetadata - CBGCfgIndexDefs, CBGTIndexNodeDefsKnown, CBGTIndexNodeDefsWanted, CBGTCfgPlanPIndexes * work around MB-60564 * Change message * Make remove raceproof and improve comment * Reduce scope of changes for easier backport * Address comments to make code simpler
Major fix: pass a function to call cbgt.JANITOR_ROLLBACK_PINDEX when creating a pindex. This is necessary to ensure cbgt.Mgr will actually generate the index.
fix the TestImportRollback test which was giving a false pass before because the cbgt.Mgr was getting a kick op when the nodes were rebalanced. In order for rollback to work, the cbgt.Dest RollbackEx function has to force cgbt.Mgr to actually update the PIndex. A cbgt.JANITOR_ROLLBACK_PINDEX op will force the recreation of a PIndex. This test was broken before adding code to it.
Enhance the TestImportRollback to detect a rollback with invalid FailoverLogs. This simulates a situation where the checkpoints have been migrated from a separate bucket.
modify TestContinuousRollback to also test with bad vbuuid FailoverEntry. This test actually would pass, because Sync Gateway, unlike cbgt, did not pass DcpStreamAddFlagStrictVBUUID. DcpStreamAddFlagStrictVBUUID forces a rollback in the case where StartSeqNo=0 vbuuid=invalid whereas without the flag it is treated like StartSeqNo=0 vbuuid=0. Changed the code to match cbgt behavior adding DcpStreamAddFlagStrictVBUUID.
Added a separate flag in DCPClient to detect active vBuckets, which are the number of vBuckets that actually have a connection, and are not stuck in a connecting state. This differs from the existing activeVbuckets which is set from the time of NewDCPClient to when the streams end. In a continuous feed, the streams should never end.
Cleanup:
Pre-review checklist
fmt.Print
,log.Print
, ...)base.UD(docID)
,base.MD(dbName)
)docs/api
Integration Tests
GSI=true,xattrs=true
https://jenkins.sgwdev.com/job/SyncGateway-Integration/2278/