-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DBZ-5930: Support snapshot feature without an automatic retry #112
Conversation
…m Copy (taking a snapshot)
event.getVgtid().toString()); | ||
if (newVgtid.getRawVgtid().getShardGtidsList().stream().findFirst().map(s -> s.getTablePKsCount() == 0).orElse(false) | ||
&& event.getVgtid().getShardGtidsList().stream().findFirst().map(s -> 0 < s.getTablePKsCount()).orElse(false)) { | ||
LOGGER.info("Received more than one VGTID events during a copy operation and the previous one is {}. Using the latest: {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actual log is:
2022-12-12 18:57:05,028 INFO || Received more than one VGTID events during a copy operation and the previous one is [{"keyspace":"test_unsharded_keyspace","shard":"0","gtid":"MySQL56/4edb14ad-79c3-11ed-a748-0242ac110002:1-947"}]. Using the latest: shard_gtids {
keyspace: "test_unsharded_keyspace"
shard: "0"
gtid: "MySQL56/4edb14ad-79c3-11ed-a748-0242ac110002:1-947"
table_p_ks {
table_name: "t1"
lastpk {
rows {
lengths: 1
values: "1"
}
}
}
}
@yoheimuta Hi, thanks for the PR. It looks relatively easy to be done. For the sake of cross-connector consistency, could you please take a look at Also could you please make sure you run the build locally as it formats the source code too. |
Thank you for telling me! |
…h the other connectors
private void startConnector(Function<Configuration.Builder, Configuration.Builder> customConfig, | ||
boolean hasMultipleShards, boolean offsetStoragePerTask, | ||
int numTasks, int gen, int prevNumTasks, String tableInclude, | ||
VitessConnectorConfig.SnapshotMode snapshotMode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[memo] When you give null
as snapshotMode here, the connector uses the default snapshotMode (INITIAL).
/** | ||
* The set of predefined SnapshotMode options or aliases. | ||
*/ | ||
public static enum SnapshotMode implements EnumeratedValue { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -210,6 +278,17 @@ public class VitessConnectorConfig extends RelationalDatabaseConnectorConfig { | |||
+ "Once we persist the new offsets in offset storage using new partition key " | |||
+ "based on current <numTasks> and <gen>, we will no longer read prev.num.tasks param"); | |||
|
|||
public static final Field SNAPSHOT_MODE = Field.create("snapshot.mode") | |||
.withDisplayName("Snapshot mode") | |||
.withEnum(SnapshotMode.class, SnapshotMode.INITIAL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the other connectors set the default to INITIAL.
@@ -296,6 +376,9 @@ public String getShard() { | |||
} | |||
|
|||
public String getGtid() { | |||
if (getSnapshotMode() == SnapshotMode.INITIAL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SnapshotMode would be Initial be default. Would this mean by default we will pull using gtid="" which is full table scan, this would be a big performance impact on vitess side if people are not careful about their configs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the caller of getGtid is VitessReplicationConnection.defaultVGtid(). There are three code paths in defaultVGtid() (offsetStoragePerTask mode, shard = "", shard is not empty) config.getGtid() only covers one of the paths, can the gtid="" also works for the other two paths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HenryCaiHaiying I don't think this is wrong. Other connectors also execute full table scans upon connector start as they do the snapshot of existing content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can the gtid="" also works for the other two paths?
shard = ""
This case is not supported by the current Vitess now. I reported it to the Vitess team.
The fix will be included in Vitess 16 by vitessio/vitess#11886.
So, I added a warning output and a test for now 70a89be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
offsetStoragePerTask mode
Thanks! offsetStoragePerTask mode slipped my mind.
Maybe it should be supported. I'm looking into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some concerns on the defaulting behavior. The defaulting snapshot pull will change the existing system's behavior. The current default is to pull from the tail of the BinLog, now when people upgrade to the newer version of debezium and if they are not careful, it becomes pull from the head of the BinLog with a big performance impact.
The other issue is the GTID in the config is being ignored when the snapshot mode is Initial (which is the default). Sometimes we use that GTID config parameter to have the connector to rewind to an old position in the binlog, now this function is lost if we forgot to change the snapshot mode. I think the system should honor the GTID param if it's set, only use "" if gtd is not present (flip your current fallback logic)
I kind of like the first version of the PR where the user will explicitly set the GTID param to "" to indicate he wants to pull from the beginning of the binlog. This wouldn't cause confusions or performance impact if not careful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
offsetStoragePerTask mode
Maybe it should be supported. I'm looking into it.
I added a commit 822a58d to support it.
@yoheimuta One key question, could you please add a test that will verify what will happen after connector restart? I am a bit afraid that if the state of snapshot being completed is not recorded in offsets then it will be reexcuted again but I might be wrong. |
…fig doesn't specify shards
@jpechane Thanks! It makes sense. And I will also add a test that will start copying without a table include filter. |
…after the connector finished it and restarted
…s and more than one table has no rows
@@ -352,6 +370,9 @@ public static Vgtid defaultVgtid(VitessConnectorConfig config) { | |||
} | |||
else { | |||
if (config.getShard() == null || config.getShard().isEmpty()) { | |||
if (config.getGtid().equals("")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although vitess doesn't support empty shard and empty gtid position, there is a workaround possible in debezium connector side. We can get all the shards from vtgate and replace the empty shard param with all shards and continue (this was actually what was happening in VtGate code when you don't specify shard). VitessConnector.getVitessShards() can get you all the shards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds nice!
I was slanted toward that the connector side code was simpler and we would wait for the fix in Vitess.
But, given that this use case is recommended at the document and the current code has VitessConnector.getVitessShards(),
I changed my mind that it's worth implementing the workaround now.
So, I'm getting to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…fsetStoragePerTask is true
… the config doesn't specify shards" This reverts commit 70a89be.
…oth shard and gtid are empty
@@ -150,7 +150,7 @@ public List<Map<String, String>> taskConfigs(int maxTasks, List<String> shards) | |||
} | |||
if (gtidStr == null) { | |||
LOGGER.warn("No previous gtid found either for shard: {}, fallback to current", shard); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change the order of line 152 and 153, so you can print out the gtidStr value in the LOGGER.warn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I overlooked it.
Added 5ea8780.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good to me.
ex. 2022-12-20 14:57:53,936 WARN VitessConnectorIT||engine No previous gtid found either for shard: 0, fallback to '' [io.debezium.connector.vitess.VitessConnector]
@yoheimuta The change looks good to me overall, thanks for adding support for empty shard and OffsetStoragePerTask mode. I have some concerns on changing the initial pull behavior to snapshot the whole table instead of from tail because of the performance impact , but if you guys feel this is consistent with the other connectors, you can move forward but the release note has to clearly specify this changing behavior and the potential performance impact on the database side. |
@HenryCaiHaiying According to the test the initial snapshot should not be executed after connector upgrade. In that case the current installations are safe and the new installations will have the same behaviour as the other connectors. |
@HenryCaiHaiying @jpechane Thank you for your careful review! @jpechane So, should I provide a docs update PR to the main repo as the next step? |
@yoheimuta Thanks for the contribution. It was nice someone can push though the changes on both debezium and vitess community. |
@yoheimuta Applied, thanks! yes, please prepare the docs PR too. |
See corresponding doc PR: debezium/debezium#4158 |
Enable support for enums during vstream copy phase. There are two reasons that the connector does not handle `enum` for PSDB branches. 1. The upstream debezium-connector-vitess simply does not support `enum` during the VStream copy phase. It tries to cast the row value to an integer, but the value is a string. It seems support for `enum` landed in 2021 debezium#20, and support for snapshots (VStream Copy) landed in 2022 debezium#112, without taking the former into account. This is easily fixed by finding finding the index of the string value in the list of values obtained from `column_type` during the schema discovery phase at the beginning of the VStream. 2. However, this isn't working on some PSDB branches which don't have the fix vitessio/vitess#13045 for this bug vitessio/vitess#12981. Fixable by backporting the bugfix or upgrading those branches. Signed-off-by: Max Englander <[email protected]>
Enable support for enums during vstream copy phase. There are two reasons that the connector does not handle `enum` for PSDB branches. 1. The upstream debezium-connector-vitess simply does not support `enum` during the VStream copy phase. It tries to cast the row value to an integer, but the value is a string. It seems support for `enum` landed in 2021 debezium#20, and support for snapshots (VStream Copy) landed in 2022 debezium#112, without taking the former into account. This is easily fixed by finding finding the index of the string value in the list of values obtained from `column_type` during the schema discovery phase at the beginning of the VStream. 2. However, this isn't working on some PSDB branches which don't have the fix vitessio/vitess#13045 for this bug vitessio/vitess#12981. Fixable by backporting the bugfix or upgrading those branches. Signed-off-by: Max Englander <[email protected]>
Enable support for enums during vstream copy phase. There are two reasons that the connector does not handle `enum` for PSDB branches. 1. The upstream debezium-connector-vitess simply does not support `enum` during the VStream copy phase. It tries to cast the row value to an integer, but the value is a string. It seems support for `enum` landed in 2021 debezium#20, and support for snapshots (VStream Copy) landed in 2022 debezium#112, without taking the former into account. This is easily fixed by finding finding the index of the string value in the list of values obtained from `column_type` during the schema discovery phase at the beginning of the VStream. 2. However, this isn't working on some PSDB branches which don't have the fix vitessio/vitess#13045 for this bug vitessio/vitess#12981. Fixable by backporting the bugfix or upgrading those branches. Signed-off-by: Max Englander <[email protected]>
Enable support for enums during vstream copy phase. There are two reasons that the connector does not handle `enum` for PSDB branches. 1. The upstream debezium-connector-vitess simply does not support `enum` during the VStream copy phase. It tries to cast the row value to an integer, but the value is a string. It seems support for `enum` landed in 2021 debezium#20, and support for snapshots (VStream Copy) landed in 2022 debezium#112, without taking the former into account. This is easily fixed by finding finding the index of the string value in the list of values obtained from `column_type` during the schema discovery phase at the beginning of the VStream. 2. However, this isn't working on some PSDB branches which don't have the fix vitessio/vitess#13045 for this bug vitessio/vitess#12981. Fixable by backporting the bugfix or upgrading those branches. Signed-off-by: Max Englander <[email protected]>
Enable support for enums during vstream copy phase. There are two reasons that the connector does not handle `enum` for PSDB branches. 1. The upstream debezium-connector-vitess simply does not support `enum` during the VStream copy phase. It tries to cast the row value to an integer, but the value is a string. It seems support for `enum` landed in 2021 debezium#20, and support for snapshots (VStream Copy) landed in 2022 debezium#112, without taking the former into account. This is easily fixed by finding finding the index of the string value in the list of values obtained from `column_type` during the schema discovery phase at the beginning of the VStream. 2. However, this isn't working on some PSDB branches which don't have the fix vitessio/vitess#13045 for this bug vitessio/vitess#12981. Fixable by backporting the bugfix or upgrading those branches. Signed-off-by: Max Englander <[email protected]>
Enable support for enums during vstream copy phase. There are two reasons that the connector does not handle `enum` for PSDB branches. 1. The upstream debezium-connector-vitess simply does not support `enum` during the VStream copy phase. It tries to cast the row value to an integer, but the value is a string. It seems support for `enum` landed in 2021 debezium#20, and support for snapshots (VStream Copy) landed in 2022 debezium#112, without taking the former into account. This is easily fixed by finding finding the index of the string value in the list of values obtained from `column_type` during the schema discovery phase at the beginning of the VStream. 2. However, this isn't working on some PSDB branches which don't have the fix vitessio/vitess#13045 for this bug vitessio/vitess#12981. Fixable by backporting the bugfix or upgrading those branches. Signed-off-by: Max Englander <[email protected]>
Enable support for enums during vstream copy phase. There are two reasons that the connector does not handle `enum` for PSDB branches. 1. The upstream debezium-connector-vitess simply does not support `enum` during the VStream copy phase. It tries to cast the row value to an integer, but the value is a string. It seems support for `enum` landed in 2021 debezium#20, and support for snapshots (VStream Copy) landed in 2022 debezium#112, without taking the former into account. This is easily fixed by finding finding the index of the string value in the list of values obtained from `column_type` during the schema discovery phase at the beginning of the VStream. 2. However, this isn't working on some PSDB branches which don't have the fix vitessio/vitess#13045 for this bug vitessio/vitess#12981. Fixable by backporting the bugfix or upgrading those branches. Signed-off-by: Max Englander <[email protected]>
Enable support for enums during vstream copy phase. There are two reasons that the connector does not handle `enum` for PSDB branches. 1. The upstream debezium-connector-vitess simply does not support `enum` during the VStream copy phase. It tries to cast the row value to an integer, but the value is a string. It seems support for `enum` landed in 2021 debezium#20, and support for snapshots (VStream Copy) landed in 2022 debezium#112, without taking the former into account. This is easily fixed by finding finding the index of the string value in the list of values obtained from `column_type` during the schema discovery phase at the beginning of the VStream. 2. However, this isn't working on some PSDB branches which don't have the fix vitessio/vitess#13045 for this bug vitessio/vitess#12981. Fixable by backporting the bugfix or upgrading those branches. Signed-off-by: Max Englander <[email protected]>
ref. https://issues.redhat.com/browse/DBZ-5930
This PR adjusts the event state machine in gRPC responses to support the snapshot feature.
The following three cases differ from the streaming and should be taken care of.