Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DBZ-5930: Support snapshot feature without an automatic retry #112

Merged
merged 13 commits into from
Dec 21, 2022

Conversation

yoheimuta
Copy link
Contributor

@yoheimuta yoheimuta commented Dec 13, 2022

ref. https://issues.redhat.com/browse/DBZ-5930

This PR adjusts the event state machine in gRPC responses to support the snapshot feature.

The following three cases differ from the streaming and should be taken care of.

  • Duplicate VGTID when the VStream Copy starts
  • Duplicate BEGIN when the VStream Copy starts and the table has no rows
  • Duplicate BEGIN when the VStream Copy starts and more than one table has no rows

event.getVgtid().toString());
if (newVgtid.getRawVgtid().getShardGtidsList().stream().findFirst().map(s -> s.getTablePKsCount() == 0).orElse(false)
&& event.getVgtid().getShardGtidsList().stream().findFirst().map(s -> 0 < s.getTablePKsCount()).orElse(false)) {
LOGGER.info("Received more than one VGTID events during a copy operation and the previous one is {}. Using the latest: {}",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual log is:

2022-12-12 18:57:05,028 INFO   ||  Received more than one VGTID events during a copy operation and the previous one is [{"keyspace":"test_unsharded_keyspace","shard":"0","gtid":"MySQL56/4edb14ad-79c3-11ed-a748-0242ac110002:1-947"}]. Using the latest: shard_gtids {
  keyspace: "test_unsharded_keyspace"
  shard: "0"
  gtid: "MySQL56/4edb14ad-79c3-11ed-a748-0242ac110002:1-947"
  table_p_ks {
    table_name: "t1"
    lastpk {
      rows {
        lengths: 1
        values: "1"
      }
    }
  }
}

@jpechane
Copy link
Contributor

jpechane commented Dec 14, 2022

@yoheimuta Hi, thanks for the PR. It looks relatively easy to be done. For the sake of cross-connector consistency, could you please take a look at snapshot.mode config option used by other connectors? If possible I'd like to drive/configure snapshotting via it rather than with a special value of GTID setting.

Also could you please make sure you run the build locally as it formats the source code too.

@yoheimuta
Copy link
Contributor Author

Thank you for telling me!
I'm looking into snapshot.mode and formatting.

private void startConnector(Function<Configuration.Builder, Configuration.Builder> customConfig,
boolean hasMultipleShards, boolean offsetStoragePerTask,
int numTasks, int gen, int prevNumTasks, String tableInclude,
VitessConnectorConfig.SnapshotMode snapshotMode)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[memo] When you give null as snapshotMode here, the connector uses the default snapshotMode (INITIAL).

@yoheimuta
Copy link
Contributor Author

@jpechane Hi, I fixed these two points.

Could you please review this PR?

/**
* The set of predefined SnapshotMode options or aliases.
*/
public static enum SnapshotMode implements EnumeratedValue {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -210,6 +278,17 @@ public class VitessConnectorConfig extends RelationalDatabaseConnectorConfig {
+ "Once we persist the new offsets in offset storage using new partition key "
+ "based on current <numTasks> and <gen>, we will no longer read prev.num.tasks param");

public static final Field SNAPSHOT_MODE = Field.create("snapshot.mode")
.withDisplayName("Snapshot mode")
.withEnum(SnapshotMode.class, SnapshotMode.INITIAL)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the other connectors set the default to INITIAL.

@@ -296,6 +376,9 @@ public String getShard() {
}

public String getGtid() {
if (getSnapshotMode() == SnapshotMode.INITIAL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SnapshotMode would be Initial be default. Would this mean by default we will pull using gtid="" which is full table scan, this would be a big performance impact on vitess side if people are not careful about their configs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the caller of getGtid is VitessReplicationConnection.defaultVGtid(). There are three code paths in defaultVGtid() (offsetStoragePerTask mode, shard = "", shard is not empty) config.getGtid() only covers one of the paths, can the gtid="" also works for the other two paths?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HenryCaiHaiying I don't think this is wrong. Other connectors also execute full table scans upon connector start as they do the snapshot of existing content.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can the gtid="" also works for the other two paths?

shard = ""

This case is not supported by the current Vitess now. I reported it to the Vitess team.
The fix will be included in Vitess 16 by vitessio/vitess#11886.

So, I added a warning output and a test for now 70a89be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

offsetStoragePerTask mode

Thanks! offsetStoragePerTask mode slipped my mind.
Maybe it should be supported. I'm looking into it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some concerns on the defaulting behavior. The defaulting snapshot pull will change the existing system's behavior. The current default is to pull from the tail of the BinLog, now when people upgrade to the newer version of debezium and if they are not careful, it becomes pull from the head of the BinLog with a big performance impact.

The other issue is the GTID in the config is being ignored when the snapshot mode is Initial (which is the default). Sometimes we use that GTID config parameter to have the connector to rewind to an old position in the binlog, now this function is lost if we forgot to change the snapshot mode. I think the system should honor the GTID param if it's set, only use "" if gtd is not present (flip your current fallback logic)

I kind of like the first version of the PR where the user will explicitly set the GTID param to "" to indicate he wants to pull from the beginning of the binlog. This wouldn't cause confusions or performance impact if not careful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

offsetStoragePerTask mode

Maybe it should be supported. I'm looking into it.

I added a commit 822a58d to support it.

@jpechane
Copy link
Contributor

@yoheimuta One key question, could you please add a test that will verify what will happen after connector restart? I am a bit afraid that if the state of snapshot being completed is not recorded in offsets then it will be reexcuted again but I might be wrong.
Still it is is important to test it.

@yoheimuta
Copy link
Contributor Author

yoheimuta commented Dec 16, 2022

@jpechane Thanks! It makes sense.
I'm adding a test that verifies what will happen after the connector restart.

And I will also add a test that will start copying without a table include filter.
I found out this case sent a different event sequence.

@yoheimuta
Copy link
Contributor Author

yoheimuta commented Dec 16, 2022

@jpechane Hi, I added the test: 689a2c5 f3d5551.

@@ -352,6 +370,9 @@ public static Vgtid defaultVgtid(VitessConnectorConfig config) {
}
else {
if (config.getShard() == null || config.getShard().isEmpty()) {
if (config.getGtid().equals("")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although vitess doesn't support empty shard and empty gtid position, there is a workaround possible in debezium connector side. We can get all the shards from vtgate and replace the empty shard param with all shards and continue (this was actually what was happening in VtGate code when you don't specify shard). VitessConnector.getVitessShards() can get you all the shards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds nice!
I was slanted toward that the connector side code was simpler and we would wait for the fix in Vitess.

But, given that this use case is recommended at the document and the current code has VitessConnector.getVitessShards(), I changed my mind that it's worth implementing the workaround now.
So, I'm getting to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added two commits 5c7a7f6 3043a9c to take snapshots from all shards.

@@ -150,7 +150,7 @@ public List<Map<String, String>> taskConfigs(int maxTasks, List<String> shards)
}
if (gtidStr == null) {
LOGGER.warn("No previous gtid found either for shard: {}, fallback to current", shard);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change the order of line 152 and 153, so you can print out the gtidStr value in the LOGGER.warn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I overlooked it.
Added 5ea8780.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good to me.

ex. 2022-12-20 14:57:53,936 WARN   VitessConnectorIT||engine  No previous gtid found either for shard: 0, fallback to ''   [io.debezium.connector.vitess.VitessConnector]
@HenryCaiHaiying
Copy link
Contributor

@yoheimuta The change looks good to me overall, thanks for adding support for empty shard and OffsetStoragePerTask mode. I have some concerns on changing the initial pull behavior to snapshot the whole table instead of from tail because of the performance impact , but if you guys feel this is consistent with the other connectors, you can move forward but the release note has to clearly specify this changing behavior and the potential performance impact on the database side.

@jpechane
Copy link
Contributor

@HenryCaiHaiying According to the test the initial snapshot should not be executed after connector upgrade. In that case the current installations are safe and the new installations will have the same behaviour as the other connectors.
Yes, it will be mentioned in release notes under breaking changes section.

@yoheimuta
Copy link
Contributor Author

@HenryCaiHaiying @jpechane Thank you for your careful review!
I agree with clearly specifying this breaking change behavior.

@jpechane So, should I provide a docs update PR to the main repo as the next step?

@HenryCaiHaiying
Copy link
Contributor

@yoheimuta Thanks for the contribution. It was nice someone can push though the changes on both debezium and vitess community.

@jpechane jpechane merged commit 61903a1 into debezium:main Dec 21, 2022
@jpechane
Copy link
Contributor

@yoheimuta Applied, thanks! yes, please prepare the docs PR too.

@yoheimuta yoheimuta deleted the support-snapshot branch December 22, 2022 02:37
@yoheimuta
Copy link
Contributor Author

See corresponding doc PR: debezium/debezium#4158

maxenglander added a commit to planetscale/debezium-connector-planetscale that referenced this pull request Oct 16, 2024
Enable support for enums during vstream copy phase.

There are two reasons that the connector does not handle `enum` for PSDB branches.

 1. The upstream debezium-connector-vitess simply does not support
    `enum` during the VStream copy phase. It tries to cast the row value
    to an integer, but the value is a string. It seems support for
    `enum` landed in 2021
    debezium#20, and
    support for snapshots (VStream Copy) landed in 2022
    debezium#112,
    without taking the former into account. This is easily fixed by
    finding finding the index of the string
    value in the list of values obtained from `column_type` during the
    schema discovery phase at the beginning of the VStream.
 2. However, this isn't working on some PSDB branches which don't have
    the fix vitessio/vitess#13045 for this bug
    vitessio/vitess#12981. Fixable by
    backporting the bugfix or upgrading those branches.

Signed-off-by: Max Englander <[email protected]>
maxenglander added a commit to planetscale/debezium-connector-planetscale that referenced this pull request Oct 16, 2024
Enable support for enums during vstream copy phase.

There are two reasons that the connector does not handle `enum` for PSDB branches.

 1. The upstream debezium-connector-vitess simply does not support
    `enum` during the VStream copy phase. It tries to cast the row value
    to an integer, but the value is a string. It seems support for
    `enum` landed in 2021
    debezium#20, and
    support for snapshots (VStream Copy) landed in 2022
    debezium#112,
    without taking the former into account. This is easily fixed by
    finding finding the index of the string
    value in the list of values obtained from `column_type` during the
    schema discovery phase at the beginning of the VStream.
 2. However, this isn't working on some PSDB branches which don't have
    the fix vitessio/vitess#13045 for this bug
    vitessio/vitess#12981. Fixable by
    backporting the bugfix or upgrading those branches.

Signed-off-by: Max Englander <[email protected]>
maxenglander added a commit to planetscale/debezium-connector-planetscale that referenced this pull request Oct 18, 2024
Enable support for enums during vstream copy phase.

There are two reasons that the connector does not handle `enum` for PSDB branches.

 1. The upstream debezium-connector-vitess simply does not support
    `enum` during the VStream copy phase. It tries to cast the row value
    to an integer, but the value is a string. It seems support for
    `enum` landed in 2021
    debezium#20, and
    support for snapshots (VStream Copy) landed in 2022
    debezium#112,
    without taking the former into account. This is easily fixed by
    finding finding the index of the string
    value in the list of values obtained from `column_type` during the
    schema discovery phase at the beginning of the VStream.
 2. However, this isn't working on some PSDB branches which don't have
    the fix vitessio/vitess#13045 for this bug
    vitessio/vitess#12981. Fixable by
    backporting the bugfix or upgrading those branches.

Signed-off-by: Max Englander <[email protected]>
maxenglander added a commit to planetscale/debezium-connector-planetscale that referenced this pull request Oct 18, 2024
Enable support for enums during vstream copy phase.

There are two reasons that the connector does not handle `enum` for PSDB branches.

 1. The upstream debezium-connector-vitess simply does not support
    `enum` during the VStream copy phase. It tries to cast the row value
    to an integer, but the value is a string. It seems support for
    `enum` landed in 2021
    debezium#20, and
    support for snapshots (VStream Copy) landed in 2022
    debezium#112,
    without taking the former into account. This is easily fixed by
    finding finding the index of the string
    value in the list of values obtained from `column_type` during the
    schema discovery phase at the beginning of the VStream.
 2. However, this isn't working on some PSDB branches which don't have
    the fix vitessio/vitess#13045 for this bug
    vitessio/vitess#12981. Fixable by
    backporting the bugfix or upgrading those branches.

Signed-off-by: Max Englander <[email protected]>
maxenglander added a commit to planetscale/debezium-connector-planetscale that referenced this pull request Oct 18, 2024
Enable support for enums during vstream copy phase.

There are two reasons that the connector does not handle `enum` for PSDB branches.

 1. The upstream debezium-connector-vitess simply does not support
    `enum` during the VStream copy phase. It tries to cast the row value
    to an integer, but the value is a string. It seems support for
    `enum` landed in 2021
    debezium#20, and
    support for snapshots (VStream Copy) landed in 2022
    debezium#112,
    without taking the former into account. This is easily fixed by
    finding finding the index of the string
    value in the list of values obtained from `column_type` during the
    schema discovery phase at the beginning of the VStream.
 2. However, this isn't working on some PSDB branches which don't have
    the fix vitessio/vitess#13045 for this bug
    vitessio/vitess#12981. Fixable by
    backporting the bugfix or upgrading those branches.

Signed-off-by: Max Englander <[email protected]>
maxenglander added a commit to planetscale/debezium-connector-planetscale that referenced this pull request Oct 21, 2024
Enable support for enums during vstream copy phase.

There are two reasons that the connector does not handle `enum` for PSDB branches.

 1. The upstream debezium-connector-vitess simply does not support
    `enum` during the VStream copy phase. It tries to cast the row value
    to an integer, but the value is a string. It seems support for
    `enum` landed in 2021
    debezium#20, and
    support for snapshots (VStream Copy) landed in 2022
    debezium#112,
    without taking the former into account. This is easily fixed by
    finding finding the index of the string
    value in the list of values obtained from `column_type` during the
    schema discovery phase at the beginning of the VStream.
 2. However, this isn't working on some PSDB branches which don't have
    the fix vitessio/vitess#13045 for this bug
    vitessio/vitess#12981. Fixable by
    backporting the bugfix or upgrading those branches.

Signed-off-by: Max Englander <[email protected]>
maxenglander added a commit to planetscale/debezium-connector-planetscale that referenced this pull request Oct 23, 2024
Enable support for enums during vstream copy phase.

There are two reasons that the connector does not handle `enum` for PSDB branches.

 1. The upstream debezium-connector-vitess simply does not support
    `enum` during the VStream copy phase. It tries to cast the row value
    to an integer, but the value is a string. It seems support for
    `enum` landed in 2021
    debezium#20, and
    support for snapshots (VStream Copy) landed in 2022
    debezium#112,
    without taking the former into account. This is easily fixed by
    finding finding the index of the string
    value in the list of values obtained from `column_type` during the
    schema discovery phase at the beginning of the VStream.
 2. However, this isn't working on some PSDB branches which don't have
    the fix vitessio/vitess#13045 for this bug
    vitessio/vitess#12981. Fixable by
    backporting the bugfix or upgrading those branches.

Signed-off-by: Max Englander <[email protected]>
maxenglander added a commit to planetscale/debezium-connector-planetscale that referenced this pull request Oct 25, 2024
Enable support for enums during vstream copy phase.

There are two reasons that the connector does not handle `enum` for PSDB branches.

 1. The upstream debezium-connector-vitess simply does not support
    `enum` during the VStream copy phase. It tries to cast the row value
    to an integer, but the value is a string. It seems support for
    `enum` landed in 2021
    debezium#20, and
    support for snapshots (VStream Copy) landed in 2022
    debezium#112,
    without taking the former into account. This is easily fixed by
    finding finding the index of the string
    value in the list of values obtained from `column_type` during the
    schema discovery phase at the beginning of the VStream.
 2. However, this isn't working on some PSDB branches which don't have
    the fix vitessio/vitess#13045 for this bug
    vitessio/vitess#12981. Fixable by
    backporting the bugfix or upgrading those branches.

Signed-off-by: Max Englander <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants