feat: support concurrent apply dml binlog #1285

Closed
shaohk wants to merge 11 commits

Conversation

shaohk
Contributor

@shaohk shaohk commented Jun 12, 2023

Description

This PR adds support for concurrently applying DML binlog events to the _gho table.

In case this PR introduced Go code changes:

  • contributed code is using same conventions as original code
  • script/cibuild returns with no formatting errors, build errors or unit test errors.

@shaohk shaohk changed the title Feat concurrent apply dml binlog feat: support concurrent apply dml binlog Aug 2, 2023
@morgo
Contributor

morgo commented Aug 18, 2023

I believe this is only safe to do if you verify that the unique key you are using is memory-comparable. For example, a VARCHAR(255) PK has a collation, so from MySQL's perspective 'A' == 'a', but in the Go code they will be two entries in the map and will be treated as different keys.

shaohoukun added 3 commits October 7, 2023 11:19
… operations is restricted.

Whether the binlog can be applied concurrently is decided from the unique index columns used for the chunk
data. Concurrency is allowed only when every column in the unique index is of type int. If any
column has a non-int type, the concurrency is set to 1.
@shaohk
Contributor Author

shaohk commented Oct 7, 2023

> I believe this is only safe to do if you verify that the unique key that you are using is memory-comparable. i.e. a VARCHAR(255) PK will have collations. So from MySQL's perspective 'A' == 'a', but in the go-lang code they will have two entries in the map/will be treated as different keys.

Yes, you are right. I have added a restriction: concurrent apply is allowed only when every unique index column of the chunk data is of type 'int'; otherwise, concurrency is not allowed.
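
The restriction described above could look roughly like the following sketch. This is not the code in this PR: the function names `isIntegerColumnType` and `effectiveConcurrency` are hypothetical, and treating the whole MySQL integer family (tinyint through bigint) as 'int' is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// isIntegerColumnType reports whether a MySQL column type string such as
// "int(11) unsigned" names an integer type.
// ASSUMPTION: the whole integer family counts as "int" here.
func isIntegerColumnType(columnType string) bool {
	t := strings.ToLower(strings.TrimSpace(columnType))
	// Drop the display width and attributes: "int(11) unsigned" -> "int".
	if i := strings.IndexAny(t, "( "); i >= 0 {
		t = t[:i]
	}
	switch t {
	case "tinyint", "smallint", "mediumint", "int", "integer", "bigint":
		return true
	}
	return false
}

// effectiveConcurrency returns the requested concurrency only when every
// unique-key column is an integer type; in every other case it returns 1,
// which means serial apply.
func effectiveConcurrency(uniqueKeyColumnTypes []string, requested int) int {
	if requested < 1 || len(uniqueKeyColumnTypes) == 0 {
		return 1
	}
	for _, columnType := range uniqueKeyColumnTypes {
		if !isIntegerColumnType(columnType) {
			return 1
		}
	}
	return requested
}

func main() {
	fmt.Println(effectiveConcurrency([]string{"bigint(20) unsigned", "int"}, 8)) // prints 8
	fmt.Println(effectiveConcurrency([]string{"int", "varchar(255)"}, 8))        // prints 1
}
```

Falling back to 1 rather than failing keeps the migration usable on tables with string keys; they simply get the existing serial behavior.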

@dnovitski

dnovitski commented Oct 9, 2023

I am following this PR with interest, as our migrations can only progress at night due to heavy write load during the day, even though our database has more than enough capacity to apply the DMLs from the binlogs concurrently instead of one by one.

Would be great if we can also progress during daytime with this.

@shaohk Any progress or help needed?

@shaohk
Contributor Author

shaohk commented Oct 11, 2023

> I am following this PR with interest, as our migrations can only progress at night due to heavy write load during the day, even though our database has more than enough capacity to apply the DMLs from the binlogs concurrently instead of one by one.
>
> Would be great if we can also progress during daytime with this.
>
> @shaohk Any progress or help needed?

@dnovitski Are you sure that parallel replay of binlog events can meet your needs? From what you've described, there's heavy write activity during the day, so tasks can only run at night. Does that mean when the tasks run at night, serial replay of binlog events can't catch up with the binlog? So, you're considering parallel replay?

@dnovitski

> @dnovitski Are you sure that parallel replay of binlog events can meet your needs? From what you've described, there's heavy write activity during the day, so tasks can only run at night. Does that mean when the tasks run at night, serial replay of binlog events can't catch up with the binlog? So, you're considering parallel replay?

During the night the binlogs are caught up, so for us the table copy can mostly continue only at night. We've been testing with dml-batch-size increased to 2000 and higher (patching the hardcoded MaxEventsSize value), and this also allows us to occasionally catch up with the binlogs during the daytime. So we expect even better performance with parallel binlog apply, allowing us to continue the table copy during the daytime 100% of the time.

@shaohk
Contributor Author

shaohk commented Oct 13, 2023

> > @dnovitski Are you sure that parallel replay of binlog events can meet your needs? From what you've described, there's heavy write activity during the day, so tasks can only run at night. Does that mean when the tasks run at night, serial replay of binlog events can't catch up with the binlog? So, you're considering parallel replay?
>
> During the night the binlogs are caught up, so for us the table copy can mostly continue only at night. We've been testing with dml-batch-size increased to 2000 and higher (patching the hardcoded MaxEventsSize value), and this also allows us to occasionally catch up with the binlogs during the daytime. So we expect even better performance with parallel binlog apply, allowing us to continue the table copy during the daytime 100% of the time.

Got it.

@shaohk shaohk closed this Jan 29, 2024

3 participants