[Bug] [Connector-V2-Paimon] Data changes are lost when sinking into Paimon using batch mode. #6831
Comments
If you want to use Paimon's upsert feature, your source should be MySQL-CDC. In addition, the checkpoint.interval parameter is not supported for batch writes in the Paimon sink, because a batch write in the Paimon sink can be committed only once.
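For reference, a streaming job with a MySQL-CDC source writing to a Paimon sink might look roughly like the sketch below. This is only an illustration: the connection values are placeholders and the option names may differ slightly across SeaTunnel versions, so check the connector docs for your release.

# Hedged sketch only: connection values are placeholders and option names
# may vary between SeaTunnel versions.
env {
  job.mode = "STREAMING"
  checkpoint.interval = 10000
}

source {
  MySQL-CDC {
    base-url = "jdbc:mysql://localhost:3306/test_db"
    username = "user"
    password = "password"
    table-names = ["test_db.orders"]
  }
}

sink {
  Paimon {
    warehouse = "hdfs:///paimon/warehouse"
    database = "test_db"
    table = "orders"
  }
}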
Thanks for your reply.
The JDBC source cannot capture CDC events. I think your test may not be quite right.
I tried to explain why I use the JDBC source connector and batch mode, but perhaps I wasn't clear.
I don't think this has anything to do with the Paimon sink. In batch mode, the JDBC source only reads the data at the moment it executes the JDBC query. After the data is read, it is sent downstream. Whether the source is later updated or inserted into, those changes will not be synchronized downstream.
Let's focus only on the insert behavior of the sink and Paimon.
If I insert some data whose primary keys already exist:
querying this table (by default, the latest snapshot) should look like this (the result of inserting with Flink/Spark):
When using SeaTunnel's sink to insert, it looks like Paimon has not correctly merged all of the data:
And when I query with time travel, none of Paimon's snapshots are correct; it is as if some data were lost.
Which connector does your source use?
JDBC
Does your source table have a primary key?
Yes, the same primary key.
If your source table has a primary key, then inserting data with the same primary key is actually a modification operation, isn't it?
Yes.
The JDBC source cannot capture update events. I think you can use this test case to verify. You can test by only inserting rows with the same key, as in the sketch below.
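A minimal batch test along those lines might look roughly like the following sketch. It is a hypothetical illustration only: the table, column, and option names are placeholders and may not match your environment.

# Hypothetical batch test sketch: re-insert rows that reuse the same primary key,
# then check whether Paimon's deduplicate merge keeps only the latest row.
# Option names are best-effort and may differ in your SeaTunnel version.
env {
  job.mode = "BATCH"
}

source {
  Jdbc {
    url = "jdbc:mysql://localhost:3306/test_db"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "user"
    password = "password"
    query = "select id, name, update_time from orders"
  }
}

sink {
  Paimon {
    warehouse = "hdfs:///paimon/warehouse"
    database = "test_db"
    table = "orders"
  }
}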
I see what you mean. |
@MirrerZu Thanks for your feedback, please use this PR to build a new jar of the Paimon connector.
It works well for me, thanks!
Search before asking
What happened
When I use the MySQL JDBC source and the Paimon sink in batch mode, some data with the same primary key are not updated correctly, even though I'm sure the target is a primary-key table using the deduplicate merge engine.
If I add
checkpoint.interval=1000
to the job env config, all the data is updated correctly, but the job throws an exception, and I didn't find any errors or warnings in the HDFS logs.
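In other words, the env block looked roughly like this (a sketch of only the env section; the rest of the job config is omitted here):

# Sketch of the env block described above; source and sink sections omitted.
env {
  job.mode = "BATCH"
  # Adding this line makes the data merge correctly, but the job then throws an exception.
  checkpoint.interval = 1000
}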
To verify, I used Flink's batch mode and JDBC external tables to write the same data; all of it was updated correctly without any errors.
SeaTunnel Version
2.3.5
SeaTunnel Config
Running Command
./bin/seatunnel.sh --config ./config/v2.mysql.config -e local
Error Exception
Zeta or Flink or Spark Version
Zeta 2.3.5
Java or Scala Version
jdk 1.8.0_402
Screenshots
Are you willing to submit PR?
Code of Conduct