-
Notifications
You must be signed in to change notification settings - Fork 481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PS-7806: Column compression breaks async replication on PS #4460
Conversation
https://jira.percona.com/browse/PS-7806 Analysis compress_heap memory was released inside method row_sel_store_mysql_rec , even though ,it was reused. Which caused incorrect bin logging for comressed column, leading to async replication to break. Fix Removed compress_heap free statements, hence memory is reused.
https://jira.percona.com/browse/PS-7806 Analysis compress_heap memory was not emptied inside method row_sel_store_mysql_rec ,instead it was getting freed. Which caused incorrect bin logging for comressed column, leading to async replication to break. Fix Removed compress_heap free statements, and added code to empty compress_heap heap.
# This is a wrapper for binlog_delete_with_compress_column.test so that the same | ||
# test case can be used for both statement and row based logging | ||
|
||
-- source include/have_binlog_format_mixed_or_statement.inc | ||
-- source extra/binlog_tests/binlog_delete_with_compress_column.test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is no any need to run the test in statement format, as corrupted data is not propagated in statement based replication. So, I suggest you to please remove this test.
# | ||
# PS-7806: Column compression breaks async replication on PS | ||
# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests pass even without fix, so please add a replication test (rpl suite) as this bug was initially found in a replication scenario.
------------------------------------------------------------------------------
[ 14%] binlog.binlog_row_delete_with_compress_column 'mix' [ skipped ] Doesn't support --binlog-format = 'mixed'
[ 28%] binlog.binlog_stm_delete_with_compress_column 'mix' [ pass ] 4712
[ 42%] binlog.binlog_row_delete_with_compress_column 'row' [ pass ] 5768
[ 57%] binlog.binlog_stm_delete_with_compress_column 'row' [ skipped ] Doesn't support --binlog-format = 'row'
[ 71%] binlog.binlog_row_delete_with_compress_column 'stmt' [ skipped ] Doesn't support --binlog-format = 'statement'
[ 85%] binlog.binlog_stm_delete_with_compress_column 'stmt' [ pass ] 4456
[100%] shutdown_report [ pass ]
if (UNIV_LIKELY_NULL(prebuilt->compress_heap)) | ||
row_mysql_prebuilt_free_compress_heap(prebuilt); | ||
if (prebuilt->compress_heap != nullptr) { | ||
mem_heap_empty(prebuilt->compress_heap); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are reusing, even emptying is a problem isn't it? I think partitions are supposed to their partition-specific blob heaps? Can you please check again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry used the wrong button.. "not approved yet" :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nitendra-bhosle can you please reply to these questions?
I don't understand the issue. Specifically why free is a problem. and empty is not?
AFAIU, empty doesn't the remove the first block. So may be the compressed heap is staying within the first block of heap.
Can you please create a blob text such that it occupies more than the first block in heap?
This is outdated changes.
https://jira.percona.com/browse/PS-7806 Analysis compress_heap memory was released inside method row_sel_store_mysql_rec , even though ,it was reused. Which caused incorrect bin logging for comressed column, leading to async replication to break. Fix Removed compress_heap free statements, hence memory is reused.
https://jira.percona.com/browse/PS-7806 Analysis compress_heap memory was released inside method row_sel_store_mysql_rec , even though ,it was reused. Which caused incorrect bin logging for comressed column, leading to async replication to break. Fix Modified test case to use repl_diff.
https://jira.percona.com/browse/PS-7806 Analysis compress_heap memory was released inside method row_sel_store_mysql_rec , even though ,it was reused. Which caused incorrect bin logging for comressed column, leading to async replication to break. Fix Modified test case as per review comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine with the test. But please wait for final approval from @satya-bodapati
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the question on thread
@satya-bodapati : #4460 (comment) |
Well, I still the empty function. If you remove this empty, Do we call empty at the end of statement execution? |
This empty statement , I have added just to be consistent with blob_heap .
We are allocating memory column_get_compress_header and row_compress_column I reffered reuse , because this is diverging from blob_heap memory management in a sense that memory is getting |
Any chance of this making into 8.0.26 release? I just got bitten by this |
I think this Issue will be part of 8.0.27 release |
https://jira.percona.com/browse/PS-7806 Analysis compress_heap memory was released inside method row_sel_store_mysql_rec , even though ,it was reused. Which caused incorrect bin logging for comressed column, leading to async replication to break. Fix Reverted Changes , so that Can implement different memory allocation scheme
https://jira.percona.com/browse/PS-7806 Analysis compress_heap memory was released inside method row_sel_store_mysql_rec , even though ,it was reused. Which caused incorrect bin logging for comressed column, leading to async replication to break. Fix Implemented individual partition compress_heap, inline with implementation of m_blob_heap_parts. Earlier we had common compress heap across partitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please test PS-7657 by reverting the fix and using this fix
https://jira.percona.com/browse/PS-7806 Analysis compress_heap memory was released inside method row_sel_store_mysql_rec , even though ,it was reused. Which caused incorrect bin logging for comressed column, leading to async replication to break. Fix Implemented individual partition compress_heap, inline with implementation of m_blob_heap_parts. Earlier we had common compress heap across partitions.
Tested , All tests are passing |
Thats great news! Please include the revert of PS-7657 code change in this PR |
Revert is conflicting with this change : Earlier we had |
You can add a new commit which will bring to original state. |
Also this change of PS-7657 is still required - Inside function row_insert_for_mysql_using_ins_graph |
Yes @nitendra-bhosle, please bring that emptying back. with the new logic, emptying or freeing compress_heap shouldn't matter. |
https://jira.percona.com/browse/PS-7806 Analysis compress_heap memory was released inside method row_sel_store_mysql_rec , even though ,it was reused. Which caused incorrect bin logging for comressed column, leading to async replication to break. Fix in this commit Reverted changes made for PS-7657
@nitendra-bhosle I still don't see the emptying code in row_insert_for_mysql_using_ins_graph()
This was present before PS-7657 and it should be brought back. It will not have any effect with your latest fix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check my latest comment
After my investigation today, I found that above change is required to fix - PS-7657.It will not have any effect on my fix for PS-7806. With current patch , which is part of these commits , all tests for PS--7657 and PS-7806 are passing |
https://jira.percona.com/browse/PS-7806 Analysis compress_heap memory was released inside method row_sel_store_mysql_rec , even though ,it was reused. Which caused incorrect bin logging for comressed column, leading to async replication to break. Fix in this commit Reverted changes made for PS-7657
if (UNIV_LIKELY_NULL(prebuilt->compress_heap)) | ||
mem_heap_empty(prebuilt->compress_heap); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previously before 7657, the empty was before trx->op_info = "inserting" line. Any reason, it is placed here? Does it have to be done after conversion of row to innodb format?
Can it be moved to before trx->op_info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that you replied below.
So this means, we cannot free a buffer from the previous handler call and it is still in use?
Ideally, we should be able to free before the mysql_to_innodb row format conversion. The buffer would have last row insertion/handler call. So technically we should be allowed to free it.
I think I asked this question (may be on slack), can you please add empty or freeing of normal heap before trx->op_info/row_mysql_convert_row_to_innobase()? Will it have the same problem if we free earlier?
We can't move it before trx->op_info , as row_mysql_convert_row_to_innobase call uses prebuilt->compress_heap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check my last comment
https://jira.percona.com/browse/PS-7806
Analysis
compress_heap memory was released inside
method row_sel_store_mysql_rec , even though ,it was reused.
Which caused incorrect bin logging for comressed column,
leading to async replication to break.
Fix
Removed compress_heap free statements,
hence memory is reused.