Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PS-7806: Column compression breaks async replication on PS #4460

Closed
wants to merge 10 commits into from

Conversation

nitendra-bhosle
Copy link
Contributor

https://jira.percona.com/browse/PS-7806

Analysis
compress_heap memory was released inside
method row_sel_store_mysql_rec , even though ,it was reused.
Which caused incorrect bin logging for comressed column,
leading to async replication to break.

Fix

Removed compress_heap free statements,
hence memory is reused.

https://jira.percona.com/browse/PS-7806

Analysis
compress_heap memory was released inside
method row_sel_store_mysql_rec , even though ,it was reused.
Which caused incorrect bin logging for comressed column,
leading to async replication to break.

Fix

Removed compress_heap free statements,
hence memory is reused.
@nitendra-bhosle
Copy link
Contributor Author

nitendra-bhosle commented Sep 10, 2021

https://jira.percona.com/browse/PS-7806

Analysis
compress_heap memory was not emptied inside
method row_sel_store_mysql_rec ,instead it was getting freed.
Which caused incorrect bin logging for comressed column,
leading to async replication to break.

Fix

Removed compress_heap free statements,
and added code to empty compress_heap heap.
Comment on lines 1 to 5
# This is a wrapper for binlog_delete_with_compress_column.test so that the same
# test case can be used for both statement and row based logging

-- source include/have_binlog_format_mixed_or_statement.inc
-- source extra/binlog_tests/binlog_delete_with_compress_column.test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is no any need to run the test in statement format, as corrupted data is not propagated in statement based replication. So, I suggest you to please remove this test.

Comment on lines 1 to 3
#
# PS-7806: Column compression breaks async replication on PS
#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests pass even without fix, so please add a replication test (rpl suite) as this bug was initially found in a replication scenario.

------------------------------------------------------------------------------
[ 14%] binlog.binlog_row_delete_with_compress_column 'mix'  [ skipped ]  Doesn't support --binlog-format = 'mixed'
[ 28%] binlog.binlog_stm_delete_with_compress_column 'mix'  [ pass ]   4712
[ 42%] binlog.binlog_row_delete_with_compress_column 'row'  [ pass ]   5768
[ 57%] binlog.binlog_stm_delete_with_compress_column 'row'  [ skipped ]  Doesn't support --binlog-format = 'row'
[ 71%] binlog.binlog_row_delete_with_compress_column 'stmt'  [ skipped ]  Doesn't support --binlog-format = 'statement'
[ 85%] binlog.binlog_stm_delete_with_compress_column 'stmt'  [ pass ]   4456
[100%] shutdown_report                           [ pass ]

if (UNIV_LIKELY_NULL(prebuilt->compress_heap))
row_mysql_prebuilt_free_compress_heap(prebuilt);
if (prebuilt->compress_heap != nullptr) {
mem_heap_empty(prebuilt->compress_heap);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are reusing, even emptying is a problem isn't it? I think partitions are supposed to their partition-specific blob heaps? Can you please check again?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry used the wrong button.. "not approved yet" :)

Copy link
Contributor

@satya-bodapati satya-bodapati Sep 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nitendra-bhosle can you please reply to these questions?

I don't understand the issue. Specifically why free is a problem. and empty is not?
AFAIU, empty doesn't the remove the first block. So may be the compressed heap is staying within the first block of heap.

Can you please create a blob text such that it occupies more than the first block in heap?

This is outdated changes.

https://jira.percona.com/browse/PS-7806

Analysis
compress_heap memory was released inside
method row_sel_store_mysql_rec , even though ,it was reused.
Which caused incorrect bin logging for comressed column,
leading to async replication to break.

Fix

Removed compress_heap free statements,
hence memory is reused.
https://jira.percona.com/browse/PS-7806

Analysis
compress_heap memory was released inside
method row_sel_store_mysql_rec , even though ,it was reused.
Which caused incorrect bin logging for comressed column,
leading to async replication to break.

Fix

Modified test case to use repl_diff.
https://jira.percona.com/browse/PS-7806

Analysis
compress_heap memory was released inside
method row_sel_store_mysql_rec , even though ,it was reused.
Which caused incorrect bin logging for comressed column,
leading to async replication to break.

Fix

Modified test case as per review comments.
Copy link
Contributor

@venkatesh-prasad-v venkatesh-prasad-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine with the test. But please wait for final approval from @satya-bodapati

Copy link
Contributor

@satya-bodapati satya-bodapati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the question on thread

@nitendra-bhosle
Copy link
Contributor Author

@satya-bodapati : #4460 (comment)
This is outdated. I have removed both empty and free heap statements.

@satya-bodapati
Copy link
Contributor

@satya-bodapati : #4460 (comment)
This is outdated. I have removed both empty and free heap statements.

Well, I still the empty function.

If you remove this empty, Do we call empty at the end of statement execution?
Can you explain how the 'reuse' is happening?

@nitendra-bhosle
Copy link
Contributor Author

@satya-bodapati : #4460 (comment)
This is outdated. I have removed both empty and free heap statements.

Well, I still the empty function.

If you remove this empty, Do we call empty at the end of statement execution?

This empty statement , I have added just to be consistent with blob_heap .
If we remove this, we will not call empty at the end of statement execution, as it's already being
getting free's inside : row_prebuilt_free

Can you explain how the 'reuse' is happening?

We are allocating memory column_get_compress_header and row_compress_column
using statements like :
if (!prebuilt->compress_heap)
prebuilt->compress_heap = mem_heap_create(ut_max(UNIV_PAGE_SIZE, buf_len));

I reffered reuse , because this is diverging from blob_heap memory management in a sense that memory is getting
allowcated inside these two methods , and reuse might be happening , otherwise , no need to allocated and check whether it's already allocated inside other method.

@bratao
Copy link

bratao commented Sep 24, 2021

Any chance of this making into 8.0.26 release? I just got bitten by this

@nitendra-bhosle
Copy link
Contributor Author

Any chance of this making into 8.0.26 release? I just got bitten by this

I think this Issue will be part of 8.0.27 release

Nitendra Bhosle added 2 commits September 30, 2021 19:01
https://jira.percona.com/browse/PS-7806

Analysis
compress_heap memory was released inside
method row_sel_store_mysql_rec , even though ,it was reused.
Which caused incorrect bin logging for comressed column,
leading to async replication to break.

Fix

Reverted Changes , so that Can implement
different memory allocation scheme
https://jira.percona.com/browse/PS-7806

Analysis
compress_heap memory was released inside
method row_sel_store_mysql_rec , even though ,it was reused.
Which caused incorrect bin logging for comressed column,
leading to async replication to break.

Fix

Implemented individual partition compress_heap,
inline with implementation of m_blob_heap_parts.
Earlier we had common compress heap across partitions.
Copy link
Contributor

@satya-bodapati satya-bodapati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please test PS-7657 by reverting the fix and using this fix

https://jira.percona.com/browse/PS-7806

Analysis
compress_heap memory was released inside
method row_sel_store_mysql_rec , even though ,it was reused.
Which caused incorrect bin logging for comressed column,
leading to async replication to break.

Fix

Implemented individual partition compress_heap,
inline with implementation of m_blob_heap_parts.
Earlier we had common compress heap across partitions.
@nitendra-bhosle
Copy link
Contributor Author

Also please test PS-7657 by reverting the fix and using this fix

Tested , All tests are passing

@satya-bodapati
Copy link
Contributor

Also please test PS-7657 by reverting the fix and using this fix

Tested , All tests are passing

Thats great news! Please include the revert of PS-7657 code change in this PR

@nitendra-bhosle
Copy link
Contributor Author

Also please test PS-7657 by reverting the fix and using this fix

Tested , All tests are passing

Thats great news! Please include the revert of PS-7657 code change in this PR

Revert is conflicting with this change :
if (UNIV_LIKELY_NULL(prebuilt->compress_heap))
row_mysql_prebuilt_free_compress_heap(prebuilt);

Earlier we had
if (UNIV_LIKELY_NULL(prebuilt->compress_heap))
mem_heap_empty(prebuilt->compress_heap);

@satya-bodapati
Copy link
Contributor

Also please test PS-7657 by reverting the fix and using this fix

Tested , All tests are passing

Thats great news! Please include the revert of PS-7657 code change in this PR

Revert is conflicting with this change : if (UNIV_LIKELY_NULL(prebuilt->compress_heap)) row_mysql_prebuilt_free_compress_heap(prebuilt);

Earlier we had if (UNIV_LIKELY_NULL(prebuilt->compress_heap)) mem_heap_empty(prebuilt->compress_heap);

You can add a new commit which will bring to original state.

@nitendra-bhosle
Copy link
Contributor Author

Also this change of PS-7657 is still required -
if (UNIV_LIKELY_NULL(prebuilt->compress_heap))
mem_heap_empty(prebuilt->compress_heap);

Inside function row_insert_for_mysql_using_ins_graph

@satya-bodapati
Copy link
Contributor

satya-bodapati commented Oct 8, 2021

Also this change of PS-7657 is still required - if (UNIV_LIKELY_NULL(prebuilt->compress_heap)) mem_heap_empty(prebuilt->compress_heap);

Inside function row_insert_for_mysql_using_ins_graph

Yes @nitendra-bhosle, please bring that emptying back. with the new logic, emptying or freeing compress_heap shouldn't matter.

https://jira.percona.com/browse/PS-7806

Analysis
compress_heap memory was released inside
method row_sel_store_mysql_rec , even though ,it was reused.
Which caused incorrect bin logging for comressed column,
leading to async replication to break.

Fix in this commit

Reverted changes made for PS-7657
@satya-bodapati
Copy link
Contributor

Also this change of PS-7657 is still required - if (UNIV_LIKELY_NULL(prebuilt->compress_heap)) mem_heap_empty(prebuilt->compress_heap);
Inside function row_insert_for_mysql_using_ins_graph

Yes @nitendra-bhosle, please bring that emptying back. with the new logic, emptying or freeing compress_heap shouldn't matter.

@nitendra-bhosle I still don't see the emptying code in row_insert_for_mysql_using_ins_graph()

  if (UNIV_LIKELY_NULL(prebuilt->compress_heap))
    mem_heap_empty(prebuilt->compress_heap);

This was present before PS-7657 and it should be brought back. It will not have any effect with your latest fix.

Copy link
Contributor

@satya-bodapati satya-bodapati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check my latest comment

@nitendra-bhosle
Copy link
Contributor Author

Also this change of PS-7657 is still required - if (UNIV_LIKELY_NULL(prebuilt->compress_heap)) mem_heap_empty(prebuilt->compress_heap);
Inside function row_insert_for_mysql_using_ins_graph

Yes @nitendra-bhosle, please bring that emptying back. with the new logic, emptying or freeing compress_heap shouldn't matter.

Also this change of PS-7657 is still required - if (UNIV_LIKELY_NULL(prebuilt->compress_heap)) mem_heap_empty(prebuilt->compress_heap);
Inside function row_insert_for_mysql_using_ins_graph

Yes @nitendra-bhosle, please bring that emptying back. with the new logic, emptying or freeing compress_heap shouldn't matter.

@nitendra-bhosle I still don't see the emptying code in row_insert_for_mysql_using_ins_graph()

  if (UNIV_LIKELY_NULL(prebuilt->compress_heap))
    mem_heap_empty(prebuilt->compress_heap);

This was present before PS-7657 and it should be brought back. It will not have any effect with your latest fix.

After my investigation today, I found that above change is required to fix - PS-7657.It will not have any effect on my fix for PS-7806.
But this will break for PS-7657 , hence I kept it as is.

With current patch , which is part of these commits , all tests for PS--7657 and PS-7806 are passing

https://jira.percona.com/browse/PS-7806

Analysis
compress_heap memory was released inside
method row_sel_store_mysql_rec , even though ,it was reused.
Which caused incorrect bin logging for comressed column,
leading to async replication to break.

Fix in this commit

Reverted changes made for PS-7657
Comment on lines +2033 to +2035
if (UNIV_LIKELY_NULL(prebuilt->compress_heap))
mem_heap_empty(prebuilt->compress_heap);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously before 7657, the empty was before trx->op_info = "inserting" line. Any reason, it is placed here? Does it have to be done after conversion of row to innodb format?

Can it be moved to before trx->op_info?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that you replied below.
So this means, we cannot free a buffer from the previous handler call and it is still in use?

Ideally, we should be able to free before the mysql_to_innodb row format conversion. The buffer would have last row insertion/handler call. So technically we should be allowed to free it.

I think I asked this question (may be on slack), can you please add empty or freeing of normal heap before trx->op_info/row_mysql_convert_row_to_innobase()? Will it have the same problem if we free earlier?

@nitendra-bhosle
Copy link
Contributor Author

previously before 7657, the empty was before trx->op_info = "inserting" line. Any reason, it is placed here? Does it have to be done after conversion of row to innodb format?

Can it be moved to before trx->op_info?

We can't move it before trx->op_info , as row_mysql_convert_row_to_innobase call uses prebuilt->compress_heap

Copy link
Contributor

@satya-bodapati satya-bodapati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check my last comment

@kamil-holubicki
Copy link
Contributor

kamil-holubicki commented Nov 17, 2023

Will be handled by
#5161
#5162

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants