*: fix data race occurred in SharedQueryBlockInputStream. #5309

Closed
wants to merge 11 commits into from

Contributor

@ywqzzy ywqzzy commented Jul 7, 2022

What problem does this PR solve?

Issue Number: ref #5302

Problem Summary:
restoreConcurrency() assigns a SharedQueryBlockInputStream to the pipeline multiple times.
SharedQueryBlockInputStream lets multiple threads read from one stream, so its BlockStreamProfileInfo (info) may be accessed by multiple threads concurrently.

What is changed and how it works?

Change the read function in dbms/src/DataStreams/IProfilingBlockInputStream.h to identify streams that need to be shared between threads.
Add fine-grained locks to protect the BlockStreamProfileInfo info.
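
As a rough illustration of the locking idea described above (not the code in this PR), the sketch below guards a profile-info struct with a mutex that is held only while the counters are updated. `ProfileInfo`, `SharedStreamSketch`, and `runSharedReaders` are hypothetical names invented for this example; the real BlockStreamProfileInfo has more fields.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for BlockStreamProfileInfo: just two counters.
struct ProfileInfo
{
    size_t rows = 0;
    size_t blocks = 0;
};

// Hypothetical stream whose read() may be called from several threads.
class SharedStreamSketch
{
public:
    // Each call simulates reading one block of `rows_in_block` rows.
    void read(size_t rows_in_block)
    {
        // Fine-grained lock: held only while the profile counters change.
        std::lock_guard<std::mutex> lock(info_mutex);
        info.rows += rows_in_block;
        ++info.blocks;
    }

    // Returns a consistent copy of the counters.
    ProfileInfo getInfo() const
    {
        std::lock_guard<std::mutex> lock(info_mutex);
        return info;
    }

private:
    mutable std::mutex info_mutex;
    ProfileInfo info;
};

// Starts `threads` readers on one stream; each reads `reads_per_thread`
// blocks of 10 rows. Returns the accumulated profile info.
ProfileInfo runSharedReaders(size_t threads, size_t reads_per_thread)
{
    assert(threads > 0);
    SharedStreamSketch stream;
    std::vector<std::thread> workers;
    for (size_t i = 0; i < threads; ++i)
        workers.emplace_back([&stream, reads_per_thread] {
            for (size_t j = 0; j < reads_per_thread; ++j)
                stream.read(10);
        });
    for (auto & worker : workers)
        worker.join();
    return stream.getInfo();
}
```

Without the lock, the concurrent `info.rows += ...` updates would be a data race of the kind TSan reports; with it, the totals are deterministic for a given number of reads.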

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot
Member

ti-chi-bot commented Jul 7, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • SeaRise

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added do-not-merge/needs-linked-issue do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed do-not-merge/needs-linked-issue labels Jul 7, 2022
@ywqzzy
Contributor Author

ywqzzy commented Jul 7, 2022

/run-all-tests

@ywqzzy
Contributor Author

ywqzzy commented Jul 7, 2022

/run-sanitizer-test tsan

@sre-bot
Collaborator

sre-bot commented Jul 7, 2022

Coverage for changed files

Filename                                           Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DataStreams/IProfilingBlockInputStream.h                19                 5    73.68%          12                 4    66.67%          30                10    66.67%           8                 1    87.50%
DataStreams/SharedQueryBlockInputStream.h               87                16    81.61%          13                 1    92.31%         156                24    84.62%          50                14    72.00%
Flash/Coprocessor/DAGQueryBlockInterpreter.cpp         239                74    69.04%          38                 4    89.47%         591               115    80.54%         150                51    66.00%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                  345                95    72.46%          63                 9    85.71%         777               149    80.82%         208                66    68.27%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18424      9625             47.76%    207371  96496        53.47%

full coverage report (for internal network access only)

@ywqzzy
Contributor Author

ywqzzy commented Jul 7, 2022

@ywqzzy ywqzzy changed the title [WIP]: fix tsan in SharedQuery. [WIP]: fix race occured in SharedQuery. Jul 7, 2022
@ywqzzy ywqzzy changed the title [WIP]: fix race occured in SharedQuery. [WIP]: fix datarace occured in SharedQuery. Jul 7, 2022
@ywqzzy ywqzzy changed the title [WIP]: fix datarace occured in SharedQuery. [WIP]: fix data race occured in SharedQuery. Jul 7, 2022
@ywqzzy ywqzzy changed the title [WIP]: fix data race occured in SharedQuery. fix data race occurred in SharedQuery. Jul 7, 2022
@ti-chi-bot ti-chi-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 7, 2022
@ywqzzy ywqzzy changed the title fix data race occurred in SharedQuery. fix data race occurred in SharedQueryBlockInputStream. Jul 7, 2022
@gengliqi gengliqi self-requested a review July 7, 2022 11:22
@ywqzzy
Contributor Author

ywqzzy commented Jul 8, 2022

The current fix is not elegant; I will find a way to make the code clearer.

@ywqzzy ywqzzy changed the title fix data race occurred in SharedQueryBlockInputStream. [WIP]: fix data race occurred in SharedQueryBlockInputStream. Jul 8, 2022
@ti-chi-bot ti-chi-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 8, 2022
@ywqzzy
Contributor Author

ywqzzy commented Jul 8, 2022

/rebuild

@ywqzzy
Contributor Author

ywqzzy commented Jul 8, 2022

/run-all-tests

@ti-chi-bot ti-chi-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 11, 2022
@ywqzzy ywqzzy changed the title *: fix data race occurred in SharedQueryBlockInputStream. [WIP]*: fix data race occurred in SharedQueryBlockInputStream. Jul 11, 2022
@ti-chi-bot ti-chi-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 11, 2022
@ywqzzy
Contributor Author

ywqzzy commented Jul 12, 2022

/run-sanitizer-test tsan

3 similar comments
@ywqzzy
Contributor Author

ywqzzy commented Jul 12, 2022

/run-sanitizer-test tsan

@ywqzzy
Contributor Author

ywqzzy commented Jul 12, 2022

/run-sanitizer-test tsan

@ywqzzy
Contributor Author

ywqzzy commented Jul 12, 2022

/run-sanitizer-test tsan

@ywqzzy
Contributor Author

ywqzzy commented Jul 13, 2022

/run-all-tests

@ywqzzy ywqzzy changed the title [WIP]*: fix data race occurred in SharedQueryBlockInputStream. *: fix data race occurred in SharedQueryBlockInputStream. Jul 13, 2022
@ti-chi-bot ti-chi-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 13, 2022
@sre-bot
Collaborator

sre-bot commented Jul 13, 2022

Coverage for changed files

Filename                                           Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Common/UniqueLockGuard.h                                 2                 0   100.00%           2                 0   100.00%           4                 0   100.00%           0                 0         -
DataStreams/IProfilingBlockInputStream.cpp             158                84    46.84%          24                 4    83.33%         345               149    56.81%         124                82    33.87%
DataStreams/IProfilingBlockInputStream.h                20                 5    75.00%          13                 4    69.23%          33                10    69.70%           8                 1    87.50%
DataStreams/SharedQueryBlockInputStream.h               63                11    82.54%          13                 1    92.31%         109                16    85.32%          28                 7    75.00%
Flash/Coprocessor/DAGQueryBlockInterpreter.cpp         240                70    70.83%          38                 4    89.47%         591                99    83.25%         152                50    67.11%
Flash/Coprocessor/InterpreterUtils.cpp                  19                 3    84.21%           3                 0   100.00%          45                 6    86.67%          18                 4    77.78%
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                  502               173    65.54%          93                13    86.02%        1127               280    75.16%         330               144    56.36%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18468      9605             47.99%    208043  96460        53.63%

full coverage report (for internal network access only)

@ywqzzy
Contributor Author

ywqzzy commented Jul 14, 2022

/run-integration-test

@SeaRise
Contributor

SeaRise commented Jul 14, 2022

/rebuild

@@ -400,8 +400,6 @@ void DAGQueryBlockInterpreter::executeAggregation(
pipeline.streams_with_non_joined_data.clear();
pipeline.firstStream() = std::move(stream);

// should record for agg before restore concurrency. See #3804.
recordProfileStreams(pipeline, query_block.aggregation_name);
Contributor

I think it should be kept here, because even with this PR, ProfileInfo.ExecuteTime of SharedQuery is still wrong.

Contributor

Maybe we can modify the logic of recordProfileStreams to fix #5314,
for example recordProfileStream(stream, executor_id, concurrency);.

Contributor

We can fix it in #5367.

Contributor

@gengliqi gengliqi left a comment

Although the cost of acquiring an uncontended lock in a single thread is very small thanks to futex, no overhead is always better than some. Most IProfilingBlockInputStream instances don't need this lock, so I think it's better to add a bool template parameter to IProfilingBlockInputStream to avoid the needless overhead.
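
One possible shape of the bool-template-parameter suggestion is sketched below. It is only an illustration under assumptions: `NullMutex` and `ProfilingStreamSketch` are hypothetical names, and the real IProfilingBlockInputStream is a virtual base class, so templating it would have wider consequences than shown here. The idea is that the parameter selects either `std::mutex` or a no-op mutex at compile time, so the non-shared variant takes no lock at all.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <type_traits>

// No-op mutex: satisfies BasicLockable, so std::lock_guard accepts it,
// but lock() and unlock() do nothing.
struct NullMutex
{
    void lock() noexcept {}
    void unlock() noexcept {}
};

// Hypothetical simplified stream. `thread_safe` picks the mutex type
// at compile time.
template <bool thread_safe>
class ProfilingStreamSketch
{
public:
    using Mutex = typename std::conditional<thread_safe, std::mutex, NullMutex>::type;

    // Adds `n` rows to the profile counter under the selected mutex.
    void addRows(size_t n)
    {
        std::lock_guard<Mutex> lock(mutex);
        rows += n;
    }

    size_t getRows() const
    {
        std::lock_guard<Mutex> lock(mutex);
        return rows;
    }

private:
    mutable Mutex mutex;
    size_t rows = 0;
};
```

With this layout, `ProfilingStreamSketch<false>` would be used for ordinary streams and `ProfilingStreamSketch<true>` only for streams that are actually read by several threads.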

dbms/src/Common/UniqueLockGuard.h (outdated, resolved)
@ti-chi-bot ti-chi-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 14, 2022
@ywqzzy
Copy link
Contributor Author

ywqzzy commented Jul 14, 2022

/run-sanitizer-test tsan

SeaRise
SeaRise previously approved these changes Jul 14, 2022
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 14, 2022
Contributor

@gengliqi gengliqi left a comment

It seems difficult to make IProfilingBlockInputStream::read thread-safe...

limit_exceeded_need_break = true;
if (!checkTimeLimit())
limit_exceeded_need_break = true;
}

if (!limit_exceeded_need_break)
Contributor

limit_exceeded_need_break is still not protected by the lock.

Contributor Author

I have protected it, but IMO the code is hard to read.

Contributor

@gengliqi gengliqi Jul 14, 2022

Hmmm...I see that limit_exceeded_need_break is not an atomic variable and is read without a lock. You can see https://stackoverflow.com/questions/14624776/can-a-bool-read-write-operation-be-not-atomic-on-x86 for more information.
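
To make the concern concrete: in C++ an unsynchronized read of a plain `bool` that another thread may write is a data race, regardless of what the hardware does. One common alternative to taking a lock for such a flag is `std::atomic<bool>`. The sketch below shows that alternative only; it is not what this PR does, and `LimitFlagSketch` and `runUntilLimit` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a stream with a limit_exceeded_need_break flag.
struct LimitFlagSketch
{
    std::atomic<bool> limit_exceeded_need_break{false};
    std::atomic<int> blocks_read{0};

    // Simulates reading one block. Returns false once some thread has
    // signalled that the limit was reached.
    bool readOne(int limit)
    {
        // Atomic load: a well-defined read even while another thread stores.
        if (limit_exceeded_need_break.load(std::memory_order_acquire))
            return false;
        if (blocks_read.fetch_add(1, std::memory_order_relaxed) + 1 >= limit)
            limit_exceeded_need_break.store(true, std::memory_order_release);
        return true;
    }
};

// Runs `threads` readers until the flag is set; returns the final flag value.
bool runUntilLimit(int threads, int limit)
{
    LimitFlagSketch state;
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i)
        workers.emplace_back([&state, limit] {
            while (state.readOne(limit))
            {
            }
        });
    for (auto & worker : workers)
        worker.join();
    return state.limit_exceeded_need_break.load();
}
```

Note that a few more than `limit` blocks may be read in total, because threads that passed the flag check just before it was set still complete their read; whether that is acceptable depends on the semantics the real limit check needs.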

{
auto lock = get_lock();
info.updateExecutionTime(info.total_stopwatch.elapsed() - start_time);
}
Contributor

In the progressImpl function, info may still be used without lock protection.

Contributor Author

ditto

@gengliqi
Contributor

gengliqi commented Jul 14, 2022

Can we change the code here

pipeline.streams.assign(concurrency, shared_query_block_input_stream);
with different SharedQueryBlockInputStreams?
These SharedQueryBlockInputStreams would share only the data that belongs to SharedQueryBlockInputStream, such as queue, read_prefixed, read_suffixed, and thread_manager. The data from IProfilingBlockInputStream doesn't need to be shared.
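
A minimal sketch of this alternative, under assumptions: each reader thread gets its own stream object, all of them hold a `shared_ptr` to one shared state (standing in for queue, read_prefixed, and so on), and the profile counter lives in the per-thread object so it needs no lock. `SharedStateSketch`, `PerThreadStreamSketch`, and `makeStreams` are hypothetical names, and the queue here holds plain row counts rather than real blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical state shared by all reader-facing streams.
struct SharedStateSketch
{
    std::mutex mutex;
    std::deque<size_t> queue; // each entry is a "block" with that many rows
};

// One instance per reader thread; only `state` is shared.
class PerThreadStreamSketch
{
public:
    explicit PerThreadStreamSketch(std::shared_ptr<SharedStateSketch> state_)
        : state(std::move(state_))
    {}

    // Pops one block from the shared queue; returns false when it is empty.
    bool read()
    {
        size_t rows_in_block = 0;
        {
            std::lock_guard<std::mutex> lock(state->mutex);
            if (state->queue.empty())
                return false;
            rows_in_block = state->queue.front();
            state->queue.pop_front();
        }
        // Per-instance profile counter: touched by one thread only.
        rows += rows_in_block;
        return true;
    }

    size_t getRows() const { return rows; }

private:
    std::shared_ptr<SharedStateSketch> state;
    size_t rows = 0;
};

// Builds `concurrency` distinct streams over one shared state.
std::vector<PerThreadStreamSketch> makeStreams(size_t concurrency, std::deque<size_t> blocks)
{
    auto state = std::make_shared<SharedStateSketch>();
    state->queue = std::move(blocks);
    std::vector<PerThreadStreamSketch> streams;
    for (size_t i = 0; i < concurrency; ++i)
        streams.emplace_back(state);
    return streams;
}
```

The trade-off is that profile data is now split across the per-thread objects and would have to be summed when reporting, which is part of why this approach touches more code than adding locks.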

@ywqzzy
Contributor Author

ywqzzy commented Jul 14, 2022

Can we change the code here

pipeline.streams.assign(concurrency, shared_query_block_input_stream);

with different SharedQueryBlockInputStreams?
These SharedQueryBlockInputStreams would share only the data that belongs to SharedQueryBlockInputStream, such as queue, read_prefixed, read_suffixed, and thread_manager. The data from IProfilingBlockInputStream doesn't need to be shared.

Yes, I discussed this with SeaRise before, but I think it would be too much code for this bug fix. Maybe it is a better way to solve it, though.

@SeaRise SeaRise dismissed their stale review July 14, 2022 15:52

Now I think not fixing this bug is a better choice

@ywqzzy
Contributor Author

ywqzzy commented Jul 15, 2022

Can we change the code here

pipeline.streams.assign(concurrency, shared_query_block_input_stream);

with different SharedQueryBlockInputStreams?
These SharedQueryBlockInputStreams would share only the data that belongs to SharedQueryBlockInputStream, such as queue, read_prefixed, read_suffixed, and thread_manager. The data from IProfilingBlockInputStream doesn't need to be shared.

Yes, I discussed this with SeaRise before, but I think it would be too much code for this bug fix. Maybe it is a better way to solve it, though.

I will close this PR.

@ywqzzy ywqzzy closed this Jul 15, 2022
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/LGT1 Indicates that a PR has LGTM 1.
5 participants