Compression: optimize the CompressionCodecFactory #9299
lgtm
@@ -69,12 +69,13 @@ UInt32 CompressionCodecLightweight::doCompressData(const char * source, UInt32 s
    case CompressionDataType::Float32:
    case CompressionDataType::Float64:
    case CompressionDataType::String:
    case CompressionDataType::Unknown:
Is it OK to compress data of the "unknown" type?
Yes. I added "unknown" to distinguish string types (char/varchar) from other types (binary).
    if (auto codec = create(setting); codec)
        codecs.push_back(std::move(codec));
}
return codecs;
Should we add a check that we return at least one valid codec?
tiflash/dbms/src/IO/Compression/CompressionCodecFactory.cpp
Lines 188 to 192 in 0e46c2e
    RUNTIME_CHECK(codec);
#ifndef DBMS_PUBLIC_GTEST
    RUNTIME_CHECK(codec->isCompression());
#endif
    return codec;
check here.
Theoretically, CompressionCodecFactory::createCodecs can return a vector<Codec> with no elements, and CompressionCodecMultiple accepts it.
tiflash/dbms/src/IO/Compression/CompressionCodecMultiple.cpp
Lines 32 to 34 in 80deeae
CompressionCodecMultiple::CompressionCodecMultiple(Codecs && codecs_)
    : codecs(std::move(codecs_))
{}
It would also pass the check you added here, because the result is a non-null CompressionCodecMultiple instance.
tiflash/dbms/src/IO/Compression/CompressionCodecFactory.cpp
Lines 182 to 193 in 0e46c2e
CompressionCodecPtr CompressionCodecFactory::create(const CompressionSettings & settings)
{
    RUNTIME_CHECK(!settings.settings.empty());
    CompressionCodecPtr codec = (settings.settings.size() > 1)
        ? std::make_unique<CompressionCodecMultiple>(createCodecs(settings))
        : create(settings.settings.front());
    RUNTIME_CHECK(codec);
#ifndef DBMS_PUBLIC_GTEST
    RUNTIME_CHECK(codec->isCompression());
#endif
    return codec;
}
I think we need to add an assert(!codecs.empty()) in CompressionCodecFactory::createCodecs or in the constructor of CompressionCodecMultiple.
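To illustrate the suggestion, here is a minimal sketch of a constructor that rejects an empty codec list. The names ICodec, DummyCodec, MultiCodec and rejects are hypothetical stand-ins, not the TiFlash classes, and the sketch throws std::invalid_argument only because the project's RUNTIME_CHECK macro is not reproduced here; the real change may look different.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for the codec interface (not the TiFlash type).
struct ICodec
{
    virtual ~ICodec() = default;
};
using CodecPtr = std::shared_ptr<ICodec>;
using Codecs = std::vector<CodecPtr>;

// Hypothetical concrete codec used only to build a non-empty list.
struct DummyCodec : ICodec
{
};

// Hypothetical simplified analogue of a "multiple codecs" wrapper.
class MultiCodec : public ICodec
{
public:
    explicit MultiCodec(Codecs && codecs_)
        : codecs(std::move(codecs_))
    {
        // A non-null wrapper around zero codecs would slip past a plain
        // null check on the wrapper itself, so validate the list here.
        if (codecs.empty())
            throw std::invalid_argument("MultiCodec requires at least one codec");
    }

    std::size_t size() const { return codecs.size(); }

private:
    Codecs codecs;
};

// Returns true when constructing a MultiCodec from the list is rejected.
bool rejects(Codecs codecs)
{
    try
    {
        MultiCodec multi(std::move(codecs));
    }
    catch (const std::invalid_argument &)
    {
        return true;
    }
    return false;
}
```

Placing the check in the constructor covers every caller that builds the wrapper, whereas a check inside the factory function only covers that one path.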
done
IMO, a negative if statement is not as readable as a positive one
0e46c2e to 58e4141
LGTM
    return it->second;
if (lz4_map.size() >= MAX_LZ4_MAP_SIZE)
    lz4_map.clear();
lz4_map.emplace(setting.level, std::make_shared<CompressionCodecLZ4>(setting.level));
What if lz4_map is updated concurrently?
fixed
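The thread does not show how the race was fixed. One common way to make a find/clear/emplace sequence like the one above safe is to hold a mutex for the whole lookup-or-insert, sketched below. Lz4Codec and Lz4CodecCache are hypothetical names, the size limit is passed in rather than taken from MAX_LZ4_MAP_SIZE, and the PR may have used a different approach.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for an LZ4 codec parameterised by level.
struct Lz4Codec
{
    explicit Lz4Codec(int level_)
        : level(level_)
    {}
    int level;
};

// Hypothetical bounded cache keyed by compression level.
class Lz4CodecCache
{
public:
    explicit Lz4CodecCache(std::size_t max_size_)
        : max_size(max_size_)
    {}

    // Returns the cached codec for `level`, creating it on a miss.
    std::shared_ptr<Lz4Codec> get(int level)
    {
        // One lock covers find, clear and emplace, so no other thread
        // can observe or modify the map between those steps.
        std::lock_guard<std::mutex> lock(mutex);
        if (auto it = cache.find(level); it != cache.end())
            return it->second;
        // Same eviction policy as the snippet above: drop everything when full.
        if (cache.size() >= max_size)
            cache.clear();
        return cache.emplace(level, std::make_shared<Lz4Codec>(level)).first->second;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(mutex);
        return cache.size();
    }

private:
    mutable std::mutex mutex;
    std::size_t max_size;
    std::unordered_map<int, std::shared_ptr<Lz4Codec>> cache;
};
```

Because the values are shared_ptr, a codec handed out before the map is cleared stays valid for its holder. A read-mostly workload could use std::shared_mutex instead, at the cost of a slightly more involved miss path.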
/hold
e61baa2 to 1dbcb16
Signed-off-by: Lloyd-Pottiger <[email protected]>
1dbcb16 to e61fc41
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: CalvinNeo, JaySon-Huang, JinheLin
/unhold
What problem does this PR solve?
Issue Number: close #8982
Problem Summary:
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note