Code object compression via bundling #1374
Basics look good; I assume all rocBLAS tests passed.
Might be worth zstd compressing the msgpack files too, they're pretty compressible. Here's an untested attempt at decompress support in case it's helpful:
diff --git a/Tensile/Source/lib/source/msgpack/MessagePack.cpp b/Tensile/Source/lib/source/msgpack/MessagePack.cpp
index de97929c..dbc397e0 100644
--- a/Tensile/Source/lib/source/msgpack/MessagePack.cpp
+++ b/Tensile/Source/lib/source/msgpack/MessagePack.cpp
@@ -28,6 +28,8 @@
#include <Tensile/msgpack/Loading.hpp>
+#include <zstd.h>
+
#include <fstream>
namespace Tensile
@@ -86,6 +88,34 @@ namespace Tensile
return nullptr;
}
+ // Check if the file is zstd compressed
+ char magic[4];
+ in.read(magic, 4);
+ bool isCompressed = (in.gcount() == 4 && magic[0] == '\x28' && magic[1] == '\xB5' && magic[2] == '\x2F' && magic[3] == '\xFD');
+ // Reset file pointer to the beginning
+ in.seekg(0, std::ios::beg);
+
+ if (isCompressed) {
+ // Decompress zstd file
+ std::vector<char> compressedData((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
+
+ size_t decompressedSize = ZSTD_getFrameContentSize(compressedData.data(), compressedData.size());
+ if (decompressedSize == ZSTD_CONTENTSIZE_ERROR || decompressedSize == ZSTD_CONTENTSIZE_UNKNOWN) {
+ if(Debug::Instance().printDataInit())
+ std::cout << "Error: Unable to determine decompressed size for " << filename << std::endl;
+ return nullptr;
+ }
+
+ std::vector<char> decompressedData(decompressedSize);
+ size_t dSize = ZSTD_decompress(decompressedData.data(), decompressedSize, compressedData.data(), compressedData.size());
+ if (ZSTD_isError(dSize)) {
+ if(Debug::Instance().printDataInit())
+ std::cout << "Error: ZSTD decompression failed for " << filename << std::endl;
+ return nullptr;
+ }
+
+ msgpack::unpack(result, decompressedData.data(), dSize);
+ } else {
msgpack::unpacker unp;
bool finished_parsing;
constexpr size_t buffer_size = 1 << 19;
@@ -109,6 +139,7 @@ namespace Tensile
return nullptr;
}
+ }
}
catch(std::runtime_error const& exc)
{
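The detection step in the patch above can be sketched on its own. This is a minimal Python illustration, not part of Tensile: the helper names `is_zstd_frame` and `stream_is_zstd` are hypothetical, and the only fact taken from the patch is the four-byte zstd frame magic `28 B5 2F FD`.

```python
import io
from typing import BinaryIO

# Magic number that starts every standard zstd frame (same bytes as in the patch).
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def is_zstd_frame(header: bytes) -> bool:
    """Return True if the given bytes begin with the zstd frame magic."""
    return header[:4] == ZSTD_MAGIC


def stream_is_zstd(stream: BinaryIO) -> bool:
    """Peek at the first four bytes of a seekable binary stream, then rewind."""
    start = stream.tell()
    header = stream.read(4)  # may return fewer than 4 bytes for short inputs
    stream.seek(start)       # leave the stream where the caller had it
    return is_zstd_frame(header)


# Hypothetical inputs: a buffer that starts with the magic, and a short plain one.
compressed_like = io.BytesIO(ZSTD_MAGIC + b"payload")
plain = io.BytesIO(b"\x82\xa3")
print(stream_is_zstd(compressed_like), stream_is_zstd(plain))
```

One thing the sketch makes visible: a short read is harmless in Python, but in the C++ version a file shorter than four bytes would likely leave the stream in a failed state after `read`, so a call to `in.clear()` before `seekg` may be needed.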
@LunNova Thanks for the code snippet. I've had this idea as well and have plans to implement it. However, for the scope of this PR we'll keep it to code object files and add the .dat file compression in another PR.
@KKyang This change will increase build times by default in hipBLASLt. One option is to add a --release flag to the install script and TensileCreateLibrary to trigger compressed builds, which take longer. What is your opinion on the default behaviour: debug (fast, large binaries) or release (slow, small binaries)?
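To make the two candidate defaults concrete, here is a rough sketch using Python's standard `argparse`. It is an assumption-laden illustration only: the program name `example-build` and the function `parse_compress` are hypothetical, and this is not how the install script or TensileCreateLibrary actually parse their arguments.

```python
import argparse
from typing import List


def parse_compress(argv: List[str], default_release: bool = False) -> bool:
    """Return whether code objects should be compressed for this invocation.

    default_release=False models "debug by default" (fast, large binaries);
    default_release=True models "release by default" (slow, small binaries).
    """
    parser = argparse.ArgumentParser(prog="example-build")  # hypothetical tool name
    group = parser.add_mutually_exclusive_group()
    # Opt in to compression when the default is an uncompressed build.
    group.add_argument("--release", dest="compress", action="store_true")
    # Opt out of compression when the default is a compressed build.
    group.add_argument("--no-compress", dest="compress", action="store_false")
    parser.set_defaults(compress=default_release)
    return parser.parse_args(argv).compress


print(parse_compress([]))                                        # debug default
print(parse_compress(["--release"]))                             # explicit opt in
print(parse_compress(["--no-compress"], default_release=True))   # explicit opt out
```

Either policy exposes the same switch; the question in the comment is only which value applies when no flag is given.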
@@ -644,8 +373,8 @@ def success(kernel):
kernelHeaderFile.close()

if not globalParameters["GenerateSourcesAndExit"]:
  codeObjectFiles += buildSourceCodeObjectFiles(CxxCompiler, kernelFiles, outputPath)
  codeObjectFiles += getAssemblyCodeObjectFiles(kernelsToBuild, kernelWriterAssembly, outputPath)
  codeObjectFiles += SourceCommands.buildSourceCodeObjectFiles(CxxCompiler, kernelFiles, outputPath)
This is a really good start to cleaning up the code used to generate binaries. In the future, I would like to consider something like:
- codeObjectFiles += SourceCommands.buildSourceCodeObjectFiles(CxxCompiler, kernelFiles, outputPath)
+ objectFiles += toolchain.compile(kernelFiles, outputPath)
+ codeObjectFiles += toolchain.link(objectFiles, outputPath)
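One possible shape for such an interface, as a sketch only: the `Toolchain` class, its fields, the `kernels.co` output name, and the command lines below are all hypothetical and much simpler than the real HIP compile and link steps. The command runner is injected so the example can be exercised without a compiler installed.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Sequence

# A runner receives one command line; a real one might wrap subprocess.run.
Runner = Callable[[Sequence[str]], None]


@dataclass
class Toolchain:
    """Hypothetical wrapper that separates the compile and link steps."""

    compiler: str
    run: Runner

    def compile(self, kernel_files: Sequence[str], output_path: str) -> List[Path]:
        """Compile each source file to an object file and return the object paths."""
        objects = []
        for src in kernel_files:
            obj = Path(output_path) / (Path(src).stem + ".o")
            self.run([self.compiler, "-c", str(src), "-o", str(obj)])
            objects.append(obj)
        return objects

    def link(self, object_files: Sequence[Path], output_path: str,
             name: str = "kernels.co") -> List[Path]:
        """Link the object files into one code object and return its path."""
        target = Path(output_path) / name
        self.run([self.compiler, *[str(o) for o in object_files], "-o", str(target)])
        return [target]


# Record commands instead of executing them, purely for illustration.
commands: List[List[str]] = []
toolchain = Toolchain(compiler="clang++", run=lambda cmd: commands.append(list(cmd)))
objectFiles = toolchain.compile(["Kernels.cpp"], "build")
codeObjectFiles = toolchain.link(objectFiles, "build")
print(len(commands), codeObjectFiles[0].name)
```

Keeping the runner separate from the command construction would also make it straightforward to add a later compression step as another method on the same object.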
@@ -1621,6 +1622,40 @@ def which(p):
return candidate
return None

def splitArchs():
We should consider a new module for dealing with these types of operations.
Please get CQE's approval before merging. Thanks.
Summary:
This PR adds a compression layer to all final code objects, thereby generating smaller libraries at the expense of build time. Includes minor refactoring.
Outcomes:
getAssemblyCodeObjectFiles has been renamed to buildAssemblyCodeObjectFiles to match the name of source kernel functions.
--no-compress can be used on the install script to disable compression.
Build time increase, gfx90a: 7.0%
Build time increase for hipBLASLt, gfx942: 4.0%
Compression ratio gfx90a: 3.02
Compression ratio for hipBLASLt gfx942 libraries: 8.99
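For a sense of scale, the reported ratios can be converted into size reductions. The short calculation below assumes the ratio is defined as uncompressed size divided by compressed size, which the summary does not state explicitly; the function name `space_saving` is illustrative.

```python
def space_saving(compression_ratio: float) -> float:
    """Fraction of bytes removed, assuming ratio = uncompressed / compressed."""
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / compression_ratio


# Ratios taken from the summary above.
for label, ratio in (("gfx90a", 3.02), ("hipBLASLt gfx942", 8.99)):
    print(f"{label}: ratio {ratio} -> {space_saving(ratio):.1%} smaller")
```

Under that assumption, the gfx90a libraries shrink by roughly 66.9% and the hipBLASLt gfx942 libraries by roughly 88.9%, against build time increases of 7.0% and 4.0% respectively.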
Testing and Environment:
Docker: Ubuntu 24.04, ROCm 6.4 RC stack, AMD clang version 18.0.0, AMD clang-offload-bundler version 18.0.0
Tested with hipBLASLt test client