
Code object compression via bundling #1374

Merged


@bstefanuk (Contributor) commented Nov 22, 2024

Summary:

This PR compresses all final code objects, producing smaller libraries at the cost of longer build times. It also includes minor refactoring.

Outcomes:

  • A new clang-offload-bundler invocation is added after assembly object linking.
  • getAssemblyCodeObjectFiles has been renamed to buildAssemblyCodeObjectFiles to match the naming of the source-kernel function buildSourceCodeObjectFiles.
  • Compression is enabled by default; pass --no-compress to the install script to disable it.
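
A minimal sketch of the kind of clang-offload-bundler call described in the first bullet. It assumes an empty host entry fed from /dev/null plus one hipv4-amdgcn-amd-amdhsa--<gfx> device entry, and the --compress, --type, --targets, --input and --output options; the function names, file names, and exact flag set are assumptions, not the command line used in this PR. By default it only prints the command.

```python
import shlex
import subprocess
from pathlib import Path


def bundle_compress_command(bundler: str, code_object: Path, output: Path, gfx: str) -> list:
    # Assumed target IDs: an empty host entry plus one amdgcn device entry.
    # This flag set is a sketch, not the PR's actual invocation.
    targets = f"host-x86_64-unknown-linux,hipv4-amdgcn-amd-amdhsa--{gfx}"
    return [
        bundler,
        "--compress",               # compress the bundled output
        "--type=o",                 # inputs and output are object files
        f"--targets={targets}",
        "--input=/dev/null",        # host slot: no host code, so an empty input
        f"--input={code_object}",   # device slot: the linked code object
        f"--output={output}",
    ]


def bundle_compress(bundler: str, code_object: str, output: str, gfx: str, dry_run: bool = True) -> list:
    cmd = bundle_compress_command(bundler, Path(code_object), Path(output), gfx)
    if dry_run:
        print(shlex.join(cmd))           # show the command without running it
    else:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


# Hypothetical file names for illustration only.
cmd = bundle_compress("clang-offload-bundler", "kernels.co", "kernels.compressed.co", "gfx90a")
```

The --input arguments are listed in the same order as the entries in --targets, so the empty host placeholder comes before the device code object.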

| Build | Branch | Architecture | Build time | Build size (build/library/ directory) |
|---|---|---|---|---|
| TCL | Feature | gfx90a | 8m38.902s (8.65 min) | 269M |
| TCL | Develop | gfx90a | 8m3.921s (8.07 min) | 812M |
| hipBLASLt | Feature | gfx942 | 59m9.664s (59.16 min) | 1.2G |
| hipBLASLt | Develop | gfx942 | 56m52.821s (56.88 min) | 11G |

Build time increase, TCL on gfx90a: 7.2%
Build time increase, hipBLASLt on gfx942: 4.0%

Compression ratio (Develop size / Feature size), TCL on gfx90a: 3.02
Compression ratio (Develop size / Feature size), hipBLASLt gfx942 libraries: 8.99
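
A short sketch that recomputes the build-time increases and the gfx90a compression ratio directly from the timings and sizes in the table. Percent increase is taken relative to the Develop build, and the ratio is Develop size divided by Feature size; the helper names are hypothetical.

```python
import re


def to_seconds(duration: str) -> float:
    """Convert a `time`-style duration such as '8m38.902s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def percent_increase(new: float, old: float) -> float:
    """Relative increase of `new` over the baseline `old`, in percent."""
    return (new - old) / old * 100


# Values copied from the table (Feature = this PR, Develop = baseline).
tcl_increase = percent_increase(to_seconds("8m38.902s"), to_seconds("8m3.921s"))
hipblaslt_increase = percent_increase(to_seconds("59m9.664s"), to_seconds("56m52.821s"))
tcl_ratio = 812 / 269  # Develop size / Feature size, both listed in M

print(f"TCL gfx90a build time increase: {tcl_increase:.1f}%")            # 7.2%
print(f"hipBLASLt gfx942 build time increase: {hipblaslt_increase:.1f}%")  # 4.0%
print(f"TCL gfx90a compression ratio: {tcl_ratio:.2f}")                  # 3.02
```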

Testing and Environment:

Docker: Ubuntu 24.04, ROCm 6.4 RC stack, AMD clang version 18.0.0, AMD clang-offload-bundler version 18.0.0

Tested with the hipBLASLt test client:

[==========] 14429 tests from 13 test suites ran. (1977818 ms total)
[  PASSED  ] 14429 tests.

@bstefanuk self-assigned this Nov 22, 2024
@bstefanuk force-pushed the bundle-compress-co-files-tensilelite branch from 275cfda to 471de13 November 22, 2024 00:27
@TorreZuk (Contributor) previously approved these changes Nov 22, 2024:

Basics look good; I assume all rocBLAS tests passed.

@bstefanuk force-pushed the bundle-compress-co-files-tensilelite branch from 1458f44 to d8b5337 November 27, 2024 01:16
@LunNova commented Nov 27, 2024

It might be worth zstd-compressing the msgpack files too; they're quite compressible. Here's an untested attempt at decompression support, in case it's helpful:

diff --git a/Tensile/Source/lib/source/msgpack/MessagePack.cpp b/Tensile/Source/lib/source/msgpack/MessagePack.cpp
index de97929c..dbc397e0 100644
--- a/Tensile/Source/lib/source/msgpack/MessagePack.cpp
+++ b/Tensile/Source/lib/source/msgpack/MessagePack.cpp
@@ -28,6 +28,8 @@
 
 #include <Tensile/msgpack/Loading.hpp>
 
+#include <zstd.h>
+
 #include <fstream>
 
 namespace Tensile
@@ -86,6 +88,34 @@ namespace Tensile
                 return nullptr;
             }
 
+            // Check if the file is zstd compressed
+            char magic[4];
+            in.read(magic, 4);
+            bool isCompressed = (in.gcount() == 4 && magic[0] == '\x28' && magic[1] == '\xB5' && magic[2] == '\x2F' && magic[3] == '\xFD');
+            // Reset file pointer to the beginning
+            in.seekg(0, std::ios::beg);
+
+            if (isCompressed) {
+                // Decompress zstd file
+                std::vector<char> compressedData((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
+
+                size_t decompressedSize = ZSTD_getFrameContentSize(compressedData.data(), compressedData.size());
+                if (decompressedSize == ZSTD_CONTENTSIZE_ERROR || decompressedSize == ZSTD_CONTENTSIZE_UNKNOWN) {
+                    if(Debug::Instance().printDataInit())
+                        std::cout << "Error: Unable to determine decompressed size for " << filename << std::endl;
+                    return nullptr;
+                }
+
+                std::vector<char> decompressedData(decompressedSize);
+                size_t dSize = ZSTD_decompress(decompressedData.data(), decompressedSize, compressedData.data(), compressedData.size());
+                if (ZSTD_isError(dSize)) {
+                    if(Debug::Instance().printDataInit())
+                        std::cout << "Error: ZSTD decompression failed for " << filename << std::endl;
+                    return nullptr;
+                }
+
+                msgpack::unpack(result, decompressedData.data(), dSize);
+            } else {
             msgpack::unpacker unp;
             bool              finished_parsing;
             constexpr size_t  buffer_size = 1 << 19;
@@ -109,6 +139,7 @@ namespace Tensile
 
                 return nullptr;
             }
+            }
         }
         catch(std::runtime_error const& exc)
         {
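
The detection step in the patch above relies on the zstd frame magic number 0xFD2FB528, which appears on disk as the bytes 28 B5 2F FD. A stdlib-only Python sketch of the same check-then-fallback flow, for example for tooling that inspects .dat files. The function names are hypothetical, and the decompressor is passed in so the sketch does not depend on a particular zstd binding.

```python
from pathlib import Path
from typing import Callable, Optional

# zstd frames start with the magic number 0xFD2FB528, stored little-endian.
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def is_zstd(data: bytes) -> bool:
    """Return True if `data` begins with a zstd frame magic number."""
    return data[:4] == ZSTD_MAGIC


def read_msgpack_bytes(path: Path, decompress: Optional[Callable[[bytes], bytes]] = None) -> bytes:
    """Read a .dat file, decompressing it first if it is zstd-framed.

    Assumption: with the third-party `zstandard` package, one could pass
    zstandard.ZstdDecompressor().decompress here. Like the C++ patch, that
    call needs the content size recorded in the frame header.
    """
    data = path.read_bytes()
    if not is_zstd(data):
        return data  # plain msgpack: hand it to the existing parser unchanged
    if decompress is None:
        raise RuntimeError(f"{path} is zstd-compressed but no decompressor was supplied")
    return decompress(data)
```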

@bstefanuk (Contributor, Author) commented:

@LunNova Thanks for the code snippet. I've had the same idea and plan to implement it. However, this PR is scoped to code object files; .dat file compression will come in a separate PR.

@KKyang previously approved these changes Dec 4, 2024
@bstefanuk (Contributor, Author) commented Dec 4, 2024

@KKyang This change increases build times by default in hipBLASLt. One option is to add a --release flag to the install script and TensileCreateLibrary that enables compressed builds, so that only release builds pay the extra build time.

Which default do you prefer: debug (fast builds, large binaries) or release (slow builds, small binaries)?
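
One way the proposed switch could look if compression became opt-in: a hypothetical argparse sketch in which --release enables compression and --no-compress (the flag name this PR adds to the install script) always disables it. The parser, its prog name, and the --release flag are assumptions for illustration; the PR as described compresses by default.

```python
import argparse


def parse_build_args(argv=None) -> argparse.Namespace:
    # Hypothetical options: compression off by default (debug-style build),
    # enabled by --release, and always overridden by --no-compress.
    parser = argparse.ArgumentParser(prog="build")
    parser.add_argument("--release", action="store_true",
                        help="compress code objects: smaller binaries, slower build")
    parser.add_argument("--no-compress", action="store_true",
                        help="never compress code objects, even with --release")
    args = parser.parse_args(argv)
    args.compress = args.release and not args.no_compress
    return args


print(parse_build_args(["--release"]).compress)
```

Keeping --no-compress as an explicit override means either default can be chosen later without breaking scripts that already pass it.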

@@ -644,8 +373,8 @@ def success(kernel):
 kernelHeaderFile.close()

 if not globalParameters["GenerateSourcesAndExit"]:
-codeObjectFiles += buildSourceCodeObjectFiles(CxxCompiler, kernelFiles, outputPath)
-codeObjectFiles += getAssemblyCodeObjectFiles(kernelsToBuild, kernelWriterAssembly, outputPath)
+codeObjectFiles += SourceCommands.buildSourceCodeObjectFiles(CxxCompiler, kernelFiles, outputPath)
A contributor commented on this hunk:

This is a really good start to cleaning up the code used to generate binaries. In the future, I would like to consider something like:

Suggested change:
-codeObjectFiles += SourceCommands.buildSourceCodeObjectFiles(CxxCompiler, kernelFiles, outputPath)
+objectFiles += toolchain.compile(kernelFiles, outputPath)
+codeObjectFiles += toolchain.link(objectFiles, outputPath)
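
A sketch of what a toolchain object behind the suggested compile/link split might look like. The Toolchain class, its default tool names, and the bare -c/-o flags are placeholders, not Tensile's real compiler invocation; each stage returns output paths and records the commands it would run, so it works as a dry run.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Toolchain:
    # Placeholder tool names; real code-object builds need target-specific flags.
    compiler: str = "amdclang++"
    linker: str = "amdclang++"
    commands: list = field(default_factory=list)  # recorded, not executed

    def compile(self, sources, output_dir):
        """Plan one object file per source and return their paths."""
        objects = []
        for src in map(Path, sources):
            obj = Path(output_dir) / (src.stem + ".o")
            self.commands.append([self.compiler, "-c", str(src), "-o", str(obj)])
            objects.append(obj)
        return objects

    def link(self, objects, output_dir, name="kernels.co"):
        """Plan a single link step and return the code object path in a list."""
        code_object = Path(output_dir) / name
        self.commands.append([self.linker, *map(str, objects), "-o", str(code_object)])
        return [code_object]


# Usage mirroring the suggested change (hypothetical file names).
toolchain = Toolchain()
objectFiles = toolchain.compile(["a.cpp", "b.cpp"], "out")
codeObjectFiles = toolchain.link(objectFiles, "out")
```

Returning lists from both stages keeps the existing `codeObjectFiles += ...` accumulation pattern working unchanged.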

@@ -1621,6 +1622,40 @@ def which(p):
return candidate
return None

def splitArchs():
A contributor commented on this hunk:
We should consider a new module for dealing with these types of operations.

@jichangjichang (Collaborator) left a comment:

Please get CQE's approval before merging. Thanks.

@bstefanuk merged commit c1f9582 into ROCm:develop Dec 6, 2024
13 checks passed
@bstefanuk deleted the bundle-compress-co-files-tensilelite branch December 10, 2024 17:01