-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jitify2 parsing and preprocessing overhaul #131
base: jitify2
Are you sure you want to change the base?
Conversation
- This avoids "incompatible redefinition" errors.
- Adds launch_raw() methods to replace overloads that take array of arg pointers, which were dangerously ambiguous with the variadic overload. - Adds explicit no-argument overload of launch() to avoid forming zero-sized array.
- This matches the definitions in libcucxx.
- Refactors all options handling to use a single parser implementation and new Options and OptionsVec classes. - This makes the code significantly cleaner and more robust. - Maintains backwards compatibility with string vectors at API boundaries via implicit conversions.
- Replaces C++ lexing/parsing/patching code with a proper lexer implementation, which significantly improves robustness and maintainability. - Replaces minification logic with robust token-based minification. - Replaces preprocessing logic with a new approach that uses custom parsing to find include directives. This only requires invoking NVRTC (and only its preprocessor) once per preprocess, which speeds up preprocessing by 50x in some cases. - Fixes include directory handling. Relative include paths are now handled robustly, and there is no longer any ambiguity between external and built-in headers. Note that relative paths (including in -I options) now start from the current executable directory instead of the current working directory. - These changes should be almost completely backwards compatible.
#90 refers to Jitify1. I take it this is in response to the performance regression between CUDA 12.1 and 12.2 (in a test case I've been investigating calls to I've not looked significantly into Jitify2 since I created #90, I presume it's now worth me looking more seriously into the viability of migrating our code to use Jitify2. |
@Robadob This is actually not in response to that issue; I only just became aware of it and do not know the root cause (possibly some header hierarchy changed and became pathological for Jitify). All significant new work is expected to be on Jitify2, so I would encourage you to migrate, and would be happy to answer any questions that come up. |
Fair, the timing must just be a coincidence. I assume the same re: header-hierarchy, but I haven't looked into it too deeply beyond tracking the number of calls to nvrtc compile program.
I'm now planning to explore the migration today (having discussed it with my team), will create a new issue if I have any problems. Edit: Have managed to migrate to |
jitify2.hpp
Outdated
*full_path = include.nonlocal_full_path(kJitifyCallbackHeaderPrefix); | ||
*full_path = path_simplify(*full_path); | ||
if (already_loaded(*full_path)) return HeaderLoadStatus::kAlreadyLoaded; | ||
if (header_callback and header_callback(include, &source)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (header_callback and header_callback(include, &source)) { | |
if (header_callback && header_callback(include, &source)) { |
I've been doing our migration with this branch, this line errors in Visual Studio 2022.
case '"': return in_include_directive_ ? quote_include() : string(); | ||
case 'u': match('8'); | ||
// fall-through | ||
[[gnu::fallthrough]]; // Not sure why gcc complains here without this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[[gnu::fallthrough]]; // Not sure why gcc complains here without this | |
#ifdef __GNUC__ | |
[[gnu::fallthrough]]; // Not sure why gcc complains here without this | |
#endif |
Visual Studio 2022
jitify2.hpp(6192): warning C5030: attribute 'gnu::fallthrough' is not recognized
@@ -100,16 +103,19 @@ | |||
#endif | |||
|
|||
#include <algorithm> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include <algorithm> | |
#include <algorithm> | |
#include <array> |
Incomplete type jitify2.hpp:6549
(std::array<value_type, Size> data_ = {};
) under visual studio 2022.
jitify2.hpp
Outdated
Token new_tokens[kMaxNewTokens]; | ||
int j = 0; | ||
Iterator before_where = where; | ||
--before_where; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assertion here under Visual Studio 2022 Cannot decrement begin iterator
.
The begin iterator is coming from jitify2::parser::process_cuda_source()
s first call to insert_directive();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fiddling with the code (e.g. removing this line), later had a line below assert Cannot deref end iterator
when the same insert_directive()
was being called from with .end()
.
I have however got jitify2
branch working, see an edit to my earlier comment.
- This is required for building on some systems, including manylinux2014, and it doesn't hurt on other systems.
- This makes it more robust to changes in formatting in the compiler log output between different nvrtc versions.
e1334e5
to
128e055
Compare
@Robadob Thanks for the help with this. I've pushed fixes for the issues you identified as well as some more that I found with manylinux2014. |
Visual Studio 2022 (CUDA 12.0) Log
Log
gcc (Ubuntu 10.5.0-1ubuntu1~20.04) 10.5.0 (CUDA 12.3) // get the dynamically generated header from curve rtc
headers.emplace("dynamic/curve_rtc_dynamic.h", dynamic_header);
getKnownHeaders(flamegpu_include_dir, headers); // Temp hack that prespecifies known headers
// jitify to create program (with compilation settings)
const std::string program_name = func_name + "_program"; // Does this name actually matter?
jitify2::PreprocessedProgram program = jitify2::Program(program_name, kernel_src, headers)->preprocess(options);
if (!program.ok()) { Log
ManyLinux2014 (CUDA 11.2/12.0) Now builds on CI after setting definition Visual Studio 2019 Still got the variadic template error Log
|
I think the I'm guessing the I need to find a way to reproduce the variadic template error. |
Yes, removing that line has moved Visual Studio 2022 to the same error as previously reported with gcc.
I made this change, and now it's unable to find a regular file included by #line 1 "outputdata_impl_curve_rtc_dynamic.h"
#ifndef CURVE_RTC_DYNAMIC_H_
#define CURVE_RTC_DYNAMIC_H_
#include "flamegpu/exception/FLAMEGPUDeviceException.cuh"
#include "flamegpu/detail/type_decode.h"
#include "flamegpu/runtime/detail/curve/Curve.cuh"
#include "flamegpu/util/dstring.h"
Whilst investigating this, I removed the preloading of all FLAMEGPU headers (which I presume shouldn't be required for speedup after this PR), and found it doesn't appear to be using our include directories passed via options to
It would appear I need to rewrite all of our internal includes as system includes to get any further. (This is rather undesirable, and feels like quite the breaking-change)
It's possibly a case of Visual Studio 2019 simply having poor support for If you have a windows box with Visual Studio 2022, you should be able to install the VS 2019 build tools rather than the whole of Visual Studio 2019. |
I think you need to pass The keys in the
So a key of Note that the |
I've tried this, it has no impact. It only appears able to find FLAMEGPU includes, if they are preloaded and passed via the headers map. All of our FLAMEGPU includes (and their includes) are relative to the FLAMEGPU include directory. In my local build this is
I've done this myself. jitify.hpp:7162 // Try loading from include directories.
for (const std::string& include_path : include_paths) {
*full_path = include.nonlocal_full_path(include_path);
*full_path = path_simplify(*full_path);
if (already_loaded(*full_path)) return HeaderLoadStatus::kAlreadyLoaded;
if (read_text_file(*full_path, &source)) {
return newly_loaded(std::move(source));
} else {
fprintf(stderr,
"Did not find #include %s at %s\n", include.name().c_str(), full_path->c_str());
}
} The first file it can't find is
Somehow the relative path is being interpreted wrong. That path should be (Note the extra So I've updated our code, so that the absolute path to the include directory is passed instead I think this is a case of relative to executable, rather than working directory. The first file that can't be found now has changed.
That file does exist on disk.
Digging into
However, if I make that change for all files loaded by TLDR: 2 issues
|
Thanks for the debugging. I've pushed fixes for preincludes, absolute filenames, and the line endings issue. The paths being relative to the executable directory is intentional, but I'm now re-thinking whether that's a good idea based on your feedback. |
I've now guarded out the offending code, and with the other changes discussed from my previous comment (not clear why jitify1/vanilla-jitify2 weren't complaining). The code I've been testing with now works under both the Windows and Linux. The Linux/CUDA12.3 RTC times are down to 961 & 1196ms from 24009 & 23543ms! (with preloaded FLAMEGPU headers disabled in both cases, which was the previous hack to accelerate compilation, previously 18793 & 17782ms with the preloading although it has negligible impact to performance with this branch) I don't have an immediate RTC model to test GLM with (the original thing that spurred #90), but I expect it will fare similarly. The main difference with GLM is that it's many headers all use relative paths to include each-other, which prevented it from benefiting from the header preload hack. I look forward to seeing this merged, thanks for all the help getting it working. The VS2022 warnings (in #132) are still upsetting CI, and the VS2019 std::function/varadic-template issue persists. But we can probably guard the VS2022 warnings if necessary (VS via CMake doesn't seem to obey ISystem), and we may drop support for VS2019 if necessary (that requires some internal discussion though). |
Realised we have something in our test suite, that reported 1349 & 1318ms for the two (different) kernels under CUDA 12.3. Significantly better than the original >60s reported in #90. To get any further I expected we'd need to be flattening the headers, so this is great. |
cc: @kmaehashi for vis (I need to find time to migrate to jitify2 so as to test this branch) |
410c320
to
0d42248
Compare
Fixes #90 and #107.
(This PR requires #128 and #129 to be merged first).