-
Notifications
You must be signed in to change notification settings - Fork 744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Wrap complex global objects to control lifetime #2516
[SYCL] Wrap complex global objects to control lifetime #2516
Conversation
Can we add some test which will check that no new "global" objects are added by a patch? ` # nm -a ../deploy/linux_prod/lib/libsycl.so | grep " b " | grep detail 00000000003957a8 b _ZGVZN2cl4sycl6detail10SYCLConfigILNS1_8ConfigIDE1EE3getEvE6ValStr |
I'm not sure how to implement such a check. Not all global objects cause problems, only those, that are not trivial. Having |
I filtered only objects in detail namespace. We can check for object types to be sycl classes (to ignore trivial objects). |
|
||
SpinLock MFieldsLock; | ||
|
||
std::unique_ptr<Scheduler> MScheduler; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid truly knowing/enumerating all the objects handled in this wrapper, and just make it be a custom heap where arbitrary objects can be dealt with overloaded new/delete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smaslov-intel I was thinking about it. If we go with some custom memory space, we'd need some mechanism to call destructors on that memory. We'd either need to somehow store custom deleters and call them upon shutdown, or implement some other logic, where object is responsible for its destruction. Both look quite complicated solutions. Also, it's not clear how to access these objects. Use names and store pointers in a map? So, I decided to keep global wrapper with unique pointers for now. Adding a new global object shouldn't be that hard, and I don't expect us to add a new global object every other Tuesday as they're code smell anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a new global object shouldn't be that hard
But it is badly violating OOP encapsulation principle, I feel.
Also it makes it nearly impossible to use this global handler in the plugins (standalone parts of SYCL RT).
And then anything else we use for plugins would compete with 65535 destruction priority used for this data.
Having said that I admit this is definitely an improvement and we should go with it until something better is developed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good intent. But I think this deserves a design doc section - can you please create one?
There is also https://en.cppreference.com/w/cpp/utility/program/atexit and https://en.cppreference.com/w/cpp/utility/program/at_quick_exit that might be useful. |
Unfortunately, this does not guarantee, that our destructors will be run after user code. Consider the following (completely artificial) toy example: // somewhere in coolsycllibrary.so:
void *createSyclDevice() { ... }
void freeSyclDevice(void*) { ... }
// User code:
struct Wrapper {
void *device;
~Wrapper() {
if (device) {
freeSyclDevice(device);
}
}
};
Wrapper globalSyclDevice;
// before main is run, __cxa_atexit(~Wrapper()) is called.
int main() {
void *handle = dlopen("coolsycllibrary.so"); // here libsycl.so global objects are initialized.
// resolve symbols
// Work with SYCL device.
return 0;
}
// Destructors for all SYCL global objects are called.
// ~Wrapper() is called --> Oops. Now, I can't imagine someone really does it in pure C++ (why, if you have an official C++ API), but I can imagine such solutions used for interoperability with other languages (like GPU-accelerated libraries for Python).
That's a mind-blowing piece of C++ sorcery, I absolutely like it. But for the reasons I mentioned above we don't want to skip destruction of global objects. Such a library can be dlopened/dlclosed many times, resulting in significant memory leak. |
Yes, this was just another suggestion for the solution landscape. Otherwise, in a similar area, there is inside the SYCL committee this discussion: https://gitlab.khronos.org/sycl/Specification/-/issues/289 |
…_wrapper * upstream/sycl: (1533 commits) [SYCL] XFAIL sub_group shuffle tests on GPU [SYCL] Add support for L0 loader validation layer (intel#2520) [NFC][LIT] Temporary disable function pointers as they hang on L0 (intel#2544) [SYCL] use release version of OpenCL ICD loader [SYCL] Improve testing of host-task (intel#2540) Revert 1291215 [SYCL] Fix warning caused by [[nodiscard]] attribute (intel#2545) [SYCL] Workaround windows build failure [SYCL] Remove kernel_signature_start from int header (intel#2537) [SYCL] Fix ABI tests in post-commit (intel#2539) [SYCL][DOC] Update C-CXX-StandardLibrary doc to align with latest status (intel#2529) [SYCL][NFC] Fix static code analysis concerns (intel#2531) [SYCL][NFC] Improve testing for accessor_property_list (intel#2532) [SYCL] Avoid overuse of CPU on wait read-write lock loop (intel#2525) [SYCL] Implement no-decomposition for kernel types that don't need it. (intel#2477) [SYCL] Add group algorithm constraints (intel#2462) [BuildBot] Uplift Windows GPU RT from 8673 to 8778 (intel#2533) [SYCL][LIT][NFC] Extend ABI test suite (intel#2522) [SYCL][DebugInfo] Reinstate source locations for some kernel instructions (intel#2527) [SYCL][NFC] Replace the deprecated VectorType::getNumElements() (intel#2524) ...
At least some local comments are required. |
Note that the plugins may also have global objects that need to be deleted. An example of this is #2566. Maybe we need a new PI interface that |
We've discussed it with @romanovvlad and decided, that it should be handled in a separate PR. |
…_wrapper * upstream/sycl: (1021 commits) [SYCL] Enable async_work_group_copy for scalar and vector bool types (intel#2582) [SYCL] Fix element type in handler::copy (intel#2590) [NFC][SYCL] Remove unnecessary if condition (intel#2585) [SYCL][NFC] Fix SYCL lit test execution on a system w/o GPU (intel#2584) [SYCL] Add error handling for non-uniform work group size case (intel#2569) [SYCL][ESIMD] Preserve undef initializer for globals in ESIMDLowerVecArg pass (intel#2555) [SYCL] Make Level-Zero events visible on the host (intel#2576) [Driver][SYCL][NFC] Add help information for -Wno-sycl-strict (intel#2570) [SYCL] Relax test to work in Win32 environment. (intel#2580) [SYCL] Emit suppressed warnings from SYCL headers (intel#2575) [SYCL][NFC] Cover more classes with ABI tests (intel#2577) [SYCL][ESIMD] Update ESIMD tests and add raw send support. (intel#2482) [SYCL] Make ESIMD on-device tests require linux,gpu,opencl. (intel#2560) [SYCL] Release commands with no dependencies after they're enqueued (intel#2492) [SYCL] Add multi-device and multi-platform support for SYCL_DEVICE_ALLOWLIST (intel#2483) [SYCL] Try to enqueue host command depencies (intel#2561) [SYCL][ESIMD][NFC] Align namespace name with the spec guidelines (intel#2573) [SYCL][NFC] Add class layout ABI tests for memory objects (intel#2559) [SYCL] Change adress space for global variables (intel#2534) [NFC][SYCL] Fix comment. (intel#2541) ...
* sycl: [SYCL] Fix test failure due to size discrepancy in 32 and 64 bit environment (intel#2594) [BuildBot] Linux GPU driver uplift to 20.39.17972 (intel#2600) [SYCL][NFC] Remove cyclic dependency in headers (intel#2601) [SYCL][Doc] Fix Sphinx docs index (intel#2599) [SYCL][PI][L0] Add an assert to piEnqueueKernelLaunch() when the GlobalWorkOffset!=0 (intel#2593) [SYCL][Driver] Turn on -spirv-allow-extra-diexpressions option (intel#2578) [SYCL] Serialize queue related PI API in the Level-Zero plugin (intel#2588) Added missing math APIs for devicelib. (intel#2558) [SYCL][DOC] Fix path to FPGA device selector (intel#2563) [SYCL][CUDA] Add basic sub-group functionality (intel#2587) [SYCL] Add specialization constant feature design doc. (intel#2572) [SYCL] Expand release lit test to CPU and ACC (intel#2589) [SYCL] Add clang support for FPGA kernel attribute scheduler_target_fmax_mhz (intel#2511) [SYCL][ESIMD] Fixed compiler crash in LowerESIMDVecArg pass (intel#2556)
@kbobrovs @smaslov-intel ping |
// time when "Plugins" could be deleted. | ||
Plugins = new vector_class<plugin>; | ||
initializePlugins(Plugins); | ||
initializePlugins(&GlobalHandler::instance().getPlugins()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why repeat calls here instead of remembering the return value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the concise description in global_handler.hpp
@vladimirlaz Could you please approve as well? |
* sycl: (566 commits) [SYCL] Fix explicit copy operation for host device (intel#2627) [SYCL] Fix initialization issue on Windows (intel#2632) [SYCL][CUDA] Disable image_write test on CUDA devices (intel#2630) [SYCL] Removes any knowledge of specific memory advices from PI API. (intel#2607) [BuildBot] Uplift GPU RT version for Linux to 20.40.18075 (intel#2626) [SYCL] Wrap complex global objects to control lifetime (intel#2516) [SYCL][CUDA] Image Basic Test (intel#1970) [SYCL] Align get_info<info::device::version>() with the SYCL spec (intel#2507) [Driver][SYCL] Correct optimization disabling option for gen (intel#2622) [SYCL][LIT] Add deleter func for test in buffer.cpp to avoid potential SegFault (intel#2616) [SYCL] Remove half type alias causing name conflicts (intel#2624) [BuildBot] OpenCL CPU/FPGAEMU driver uplift (intel#2620) [SYCL][Doc] Add overview of kernel-program caching (intel#2514) [SYCL] Remove two-input sub-group shuffles (intel#2614) [SYCL] Add support for new spelling of FPGA kernel attribute scheduler_target_fmax_mhz (intel#2618) [SYCL] Align image class constructors with the SYCL spec (intel#2603) [SYCL] Remove tests migrated to llvm-test-suite (intel#2611) [SYCL][NFC] Fix dependency for SYCL add_sycl_executable macro (intel#2613) [SYCL][PI][L0] Update environment variables from LEVEL0 to LEVEL_ZERO (intel#2612) [SYCL] Add KernelNameTypeVisitor validation check (intel#2596) ...
The order of global constructor/destructor invocation is undefined. At the same time, user may desire to make objects of SYCL classes global. Since there're lots of global objects inside SYCL runtime (scheduler and program manager, to name a few), they need to be wrapped with some trivial class/structure. The object of that structure is allocated on heap and deallocated in
__attribute__((destructor))
/DllMain
, which guarantees, that SYCL global objects will be destroyed after user objects.