DeviceContext Split, test=develop #23737
Merged
Conversation
Shixiaowei02 force-pushed the dev/cuda_ctx branch 7 times, most recently from c85cfb8 to 4d1e00b on April 13, 2020 11:22
Shixiaowei02 force-pushed the dev/cuda_ctx branch from 7a37c33 to 3f3ca41 on April 13, 2020 12:02
NHZlX previously approved these changes on Apr 14, 2020
LGTM
Superjomn requested changes on Apr 14, 2020
Shixiaowei02 force-pushed the dev/cuda_ctx branch from dbabad3 to 9a4ebbd on April 14, 2020 09:28
Shixiaowei02 force-pushed the dev/cuda_ctx branch from 9a4ebbd to 9f64567 on April 14, 2020 09:37
Superjomn approved these changes on Apr 17, 2020
LGTM
Shixiaowei02 added a commit to Shixiaowei02/Paddle that referenced this pull request on Apr 22, 2020
Shixiaowei02 added a commit that referenced this pull request on Apr 23, 2020
* cherry-pick of DeviceContext Split, test=develop (#23737)
* New feature: thread local allocator, test=develop (#23989)
* add the thread_local_allocator, test=develop
* refactor the thread_local_allocator, test=develop
* provides option setting strategy, test=develop
* add boost dependency to cuda_stream, test=develop
* declare the stream::Priority as enum class, test=develop
* deal with PADDLE_ENFORCE_CUDA_SUCCESS macro in pr #23816
Shixiaowei02 added a commit to Shixiaowei02/Paddle that referenced this pull request on May 11, 2020
* supports thread-binding stream, test=develop
* avoid using thread_local variables in dtor, test=develop
* modify the stream priority enum, test=develop
This commit is part of the multi-stream inference work. To remain compatible with training, some of the changes are based on #22853.
1. Abstract CUDAContext/CUDAStream classes and split them out of CUDADeviceContext.
2. Allow a priority to be chosen when a CUDA stream is created.
3. Add a thread_local CUDAContext so that the runtime context can be bound to a thread. For backward compatibility, the current change performs one std::unordered_map::count() lookup when fetching the stream.
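As a rough illustration of items 1–3, here is a minimal C++/CUDA sketch of a stream wrapper with a selectable priority and a thread-bound context. The names used here (CUDAStream, CUDAContext, GetThreadLocalContext, GetStream), the Priority values, and the registry keyed by thread id are assumptions for this example and do not reproduce the actual Paddle classes; only the CUDA runtime calls are real APIs.

```cpp
// Illustrative sketch only; wrapper names and the registry key are guesses,
// not the Paddle implementation. The CUDA runtime calls are real.
#include <cuda_runtime.h>

#include <thread>
#include <unordered_map>

// Item 2: wrap a CUDA stream created with a selectable priority.
class CUDAStream {
 public:
  enum class Priority { kDefault, kHigh };

  explicit CUDAStream(Priority priority = Priority::kDefault) {
    int least = 0, greatest = 0;
    cudaDeviceGetStreamPriorityRange(&least, &greatest);
    // Numerically lower values mean higher priority on CUDA devices.
    const int value = (priority == Priority::kHigh) ? greatest : least;
    cudaStreamCreateWithPriority(&stream_, cudaStreamNonBlocking, value);
  }
  ~CUDAStream() { cudaStreamDestroy(stream_); }

  cudaStream_t raw() const { return stream_; }

 private:
  cudaStream_t stream_{nullptr};
};

// Items 1 and 3: a context that owns its stream, split off from the
// device context and bound to a single thread.
class CUDAContext {
 public:
  explicit CUDAContext(CUDAStream::Priority p) : stream_(p) {}
  const CUDAStream& stream() const { return stream_; }

 private:
  CUDAStream stream_;
};

// Each thread lazily creates and then reuses its own context.
CUDAContext& GetThreadLocalContext() {
  thread_local CUDAContext ctx(CUDAStream::Priority::kHigh);
  return ctx;
}

// Backward-compatible stream fetch: one count() lookup against a registry
// (keyed here by thread id purely as a guess), falling back to the
// thread-local context when no entry is registered.
cudaStream_t GetStream(
    const std::unordered_map<std::thread::id, cudaStream_t>& registry) {
  const auto id = std::this_thread::get_id();
  if (registry.count(id)) {
    return registry.at(id);
  }
  return GetThreadLocalContext().stream().raw();
}
```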
The following alternative approaches to multi-stream inference were also considered:
1. Refactor starting from OperatorBase::Run() so that the context is bound to the executor rather than to a thread; #23068 was an attempt at this approach. It is the most thorough solution, but it touches more than 60 low-level modules and requires extensive changes, so it was postponed.
2. Create a CUDAStreamPool to manage all CUDA streams centrally. However, this introduces a global variable that is hard to encapsulate, and because the current streams are bound to handles, initializing them ahead of time would allocate redundant resources, so it was not adopted.
3. Modify only the TensorRT subgraph to achieve partial multi-stream support. However, this approach still cannot avoid dealing with some global issues.