DeviceContext Split, test=develop #23737

Merged: 3 commits, Apr 17, 2020


@Shixiaowei02 Shixiaowei02 commented Apr 10, 2020

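This commit is part of the multi-stream inference work. To stay compatible with training, some of the changes build on #22853.
1. Abstract the CUDAContext / CUDAStream classes and split them out of CUDADeviceContext.
2. Allow a priority to be chosen when a CUDA stream is created.
3. Add a thread_local CUDAContext so that the runtime context can be bound to a thread.
For backward compatibility, the current change performs one std::unordered_map::count() lookup each time a stream is fetched.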
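
The three points above, together with the count() lookup, can be sketched roughly as follows. This is a minimal illustration and not the code merged in this PR: every name ending in `Sketch`, the `FakeStreamHandle` type, and the choice of keying the thread-local map by the device context pointer are assumptions made so the example runs without a GPU. In real CUDA code the stream would be created with `cudaStreamCreateWithPriority`, using the range reported by `cudaDeviceGetStreamPriorityRange`.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for cudaStream_t so the sketch runs without a GPU.
using FakeStreamHandle = int;

// Stream priority chosen at creation time (point 2).
enum class Priority { kHigh, kNormal };

// Owns one stream. Real code would call cudaStreamCreateWithPriority here.
class StreamSketch {
 public:
  explicit StreamSketch(Priority p) : priority_(p), handle_(NextHandle()) {}
  Priority priority() const { return priority_; }
  FakeStreamHandle handle() const { return handle_; }

 private:
  // Hands out a distinct fake handle for every stream created.
  static FakeStreamHandle NextHandle() {
    static std::atomic<int> counter{0};
    return ++counter;
  }
  Priority priority_;
  FakeStreamHandle handle_;
};

// The part split out of the device context (point 1): it bundles the stream
// and, in real code, the library handles tied to that stream.
class ContextSketch {
 public:
  explicit ContextSketch(Priority p = Priority::kNormal) : stream_(p) {}
  const StreamSketch& stream() const { return stream_; }

 private:
  StreamSketch stream_;
};

class DeviceContextSketch {
 public:
  DeviceContextSketch() : default_ctx_(std::make_shared<ContextSketch>()) {}

  // Binds a dedicated context to the calling thread (point 3).
  void BindThreadContext(Priority p) {
    ThreadContexts()[this] = std::make_shared<ContextSketch>(p);
  }

  // One count() lookup per call: use the calling thread's context if one is
  // bound, otherwise fall back to the shared default (backward compatible).
  const StreamSketch& stream() const {
    auto& bound = ThreadContexts();
    if (bound.count(this) > 0) return bound.at(this)->stream();
    return default_ctx_->stream();
  }

 private:
  using ContextMap = std::unordered_map<const DeviceContextSketch*,
                                        std::shared_ptr<ContextSketch>>;
  // Each thread sees its own map (assumed layout, keyed by device context).
  static ContextMap& ThreadContexts() {
    thread_local ContextMap contexts;
    return contexts;
  }
  std::shared_ptr<ContextSketch> default_ctx_;
};
```

A thread that never calls `BindThreadContext` keeps getting the shared default stream, which is the behaviour existing training code would rely on; a thread that does call it gets its own stream for every later `stream()` call on that device context.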


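The following alternative ways of implementing multi-stream inference were also considered:
1. Refactor starting from OperatorBase::Run() so that the context is bound to the executor rather than to the thread; #23068 was one attempt at this. This approach is thorough, but it touches 60+ low-level modules and requires a large amount of change, so it was deferred.
2. Create a CUDAStreamPool to manage CUDA streams centrally. This adds a global variable and is hard to encapsulate; in addition, streams are currently bound to handles, so initializing them ahead of time would allocate unneeded resources. It was therefore not adopted.
3. Change only the TensorRT subgraph to get partial multi-stream support. This approach still cannot avoid dealing with some global issues.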
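
The resource argument against alternative 2 can be illustrated with a small sketch. The PR does not show what a CUDAStreamPool would have looked like, so everything here is hypothetical: `StreamWithHandles` stands in for a stream plus the library handles bound to it, and the two pool classes only contrast up-front initialization with creation on first use.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a CUDA stream plus the handles bound to it.
// LiveCount() tracks how many of these resources currently exist.
struct StreamWithHandles {
  static int& LiveCount() {
    static int n = 0;
    return n;
  }
  StreamWithHandles() { ++LiveCount(); }
  ~StreamWithHandles() { --LiveCount(); }
};

// Eager pool: every slot is allocated when the pool is constructed,
// whether or not it is ever used.
class EagerPoolSketch {
 public:
  explicit EagerPoolSketch(std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
      slots_.push_back(std::make_unique<StreamWithHandles>());
    }
  }
  StreamWithHandles& Get(std::size_t i) { return *slots_.at(i); }

 private:
  std::vector<std::unique_ptr<StreamWithHandles>> slots_;
};

// Lazy variant: a slot is allocated only on its first Get().
class LazyPoolSketch {
 public:
  explicit LazyPoolSketch(std::size_t n) : slots_(n) {}
  StreamWithHandles& Get(std::size_t i) {
    auto& slot = slots_.at(i);
    if (!slot) slot = std::make_unique<StreamWithHandles>();
    return *slot;
  }

 private:
  std::vector<std::unique_ptr<StreamWithHandles>> slots_;
};
```

With four slots and only one of them used, the eager pool holds four stream-and-handle bundles while the lazy one holds a single bundle.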

@Shixiaowei02 Shixiaowei02 force-pushed the dev/cuda_ctx branch 7 times, most recently from c85cfb8 to 4d1e00b, on April 13, 2020 11:22
NHZlX previously approved these changes Apr 14, 2020

@NHZlX NHZlX left a comment

LGTM

paddle/fluid/platform/stream/cuda_stream.cc (outdated, resolved)
paddle/fluid/platform/stream/cuda_stream.h (outdated, resolved)

@Superjomn Superjomn left a comment

LGTM

@Shixiaowei02 Shixiaowei02 merged commit 2d01cc8 into PaddlePaddle:develop Apr 17, 2020
Shixiaowei02 added a commit to Shixiaowei02/Paddle that referenced this pull request Apr 22, 2020
Shixiaowei02 added a commit that referenced this pull request Apr 23, 2020
* cherry-pick of DeviceContext Split, test=develop (#23737)

* New feature: thread local allocator, test=develop (#23989)

* add the thread_local_allocator, test=develop

* refactor the thread_local_allocator, test=develop

* provides option setting strategy, test=develop

* add boost dependency to cuda_stream, test=develop

* declare the stream::Priority as enum class, test=develop

* deal with PADDLE_ENFORCE_CUDA_SUCCESS macro in pr #23816
Shixiaowei02 added a commit to Shixiaowei02/Paddle that referenced this pull request May 11, 2020
* supports thread-binding stream, test=develop

* avoid using thread_local variables in dtor, test=develop

* modify the stream priority enum, test=develop
3 participants