DeviceContext Split, test=develop #23737
Merged
Conversation
Shixiaowei02 force-pushed the dev/cuda_ctx branch 7 times, most recently from c85cfb8 to 4d1e00b on April 13, 2020 11:22
Shixiaowei02 force-pushed the dev/cuda_ctx branch from 7a37c33 to 3f3ca41 on April 13, 2020 12:02
NHZlX previously approved these changes on Apr 14, 2020
LGTM
Superjomn requested changes on Apr 14, 2020
Shixiaowei02 force-pushed the dev/cuda_ctx branch from dbabad3 to 9a4ebbd on April 14, 2020 09:28
Shixiaowei02 force-pushed the dev/cuda_ctx branch from 9a4ebbd to 9f64567 on April 14, 2020 09:37
Superjomn approved these changes on Apr 17, 2020
LGTM
Shixiaowei02 added a commit to Shixiaowei02/Paddle that referenced this pull request on Apr 22, 2020
Shixiaowei02 added a commit that referenced this pull request on Apr 23, 2020
* cherry-pick of DeviceContext Split, test=develop (#23737)
* New feature: thread local allocator, test=develop (#23989)
* add the thread_local_allocator, test=develop
* refactor the thread_local_allocator, test=develop
* provides option setting strategy, test=develop
* add boost dependency to cuda_stream, test=develop
* declare the stream::Priority as enum class, test=develop
* deal with PADDLE_ENFORCE_CUDA_SUCCESS macro in pr #23816
Shixiaowei02 added a commit to Shixiaowei02/Paddle that referenced this pull request on May 11, 2020
* supports thread-binding stream, test=develop
* avoid using thread_local variables in dtor, test=develop
* modify the stream priority enum, test=develop
This commit is part of the multi-stream inference work. To remain compatible with training, some of the changes are based on #22853.
1. Abstract CUDAContext/CUDAStream classes and split them out of CUDADeviceContext.
2. Allow a priority to be chosen when a CUDA stream is created.
3. Add a thread_local CUDAContext so that the runtime context can be bound to a thread. For backward compatibility, the current change performs one std::unordered_map::count() lookup when fetching the stream.
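As a rough illustration of items 1–3, here is a minimal C++/CUDA sketch of a stream wrapper with a selectable priority and a thread-bound context. The names used here (CUDAStream, CUDAContext, GetThreadLocalContext, GetStream), the Priority values, and the registry keyed by thread id are assumptions for this example and do not reproduce the actual Paddle classes; only the CUDA runtime calls are real APIs.

```cpp
// Illustrative sketch only; wrapper names and the registry key are guesses,
// not the Paddle implementation. The CUDA runtime calls are real.
#include <cuda_runtime.h>

#include <thread>
#include <unordered_map>

// Item 2: wrap a CUDA stream created with a selectable priority.
class CUDAStream {
 public:
  enum class Priority { kDefault, kHigh };

  explicit CUDAStream(Priority priority = Priority::kDefault) {
    int least = 0, greatest = 0;
    cudaDeviceGetStreamPriorityRange(&least, &greatest);
    // Numerically lower values mean higher priority on CUDA devices.
    const int value = (priority == Priority::kHigh) ? greatest : least;
    cudaStreamCreateWithPriority(&stream_, cudaStreamNonBlocking, value);
  }
  ~CUDAStream() { cudaStreamDestroy(stream_); }

  cudaStream_t raw() const { return stream_; }

 private:
  cudaStream_t stream_{nullptr};
};

// Items 1 and 3: a context that owns its stream, split off from the
// device context and bound to a single thread.
class CUDAContext {
 public:
  explicit CUDAContext(CUDAStream::Priority p) : stream_(p) {}
  const CUDAStream& stream() const { return stream_; }

 private:
  CUDAStream stream_;
};

// Each thread lazily creates and then reuses its own context.
CUDAContext& GetThreadLocalContext() {
  thread_local CUDAContext ctx(CUDAStream::Priority::kHigh);
  return ctx;
}

// Backward-compatible stream fetch: one count() lookup against a registry
// (keyed here by thread id purely as a guess), falling back to the
// thread-local context when no entry is registered.
cudaStream_t GetStream(
    const std::unordered_map<std::thread::id, cudaStream_t>& registry) {
  const auto id = std::this_thread::get_id();
  if (registry.count(id)) {
    return registry.at(id);
  }
  return GetThreadLocalContext().stream().raw();
}
```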
The following alternative approaches to multi-stream inference were also considered:
1. Refactor starting from OperatorBase::Run() so that the context is bound to the executor rather than to a thread; #23068 was an attempt at this approach. It is the most thorough solution, but it touches more than 60 low-level modules and requires extensive changes, so it was postponed.
2. Create a CUDAStreamPool to manage all CUDA streams centrally. However, this introduces a global variable that is hard to encapsulate, and because the current streams are bound to handles, initializing them ahead of time would allocate redundant resources, so it was not adopted.
3. Modify only the TensorRT subgraph to achieve partial multi-stream support. However, this approach still cannot avoid dealing with some global issues.