Do not reset default stream for StreamSafeCUDAAllocator #42149

From00 · 2022-04-23T07:52:30Z

PR types

Bug fixes

PR changes

Others

Describe

The default stream in StreamSafeCUDAAllocator is set when a CUDADeviceContext is created.

CUDADeviceContext::CUDADeviceContext(CUDAPlace place) : phi::GPUContext(place) {
  phi::GPUContext::PartialInitWithoutAllocator();
  cuda_stream_.reset(new stream::CUDAStream(phi::GPUContext::stream(), place));
  auto& instance = memory::allocation::AllocatorFacade::Instance();
  instance.SetDefaultStream(place, phi::GPUContext::stream());
  workspace_.reset(new phi::DnnWorkspaceHandle(
      instance.GetAllocator(place).get(), stream()));
}

Normally, the DeviceContextPool is a global singleton, and one Place corresponds to exactly one DeviceContext. However, to support multi-stream scheduling, the standalone executor creates two extra DeviceContextPools in StreamAnalyzer, one for the H2D stream and one for the D2H stream. As a result, one Place corresponds to multiple DeviceContexts, and constructing the extra contexts unexpectedly resets the default stream at runtime.

class StreamAnalyzer {
 public:
  explicit StreamAnalyzer(const platform::Place& place)
      : place_(place), d2h_ctx_pool_({place}), h2d_ctx_pool_({place}) {}
  
  ...

  platform::Place place_;
  platform::DeviceContextPool d2h_ctx_pool_;
  platform::DeviceContextPool h2d_ctx_pool_;
  std::map<size_t, std::shared_ptr<platform::DeviceEvent>> var_id2event_;
};
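
The reset described above can be reproduced with a toy model. This is only a sketch: `ToyAllocatorFacade`, `ToyDeviceContext`, and the integer `Place` and `Stream` aliases are hypothetical stand-ins, not Paddle's real types. It assumes only what the constructor shown earlier suggests, namely that every context construction calls `SetDefaultStream` for its place.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-ins for illustration; these are not Paddle's types.
using Place = int;   // e.g. a GPU device id
using Stream = int;  // e.g. an opaque stream handle

// Toy facade whose setter overwrites unconditionally.
struct ToyAllocatorFacade {
  std::map<Place, Stream> default_stream;

  void SetDefaultStream(Place place, Stream stream) {
    default_stream[place] = stream;  // last writer wins
  }
};

// Toy context that registers its own stream as the default on construction,
// mirroring the shape of the CUDADeviceContext constructor quoted above.
struct ToyDeviceContext {
  Stream stream;

  ToyDeviceContext(ToyAllocatorFacade& facade, Place place, Stream s)
      : stream(s) {
    facade.SetDefaultStream(place, s);
  }
};

// Builds one compute context and two extra contexts (D2H, H2D) for the same
// place, then reports which stream the facade considers the default.
Stream DefaultStreamAfterExtraContexts() {
  ToyAllocatorFacade facade;
  const Place gpu0 = 0;
  ToyDeviceContext compute(facade, gpu0, /*s=*/1);
  ToyDeviceContext d2h(facade, gpu0, /*s=*/2);
  ToyDeviceContext h2d(facade, gpu0, /*s=*/3);
  return facade.default_stream[gpu0];
}
```

In this model the default stream ends up being the one from the last context constructed rather than the compute stream, which is the kind of silent reset the description refers to.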

To avoid this behavior, this PR disallows changing the default stream once it has been set.
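
One way to express a "set once" rule is sketched below. The diff itself is not shown in this conversation, so this is not the PR's actual implementation: `SetOnceDefaultStream` and the integer `Place` and `Stream` aliases are hypothetical names, and the mutex is an added assumption for callers on multiple threads.

```cpp
#include <cassert>
#include <map>
#include <mutex>

// Hypothetical stand-ins for illustration; these are not Paddle's types.
using Place = int;
using Stream = int;

// Registry that keeps the first default stream recorded for each place and
// ignores every later attempt to change it.
class SetOnceDefaultStream {
 public:
  // Returns true if the stream was recorded, false if a default already
  // existed for this place and the call was ignored.
  bool SetDefaultStream(Place place, Stream stream) {
    std::lock_guard<std::mutex> guard(mutex_);
    // std::map::emplace inserts only when the key is absent.
    return streams_.emplace(place, stream).second;
  }

  // Throws std::out_of_range if no default was ever set for this place.
  Stream GetDefaultStream(Place place) const {
    std::lock_guard<std::mutex> guard(mutex_);
    return streams_.at(place);
  }

 private:
  mutable std::mutex mutex_;
  std::map<Place, Stream> streams_;
};

// Simulates a compute context followed by two extra contexts that all try
// to register a default stream for the same place.
Stream DefaultStreamWithGuard() {
  SetOnceDefaultStream registry;
  const Place gpu0 = 0;
  registry.SetDefaultStream(gpu0, 1);  // compute stream: recorded
  registry.SetDefaultStream(gpu0, 2);  // D2H stream: ignored
  registry.SetDefaultStream(gpu0, 3);  // H2D stream: ignored
  return registry.GetDefaultStream(gpu0);
}
```

Returning a `bool` rather than failing hard lets the extra contexts be constructed without special-casing them; whether a later call should be ignored silently, logged, or rejected is a design choice this sketch leaves open.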

paddle-bot-old · 2022-04-23T07:52:36Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

zhiqiu

LGTM

* [cherry-pick] Support cinn_launch op in standalone executor (#42046)
  * Support cinn_launch OP in standalone executor
  * Remove some redundant code
* [cherry-pick] Do not reset default stream for StreamSafeCUDAAllocator (#42149)


Do not reset default stream for StreamSafeCUDAAllocator

3bed176

zhiqiu approved these changes Apr 25, 2022

From00 merged commit 6553a9d into PaddlePaddle:develop Apr 25, 2022

From00 added a commit to From00/Paddle that referenced this pull request Apr 26, 2022

[cherry-pick] Do not reset default stream for StreamSafeCUDAAllocator (…

6cc3053

…PaddlePaddle#42149)

From00 mentioned this pull request Apr 26, 2022

Cherry pick for standalone executor #42281

Merged

betterpig added a commit that referenced this pull request May 7, 2022

Revert "Do not reset default stream for StreamSafeCUDAAllocator (#42149…

c296793

…)" This reverts commit 6553a9d.

betterpig mentioned this pull request May 7, 2022

【no merge】Revert "Do not reset default stream for StreamSafeCUDAAllocator" #42559

Closed

betterpig mentioned this pull request May 16, 2022

【CI】make some test run with old executor in specified windows server #42777

Merged

From00 deleted the do-not-reset-default-stream-for-stream-safe-cuda-allocator branch April 5, 2023 09:08
