fix race when temp space is used in copy & fix instance overwrite in g2c #8867
https://github.com/apache/incubator-mxnet/blob/master/src/kvstore/comm.h#L187,
@@ -76,7 +76,7 @@
  # construct the module
  # map the ctx_group attribute to the context assignment
- group2ctxs={'dev1':mx.cpu(), 'dev2':[mx.gpu(i) for i in range(num_gpus)]}
+ group2ctxs={'dev1':[mx.cpu()]*num_gpus, 'dev2':[mx.gpu(i) for i in range(num_gpus)]}
This change is just for better understandability.
src/ndarray/ndarray.cc
Outdated
@@ -454,7 +454,8 @@ inline void CopyFromToDnsImpl(const NDArray& from, const NDArray& to, RunContext
  // Make a copy of an NDArray based on storage type
  template<typename from_xpu, typename to_xpu>
- void CopyFromToImpl(const NDArray& from, const NDArray& to, RunContext rctx) {
+ void CopyFromToImpl(const NDArray& from, const NDArray& to,
+                     RunContext rctx, std::vector<Resource> requested) {
const reference for the vector?
src/ndarray/ndarray.cc
Outdated
@@ -518,43 +515,57 @@ void CopyFromTo(const NDArray& from, const NDArray& to, int priority) {
  CHECK(from.shape().ndim() != 0)
    << "source operands have zero dimension shape";
  // important: callback must always capture by value
- int a = from.ctx().dev_mask();
+ const auto from_ctx = from.ctx();
+ int a = from_ctx.dev_mask();
Nit: make a and b const.
Please avoid using auto for simple types
std::vector<Engine::VarHandle> mutable_vars(1, to.var());

std::vector<Resource> requested;
if (a == gpu::kDevMask && from_stype != to_stype) {
What if b is on GPU?
According to the original code,
- std::vector<Resource> requested;
- if (is_same<from_xpu, mshadow::gpu>::value && from_stype != to_stype) {
- requested.push_back(ResourceManager::Get()->Request(from_ctx,
- ResourceRequest(ResourceRequest::kTempSpace)));
- }
It seems that whether temp space is used does not depend on the context of b?
Oh right. No need to request temp space if cast_storage happens on CPU.
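To illustrate why the PR title calls this a race, here is a toy model of read/write dependency tracking. It is a sketch only and not MXNet's engine: the `Op` class, `may_run_concurrently`, and the variable names (`src1`, `dst1`, `temp`) are all hypothetical. The assumption is that a scheduler treats two operations as independent unless one of them writes a variable the other touches, so a shared temp buffer that is not declared as written is invisible to it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Op:
    """Hypothetical operation with the variables it reads and writes."""
    name: str
    const_vars: frozenset = field(default_factory=frozenset)    # read-only
    mutable_vars: frozenset = field(default_factory=frozenset)  # written


def may_run_concurrently(a, b):
    """Ops are independent only if neither writes a var the other touches."""
    a_all = a.const_vars | a.mutable_vars
    b_all = b.const_vars | b.mutable_vars
    return not (a.mutable_vars & b_all) and not (b.mutable_vars & a_all)


# Two copies that both scribble on one temp buffer but have different
# sources and destinations. The temp buffer is NOT declared here.
undeclared = [
    Op("copy1", frozenset({"src1"}), frozenset({"dst1"})),
    Op("copy2", frozenset({"src2"}), frozenset({"dst2"})),
]

# Same copies, with the temp buffer listed among the written variables.
declared = [
    Op("copy1", frozenset({"src1"}), frozenset({"dst1", "temp"})),
    Op("copy2", frozenset({"src2"}), frozenset({"dst2", "temp"})),
]

print(may_run_concurrently(*undeclared))  # True: the shared buffer is unseen
print(may_run_concurrently(*declared))    # False: the ops get serialized
```

In this model, declaring the temp buffer as written is what makes the two copies order themselves instead of overlapping.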
tests/python/unittest/test_module.py
Outdated
check_module_ctx_group([mx.cpu(0)], {'dev1': mx.cpu(1), 'dev2': mx.cpu(2)})
check_module_ctx_group([mx.cpu(0)], {'dev1': mx.cpu(1), 'dev2': mx.cpu(2)}, [mx.cpu(1), mx.cpu(2)])
Nit: I think naming optional arguments explicitly (grad_ctxs) when passing them is good practice, since the API may change in the future.
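A small sketch of the failure mode this nit guards against. The two helpers below are hypothetical stand-ins, not the real test function, and the `verbose` parameter is invented purely to show what happens when a new optional parameter is inserted ahead of `grad_ctxs`.

```python
def check_ctx_group(ctxs, group2ctxs, grad_ctxs=None):
    """Hypothetical helper with a single optional argument."""
    return {"ctxs": ctxs, "group2ctxs": group2ctxs, "grad_ctxs": grad_ctxs}


def check_ctx_group_v2(ctxs, group2ctxs, verbose=False, grad_ctxs=None):
    """Same helper after a made-up optional parameter is added before grad_ctxs."""
    return {"ctxs": ctxs, "group2ctxs": group2ctxs,
            "verbose": verbose, "grad_ctxs": grad_ctxs}


grads = ["cpu(1)", "cpu(2)"]

# Positional call: the third value silently binds to `verbose` in v2.
positional = check_ctx_group_v2(["cpu(0)"], {"dev1": "cpu(1)"}, grads)

# Keyword call: the value still reaches grad_ctxs after the signature change.
keyword = check_ctx_group_v2(["cpu(0)"], {"dev1": "cpu(1)"}, grad_ctxs=grads)

print(positional["grad_ctxs"])  # None
print(keyword["grad_ctxs"])     # ['cpu(1)', 'cpu(2)']
```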
…g2c (apache#8867) * fix race when temp space is used in copy * fix instance overwrite in g2c * example of g2c * address comments
Description
The var of the temp space should be in mutable_vars in the engine.
[{}] * ctx_len is actually one dict.
cc @eric-haibin-lin
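The point about `[{}] * ctx_len` can be reproduced in plain Python. This is a minimal sketch with placeholder values (`ctx_len = 3`, string device names), not the code from the PR: list repetition copies the reference, so every slot aliases the same dict, whereas a comprehension builds one dict per slot.

```python
ctx_len = 3  # placeholder value for illustration

# Repetition copies the reference: all slots point at one dict object.
shared = [{}] * ctx_len
shared[0]["dev1"] = "cpu(0)"
print(shared[0] is shared[2])  # True
print(shared[1])               # {'dev1': 'cpu(0)'}

# A comprehension evaluates {} once per slot, giving independent dicts.
separate = [{} for _ in range(ctx_len)]
separate[0]["dev1"] = "cpu(0)"
print(separate[1])             # {}
```

With the aliased list, a write meant for one context shows up under every index, which is the kind of overwrite the PR title refers to.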
Checklist
Essentials
make lint
Changes
Comments