MultiNodeChainList with self branching #102
@@ -104,6 +110,7 @@ def send(x, communicator, rank, tag=0):

"""
chainer.utils.experimental('chainermn.functions.send')
assert rank != communicator.rank
Maybe we can add an assertion comment like "Cannot send to the local process itself" or something.
(Or should it be an internal error and should not happen?)
Fixed (use ValueError instead).
👍
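A minimal sketch of the kind of check discussed in this thread, assuming a hypothetical helper named check_remote_rank (the actual change in chainermn is not shown here). One reason to prefer an exception is that assert statements are skipped when Python runs with -O, so a user-facing check would silently disappear.

```python
def check_remote_rank(rank, own_rank):
    """Return `rank` unchanged, or raise if it is the caller's own rank.

    Hypothetical helper for illustration; not part of chainermn.
    """
    if rank == own_rank:
        # Unlike `assert`, this check is still executed under `python -O`.
        raise ValueError(
            'Cannot send to the local process itself (rank {})'.format(rank))
    return rank
```

With this shape, check_remote_rank(1, 0) passes the rank through, while check_remote_rank(0, 0) raises ValueError with the message suggested above.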
@@ -136,6 +143,7 @@ def recv(communicator, rank, delegate_variable=None, tag=0, device=-1):

"""
chainer.utils.experimental('chainermn.functions.recv')
assert rank != communicator.rank
Same here: please add an assertion message.
chainermn/link.py
Outdated
# the same edge more than twice.
delegate_variable = None
# Prevent "double-backwarding," i.e., backprop
# the same edge more than twice.
More than once?
(Exactly once is fine and two or more times is not, right?)
Exactly. Don't we say "more than twice" in this case?
For ratios, there is no big difference between "> 2x" and ">= 2x", so both can be translated into the Japanese word "2倍以上" ("twice or more"). But I don't think that applies in this case.
"... should be called exactly once" would be less confusing.
Thank you for the suggestion. Fixed it.
👍

self._rank_inouts.append((rank_in, rank_out))

def __call__(self, *inputs):
    comm_queue = queue.Queue()
How about checking that comm_queue is empty at the end of this function?
Thank you. I fixed it.
chainermn/link.py
Outdated
@@ -221,6 +221,8 @@ def __call__(self, *inputs):
x, self._comm,
rank=_rank_out)

assert comm_queue.empty()
I think this can happen because of user errors, so an exception should be raised instead of using an assertion.
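A rough sketch of what replacing the assertion with an exception could look like. The helper name ensure_drained and the choice of ValueError are assumptions made for illustration; the thread does not say which exception type was eventually used.

```python
import queue


def ensure_drained(comm_queue):
    """Raise if values queued for the local process were never consumed.

    Hypothetical helper; `comm_queue` is assumed to be a queue.Queue.
    """
    if not comm_queue.empty():
        # Likely a user error in how rank_in / rank_out were specified,
        # so report it with an exception rather than an assertion.
        raise ValueError(
            'Communication queue is not empty at the end of __call__: '
            '{} value(s) were sent to the local process but never '
            'received'.format(comm_queue.qsize()))
```

Calling ensure_drained at the end of __call__ would keep the check active even when assertions are disabled.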
LGTM
In #98, MultiNodeChainList does not allow sending to or receiving from the same process (see https://github.com/chainer/chainermn/blob/master/chainermn/link.py#L129). This pull request extends MultiNodeChainList a little to allow it.
See the examples in test_link.py:
In these examples, we can specify comm.rank (the rank on which the models would be instantiated) for rank_in or rank_out.
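The idea of self branching can be illustrated with a toy, single-process model. This is only a sketch under stated assumptions: SelfBranchingChain is a hypothetical stand-in, not MultiNodeChainList, it uses plain Python callables instead of Chainer links, and it models only edges whose rank_in or rank_out equals the local rank.

```python
import queue


class SelfBranchingChain:
    """Toy single-process stand-in for a chain list (hypothetical)."""

    def __init__(self, own_rank):
        self._own_rank = own_rank
        self._links = []

    def add_link(self, func, rank_in=None, rank_out=None):
        # This toy only models edges that stay on the local process.
        for rank in (rank_in, rank_out):
            if rank is not None and rank != self._own_rank:
                raise ValueError('toy model only supports the local rank')
        self._links.append((func, rank_in, rank_out))

    def __call__(self, x):
        local = queue.Queue()  # values "sent" to the local process
        y = None
        for func, rank_in, rank_out in self._links:
            # rank_in=None reads the original input; otherwise "receive"
            # the value a previous component sent to this process.
            inp = x if rank_in is None else local.get_nowait()
            y = func(inp)
            if rank_out is not None:
                local.put(y)  # "send" to the local process
        return y


chain = SelfBranchingChain(own_rank=0)
chain.add_link(lambda v: v + 1, rank_in=None, rank_out=0)  # sends to itself
chain.add_link(lambda v: v * 2, rank_in=0, rank_out=None)  # receives from itself
result = chain(3)  # (3 + 1) * 2
```

Here the first component sends its output to rank 0, which is the local process, and the second component receives it from rank 0, so no inter-process communication is involved.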