Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MultiNodeChainList with self branching #102

Merged
merged 13 commits into from
Aug 10, 2017
Merged

Conversation

levelfour
Copy link
Contributor

@levelfour levelfour commented Aug 6, 2017

In #98 , MultiNodeChainList does not allow to send / recv to the same process (see https://github.com/chainer/chainermn/blob/master/chainermn/link.py#L129).
This pull request extends MultiNodeChainList a little bit to allow it.
See examples in test_link.py:

class BranchParent2(chainermn.MultiNodeChainList):
    def __init__(self, size, comm, rank_children):
        super(BranchParent2, self).__init__(comm=comm)
        ranks = [comm.rank] + rank_children
        self.add_link(BranchSubA(size), rank_in=None, rank_out=ranks)
        self.add_link(BranchSubA(size), rank_in=comm.rank, rank_out=comm.rank)
        self.add_link(BranchSubB(size), rank_in=ranks, rank_out=None)


class BranchParent3(chainermn.MultiNodeChainList):
    def __init__(self, size, comm, rank_children):
        super(BranchParent3, self).__init__(comm=comm)
        ranks = rank_children + [comm.rank]
        self.add_link(BranchSubA(size), rank_in=None, rank_out=ranks)
        self.add_link(BranchSubA(size), rank_in=comm.rank, rank_out=comm.rank)
        self.add_link(BranchSubB(size), rank_in=ranks, rank_out=None)


class BranchParent4(chainermn.MultiNodeChainList):
    def __init__(self, size, comm, rank_children):
        super(BranchParent4, self).__init__(comm=comm)
        ranks = rank_children + [comm.rank]
        ranks = ranks[1:] + ranks[0:1]
        self.add_link(BranchSubA(size), rank_in=None, rank_out=ranks)
        self.add_link(BranchSubA(size), rank_in=comm.rank, rank_out=comm.rank)
        self.add_link(BranchSubB(size), rank_in=ranks, rank_out=None)

In these examples, we can specify comm.rank (the rank on which the models would be instantiated) for rank_in or rank_out.

@@ -104,6 +110,7 @@ def send(x, communicator, rank, tag=0):

"""
chainer.utils.experimental('chainermn.functions.send')
assert rank != communicator.rank
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add an assertion comment like "Cannot send to the local process itself" or something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Or should it be an internal error and should not happen?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed (use ValueError instead).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -136,6 +143,7 @@ def recv(communicator, rank, delegate_variable=None, tag=0, device=-1):

"""
chainer.utils.experimental('chainermn.functions.recv')
assert rank != communicator.rank
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertion comment

# the same edge more than twice.
delegate_variable = None
# Prevent "double-backwarding," i.e., backprop
# the same edge more than twice.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More than once ?
(only == 1 is good && >= 2 is not good, right?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. Don't we say "more than twice" in this case?

Copy link
Member

@keisukefukuda keisukefukuda Aug 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of ratio, there's no big difference between "> 2x" and ">= 2x" , so both of them can be translated into a Japanese word "2倍以上". But in this case, I don't think it applies.
"... should be called exactly once" would be less confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for suggestion. Fixed it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


self._rank_inouts.append((rank_in, rank_out))

def __call__(self, *inputs):
comm_queue = queue.Queue()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about checking comm_queue is empty at the end of this function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I fixed it.

@@ -221,6 +221,8 @@ def __call__(self, *inputs):
x, self._comm,
rank=_rank_out)

assert comm_queue.empty()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this happens with user errors, so exceptions should be used

@iwiwi
Copy link
Contributor

iwiwi commented Aug 10, 2017

LGTM

@iwiwi iwiwi merged commit 98595c2 into chainer:master Aug 10, 2017
@levelfour levelfour deleted the self-branch branch August 11, 2017 15:34
@iwiwi iwiwi added this to the v1.0.0 milestone Aug 31, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants