Support a wider range of dynamically initialized models for MultiNodeOptimizer #148

Merged: 7 commits merged into master from test_for_dynamic_network on Dec 15, 2017

Conversation

@shu65 (Member) commented Dec 11, 2017

When a new layer is added to a model dynamically, its parameters need to be sent to all nodes. However, the current MultiNodeOptimizer sends the model parameters only once, at the beginning. Therefore, I fixed MultiNodeOptimizer to send the parameters to all nodes whenever the model is changed.
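
For illustration, a minimal sketch of the intended behavior (hypothetical class and method names; bcast_data and allreduce_grad are assumed ChainerMN-style communicator methods, and this is not the actual implementation):

    class MultiNodeOptimizerSketch(object):
        # Sketch: wraps a local optimizer and a communicator, and
        # re-broadcasts the parameters whenever the model changes.

        def __init__(self, actual_optimizer, communicator):
            self.actual_optimizer = actual_optimizer
            self.communicator = communicator
            self.target_params = []  # parameter names seen at the last update

        def is_changed(self, target):
            previous_params = self.target_params
            self.target_params = sorted(
                name for name, _ in target.namedparams())
            return previous_params != self.target_params

        def update(self, lossfun=None, *args, **kwds):
            target = self.actual_optimizer.target
            if self.is_changed(target):
                # A layer was added or removed: send the full parameter
                # set to all nodes again so they stay consistent.
                self.communicator.bcast_data(target)
            else:
                # Unchanged model: the usual data-parallel step.
                self.communicator.allreduce_grad(target)
                self.actual_optimizer.update(lossfun, *args, **kwds)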

@iwiwi self-requested a review on Dec 13, 2017
@iwiwi self-assigned this on Dec 13, 2017
Review context from the diff:

                return True
        return False
    else:
        return True

Contributor:

Let's make the case analysis as simple as possible. How about using an "early return" here, as follows (for details, see the book 'The Art of Readable Code'):

if len(previous_params) != len(self.target_params):
    return True

for param1, param2 in zip(self.target_params, previous_params):
    ...
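
Spelled out, the refactored check might look like the following sketch (the (name, initialized) pairs are an assumption about what self.target_params holds, not the actual implementation):

    def is_changed(self, target):
        previous_params = self.target_params
        # Assumption: track (name, is_initialized) for every parameter.
        self.target_params = [(name, param.data is not None)
                              for name, param in sorted(target.namedparams())]

        # Early return: different lengths already prove the model changed.
        if len(previous_params) != len(self.target_params):
            return True

        # Flat loop instead of nested case analysis; return on the first
        # mismatch, fall through to False when everything matches.
        for param1, param2 in zip(self.target_params, previous_params):
            if param1 != param2:
                return True
        return False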

Member Author:

Thanks, I will fix it.

@iwiwi (Contributor) commented Dec 13, 2017

I assume it is fine, but for safety I would like to see some empirical evidence that this change does not affect performance. I'm concerned about the cost of the many string comparisons.

@shu65 (Member Author) commented Dec 13, 2017

Ok, I will check its performance.

@shu65 (Member Author) commented Dec 14, 2017

I measured the computation time of is_change, the function that checks whether the model has been modified. I used ResNet50 as the model and repeated the measurement 100 times, using 2 nodes of MN-1 for the evaluation. The results are as follows.

                 Mean (sec.)  Median (sec.)  Min (sec.)  Max (sec.)
is_change only   0.0002       0.0002         0.0002      0.0003
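
For reference, a measurement like this can be scripted with a small harness along the following lines (a sketch; the optimizer and model objects are assumed to exist):

    import statistics
    import time

    def benchmark(fn, n_trials=100):
        # Time fn() n_trials times; report mean/median/min/max in seconds.
        times = []
        for _ in range(n_trials):
            start = time.perf_counter()
            fn()
            times.append(time.perf_counter() - start)
        return (statistics.mean(times), statistics.median(times),
                min(times), max(times))

    # Hypothetical usage with a multi-node optimizer wrapping ResNet50:
    # stats = benchmark(lambda: optimizer.is_changed(optimizer.target))
    # print('mean=%.4f median=%.4f min=%.4f max=%.4f' % stats)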

In addition, I also measured the computation time of is_change plus the allreduce of PureNcclCommunicator, again with ResNet50. The results are as follows.

Computation time of is_change + allreduce

                 Mean (sec.)  Median (sec.)  Min (sec.)  Max (sec.)
1 GPU            0.0045       0.0042         0.0041      0.0076
2 GPUs           0.0137       0.0136         0.0129      0.0171
4 GPUs           0.0174       0.0174         0.0170      0.0185
8 GPUs           0.0263       0.0258         0.0253      0.0326
16 GPUs          0.0324       0.0322         0.0315      0.0350

Computation time of allreduce only

                 Mean (sec.)  Median (sec.)  Min (sec.)  Max (sec.)
1 GPU            0.0038       0.0038         0.0038      0.0041
2 GPUs           0.0134       0.0134         0.0126      0.0163
4 GPUs           0.0172       0.0171         0.0167      0.0187
8 GPUs           0.0256       0.0254         0.0251      0.0306
16 GPUs          0.0319       0.0317         0.0313      0.0339

From these results, the impact of the function is very small.

@shu65 changed the title from "[WIP] Fix update bug in MultiNodeOptimizer" to "Fix update bug in MultiNodeOptimizer" on Dec 15, 2017
@iwiwi (Contributor) commented Dec 15, 2017

Great, thank you for the detailed evaluation! LGTM

@iwiwi merged commit 1018524 into master on Dec 15, 2017
@iwiwi changed the title from "Fix update bug in MultiNodeOptimizer" to "Support a wider range of dynamically initialized models for MultiNodeOptimizer" on Dec 15, 2017
@iwiwi deleted the test_for_dynamic_network branch on Dec 15, 2017
@iwiwi added this to the v1.1.0 milestone on Dec 15, 2017