Refine multi thread cpu parallel exe #11406

Merged

Conversation

chengduoZH
Contributor

No description provided.

@chengduoZH chengduoZH force-pushed the refine_multi_thread_CPU_Parallel_exe branch from b5a1c35 to 606a73b on June 20, 2018 08:20
@chengduoZH chengduoZH force-pushed the refine_multi_thread_CPU_Parallel_exe branch from 606a73b to 5c3ece4 on June 20, 2018 08:26
@chengduoZH chengduoZH force-pushed the refine_multi_thread_CPU_Parallel_exe branch 2 times, most recently from 24e1890 to 80bb2cf on June 26, 2018 13:18
@chengduoZH chengduoZH force-pushed the refine_multi_thread_CPU_Parallel_exe branch from 80bb2cf to 053ecd6 on June 26, 2018 13:19
@chengduoZH chengduoZH requested a review from reyoung on June 28, 2018 12:33
@chengduoZH chengduoZH force-pushed the refine_multi_thread_CPU_Parallel_exe branch 2 times, most recently from 04798dc to 5343ac2 on July 4, 2018 14:55
@chengduoZH chengduoZH force-pushed the refine_multi_thread_CPU_Parallel_exe branch from 5343ac2 to 985461b on July 4, 2018 15:00
@chengduoZH chengduoZH force-pushed the refine_multi_thread_CPU_Parallel_exe branch 7 times, most recently from 3c8a759 to 623f412 on July 9, 2018 12:39
@chengduoZH chengduoZH force-pushed the refine_multi_thread_CPU_Parallel_exe branch from 623f412 to 962e8c9 on July 10, 2018 10:03
@chengduoZH chengduoZH force-pushed the refine_multi_thread_CPU_Parallel_exe branch from 043864e to 2af3613 on July 12, 2018 08:15
@chengduoZH chengduoZH force-pushed the refine_multi_thread_CPU_Parallel_exe branch 2 times, most recently from b9241e1 to e223397 on July 12, 2018 12:27
@chengduoZH chengduoZH force-pushed the refine_multi_thread_CPU_Parallel_exe branch from e223397 to dcce1ff on July 12, 2018 12:43
@@ -32,6 +32,8 @@ struct BuildStrategy {
ReduceStrategy reduce_{ReduceStrategy::kAllReduce};
GradientScaleStrategy gradient_scale_{GradientScaleStrategy::kCoeffNumDevice};

bool share_parameter_between_cards_{false};
Collaborator

How do we share parameters between cards when using CUDA?

Collaborator

Why do we need another flag instead of ReduceStrategy?

Contributor Author

How do we share parameters between cards when using CUDA?

If share_parameter_between_cards_ is true, use_cuda_ must be false and build_strategy.reduce_ must be ReduceStrategy::kReduce. There is a check here:
https://github.com/PaddlePaddle/Paddle/pull/11406/files#diff-564dec854cf4f37015001783f71e06cbR76
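For readers who cannot follow the link, here is a minimal sketch of the constraint described above. It is not the actual Paddle check; ValidateStrategy is a hypothetical helper, and only the BuildStrategy fields named in this PR are reproduced.

```cpp
// Minimal sketch, not Paddle's actual validation code.
// share_parameter_between_cards_ is CPU-only and requires ReduceStrategy::kReduce.
#include <stdexcept>

enum class ReduceStrategy { kAllReduce, kReduce };

struct BuildStrategy {
  ReduceStrategy reduce_{ReduceStrategy::kAllReduce};
  bool share_parameter_between_cards_{false};
};

// Hypothetical helper illustrating the check referenced above.
void ValidateStrategy(const BuildStrategy &strategy, bool use_cuda) {
  if (strategy.share_parameter_between_cards_) {
    if (use_cuda) {
      throw std::invalid_argument(
          "share_parameter_between_cards_ only works on CPU (use_cuda must be false)");
    }
    if (strategy.reduce_ != ReduceStrategy::kReduce) {
      throw std::invalid_argument(
          "share_parameter_between_cards_ requires ReduceStrategy::kReduce");
    }
  }
}
```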

Collaborator

So... why do we need the new flag rather than just using build_strategy.reduce_?

The data fields should be ORTHOGONAL.

>>> p = p - 0.001 * g
"""
OpRole = core.op_proto_and_checker_maker.OpRole
self._current_role = OpRole.Optimize
self._op_role_var = [var.name if isinstance(var, Variable) else var]
self._op_role_var = [
Collaborator

Why do we need to store both parameters and gradients?

Contributor Author

[screenshot omitted]

In this case, fc_0.b_0's gradient name has been changed, so if we still use the gradient name fc_0.b_0@GRAD to decide which device this sgd op belongs to, we will get errors.
So, in this PR, I use grad rather than GradVarName(params[0]) to determine which device it belongs to.
https://github.com/PaddlePaddle/Paddle/pull/11406/files/7cf836f4ba4353d8ba4247ca09e904098a43edf5#diff-06c27dc69562c2f50b53409969b0a9b5R435
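To make the placement argument concrete, here is a minimal sketch, not Paddle's actual multi-device graph builder. It assumes the op-role-var attribute is a flat [param0, grad0, param1, grad1, ...] list, and DeviceForOptimizeOp and var_device are hypothetical stand-ins for the builder's lookup. Using the stored gradient name still resolves correctly after the gradient variable has been renamed, whereas reconstructing the name as param + "@GRAD" would not.

```cpp
// Minimal sketch only: decide which device an optimize op (e.g. sgd) belongs
// to by looking up the gradient name recorded in the (param, grad) pairs.
#include <string>
#include <unordered_map>
#include <vector>

int DeviceForOptimizeOp(
    const std::unordered_map<std::string, int> &var_device,  // var name -> device id
    const std::vector<std::string> &op_role_var) {            // [p0, g0, p1, g1, ...]
  for (size_t i = 0; i + 1 < op_role_var.size(); i += 2) {
    const std::string &grad = op_role_var[i + 1];  // stored grad name, not param + "@GRAD"
    auto it = var_device.find(grad);
    if (it != var_device.end()) return it->second;
  }
  return -1;  // unknown; caller decides the fallback
}
```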

Collaborator

I see, cool

@chengduoZH chengduoZH force-pushed the refine_multi_thread_CPU_Parallel_exe branch 4 times, most recently from 21a39eb to 1cfdd10 on July 13, 2018 05:38
@chengduoZH chengduoZH force-pushed the refine_multi_thread_CPU_Parallel_exe branch 5 times, most recently from 5ec119a to 55cf9ee on July 13, 2018 05:54
}
return -1;
auto param_grad = boost::get<std::vector<std::string>>(
op.GetNullableAttr(OpProtoAndCheckerMaker::OpRoleVarAttrName()));
Collaborator

Why do we need GetNullableAttr() here rather than GetAttr()?

Contributor Author

It could be GetAttr() here as well.
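For context, a rough sketch of the assumed difference between the two calls follows; the semantics are an assumption here, not taken from the Paddle source. A "nullable" lookup tolerates a missing attribute by returning a default value, while a strict lookup treats a missing attribute as an error; since every op on this code path carries the op-role-var attribute, either call works.

```cpp
// Illustrative sketch only, not Paddle's AttributeMap API.
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Attribute = std::vector<std::string>;
using AttributeMap = std::map<std::string, Attribute>;

// Assumed "nullable" semantics: empty value when the attribute is absent.
Attribute GetNullableAttr(const AttributeMap &attrs, const std::string &name) {
  auto it = attrs.find(name);
  return it == attrs.end() ? Attribute{} : it->second;
}

// Assumed strict semantics: the attribute must exist.
Attribute GetAttr(const AttributeMap &attrs, const std::string &name) {
  auto it = attrs.find(name);
  if (it == attrs.end()) throw std::out_of_range("missing attribute: " + name);
  return it->second;
}
```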

@chengduoZH chengduoZH force-pushed the refine_multi_thread_CPU_Parallel_exe branch from 55cf9ee to 7c19f38 on July 13, 2018 06:53
Collaborator
@reyoung reyoung left a comment

Cool

@chengduoZH chengduoZH merged commit 86b0a72 into PaddlePaddle:develop Jul 13, 2018
kuke pushed a commit to kuke/Paddle that referenced this pull request Aug 25, 2018
* refine multi-thread CPU Parallel exe

* refine multi thread CPU Parallel exe

* Refine CPU version for ParallelExecutor

* add share_parameter_between_cards_

* Fix ParallelExecutor bug

* Fix unit test

* Fix parameter opt balance

* Fix with opti (param->grad)

* Add grad to op var

* Remove shard_param_between_cards