Add federated learning parameter server (fl-ps) mode #42682

Merged 41 commits into PaddlePaddle:develop on Jun 2, 2022

ziyoujiyi
Contributor

PR types

New features

PR changes

Others

Describe

  1. Add the federated learning model
  2. Add the fl-ps train.py
  3. Add the fl-ps split-program pass
  4. Add fl-ps unit tests
  5. Support the N CPUs + N CPUs mode
  6. Tests pass

@paddle-bot-old

Your PR has been submitted successfully. Thank you for contributing to this open-source project!
Please watch for the results of the automated CI tests. See the Paddle CI Manual for details.

fuyinno4
fuyinno4 previously approved these changes May 23, 2022
@paddle-bot-old

Sorry to inform you that the CIs for commit 2873622 passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.

if (type[0] == 'f') { // float
const auto& feasign = ins_vec[i].GetFloatData();
/*
Contributor

remove

Contributor Author

deleted

self._pull_all_dense(scopes, send_ctx, dense_map)
fleet.util.barrier()
fleet.util.barrier()
Contributor

check this with guanqun?

@@ -90,7 +94,6 @@ def _build_programs(self):

class GeoPsProgramBuilder(PsProgramBuilder):  # CPU-only mode
    def __init__(self, pass_ctx):
        logger.info("start building geo-ps program")
Contributor

check with caibei?

Contributor Author

done

@zmxdream zmxdream self-requested a review May 31, 2022 23:52
program=None,
scope=None,
is_infer=False,
debug=False,
Contributor

Please confirm this part with xinxuan.

Contributor Author

done

@@ -308,5 +333,5 @@ Scope* HeterPipelineTrainer::GetWorkerScope(int thread_id) {
}

} // end namespace framework
} // end namespace paddle
} // namespace paddle
Contributor

This should not be removed.

Contributor Author

done

@@ -2365,7 +2359,7 @@ def _start_heter_trainer(self,
fetch_info=fetch_info,
print_period=print_period)

trainer._set_infer(is_infer)
trainer._set_infer(False)
Contributor

hardcode?
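
The question above seems to be about the literal `False` passed to `_set_infer`. A minimal sketch of the concern, assuming the change replaced the forwarded `is_infer` argument with a constant: `FakeTrainer`, `start_trainer_hardcoded`, and `start_trainer_forwarded` are hypothetical stand-ins for illustration, not Paddle classes or functions.

```python
class FakeTrainer:
    """Hypothetical stand-in for the trainer object; not Paddle's class."""

    def __init__(self) -> None:
        self.is_infer = None

    def _set_infer(self, is_infer: bool) -> None:
        self.is_infer = is_infer


def start_trainer_hardcoded(trainer: FakeTrainer, is_infer: bool = False) -> FakeTrainer:
    # The caller's is_infer is accepted but silently ignored.
    trainer._set_infer(False)
    return trainer


def start_trainer_forwarded(trainer: FakeTrainer, is_infer: bool = False) -> FakeTrainer:
    # The caller's choice is passed through unchanged.
    trainer._set_infer(is_infer)
    return trainer


hardcoded = start_trainer_hardcoded(FakeTrainer(), is_infer=True)
forwarded = start_trainer_forwarded(FakeTrainer(), is_infer=True)
```

With the hardcoded variant, a caller asking for inference still gets a trainer configured for training, which is presumably why the reviewer flagged it.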

if not self.context['use_ps_gpu']:
self._pull_all_dense(scopes, send_ctx, dense_map)
Contributor

if xx mode:
    push dense param
    barrier
    pull dense
elif xx1 mode:
    push dense param1
    barrier
    pull dense1
else ...
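
The reviewer's pseudocode suggests giving each mode its own explicit push, barrier, pull sequence. One way this could be sketched is a lookup table keyed by mode. The mode names `mode_a` and `mode_b`, the step names, and `init_dense_sync` are all assumptions for illustration; the review leaves the modes as "xx" and "xx1", and none of this is Paddle's API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical mode names; the review only says "xx mode" and "xx1 mode".
MODE_STEPS: Dict[str, Tuple[str, str, str]] = {
    "mode_a": ("push_dense_param", "barrier", "pull_dense"),
    "mode_b": ("push_dense_param1", "barrier", "pull_dense1"),
}


def init_dense_sync(mode: str, run_step: Callable[[str], None]) -> None:
    """Run the push -> barrier -> pull sequence registered for `mode`."""
    try:
        steps = MODE_STEPS[mode]
    except KeyError:
        # Mirrors the trailing "else ..." branch: unknown modes fail loudly.
        raise ValueError(f"unsupported mode: {mode!r}") from None
    for name in steps:
        run_step(name)


# Usage: record the steps instead of performing real communication.
calls: List[str] = []
init_dense_sync("mode_b", calls.append)
```

Keeping the sequences in one table makes the ordering per mode visible at a glance, which is the readability gain the reviewer appears to be after.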

@@ -0,0 +1,49 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Contributor

Copyright (c) 2022

@@ -0,0 +1,53 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Contributor

Copyright (c) 2022

@@ -0,0 +1,51 @@
#!/bin/bash

# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
Contributor

Copyright (c) 2022

Contributor

@XiaoguangHu01 XiaoguangHu01 left a comment

LGTM

@fuyinno4 fuyinno4 merged commit d999049 into PaddlePaddle:develop Jun 2, 2022
fuyou765 pushed a commit to fuyou765/Paddle that referenced this pull request Jun 7, 2022
* back fl

* delete ssl cert

* .

* make warning

* .

* unittest paral degree

* solve unittest

* heter & multi cloud commm ready

* .

* .

* fl-ps v1.0

* .

* support N + N mode

* .

* .

* .

* .

* delete print

* .

* .

* .

* .
6 participants