
Make objectives work with vertical distributed and federated learning #9002

Merged: 11 commits into dmlc:master on Apr 3, 2023

Conversation

@rongou (Contributor) commented Mar 30, 2023

Went through all the objectives and made sure they work with both column-split distributed training and vertical federated learning.

@rongou (Contributor, Author) commented Mar 30, 2023

@trivialfis @hcho3

@trivialfis (Member) left a comment

I'm not quite convinced that only one of the participants can have access to labels. We will have to maintain lots of these conditions in the future.

```diff
@@ -35,7 +35,7 @@ class QuantileRegression : public ObjFunction {
   bst_target_t Targets(MetaInfo const& info) const override {
     auto const& alpha = param_.quantile_alpha.Get();
     CHECK_EQ(alpha.size(), alpha_.Size()) << "The objective is not yet configured.";
-    CHECK_EQ(info.labels.Shape(1), 1) << "Multi-target is not yet supported by the quantile loss.";
+    CHECK_LE(info.labels.Shape(1), 1) << "Multi-target is not yet supported by the quantile loss.";
```
@trivialfis (Member):

In which case can this be 0? It would be great if we could ensure it's greater than or equal to 1 and only allow the first dimension (rows) to be zero.

@rongou (Contributor, Author):

Fixed.

@rongou (Contributor, Author) commented Mar 31, 2023

> I'm not quite convinced that only one of the participants can have access to labels. We will have to maintain lots of these conditions in the future.

We have to make some assumptions for vertical federated learning, and assuming labels are only available on worker 0 is the least restrictive. We can look at the two scenarios:

  • More than one worker (or even all workers) has access to labels. With the current implementation, users can do some preprocessing to remove labels from all workers other than worker 0 and proceed with training.
  • Only one worker has access to labels. We probably cannot share labels with the other workers because of privacy concerns; sharing may even be prohibited by laws or regulations (GDPR, HIPAA, etc.). In this case, arranging for the worker holding the labels to be worker 0 makes training possible.

So assuming labels are on worker 0 supports the broader set of use cases.
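
For concreteness, this is roughly the pattern that assumption implies. A minimal sketch, not the PR's actual code: the wrapper function, ComputeGradients, and predt are hypothetical placeholders, while MetaInfo, GradientPair, collective::GetRank, and collective::Broadcast follow names used elsewhere in this code base.

```cpp
#include <vector>

// Hypothetical sketch: label-dependent work happens on worker 0, and the
// results are shared with the label-less workers under a vertical split.
void ComputeGradientsMaybeVertical(MetaInfo const& info,
                                   std::vector<float> const& predt,
                                   std::vector<GradientPair>* out_gpair) {
  out_gpair->resize(info.num_row_);
  bool const is_vertical = !info.IsRowSplit();
  if (!is_vertical || collective::GetRank() == 0) {
    // This worker holds the labels, so it can compute gradients from them.
    ComputeGradients(info.labels, predt, out_gpair);  // hypothetical helper
  }
  if (is_vertical) {
    // Workers without labels receive the gradients computed on worker 0.
    collective::Broadcast(out_gpair->data(),
                          out_gpair->size() * sizeof(GradientPair), 0);
  }
}
```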

@trivialfis (Member) left a comment

Initial review. I'm a bit concerned about the generality of this dispatching. Is there a way to hide it in the collective instead of exposing it to the main algorithms?

Looking for ideas, @RAMitchell @hcho3.

tests/cpp/plugin/test_federated_learner.cc (outdated; resolved)
tests/cpp/test_learner.cc (outdated; resolved)
```diff
@@ -167,8 +170,10 @@ class QuantileRegression : public ObjFunction {
     common::Mean(ctx_, *base_score, &temp);
     double meanq = temp(0) * sw;

-    collective::Allreduce<collective::Operation::kSum>(&meanq, 1);
-    collective::Allreduce<collective::Operation::kSum>(&sw, 1);
+    if (info.IsRowSplit()) {
+      collective::Allreduce<collective::Operation::kSum>(&meanq, 1);
+      collective::Allreduce<collective::Operation::kSum>(&sw, 1);
+    }
```
@trivialfis (Member):

Maybe we should make a wrapper function for Allreduce so that newcomers to the code base can understand these conditions? I think we need some separation between the communication logic and the machine learning algorithm implementation. I'm open to any ideas; at the moment it's quite difficult for a "machine learning person" to add a new algorithm and make sure it works correctly under the different distributed-system requirements. I can see that these conditions are necessary for metrics as well, since they require labels. Is there a way to make this less intrusive by abstracting the logic under collective instead of exposing it to the machine learning part?
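
For illustration, a wrapper along the suggested lines might look like the following. This is a minimal sketch, not XGBoost's actual API: GlobalSum and its signature are hypothetical, while MetaInfo::IsRowSplit and collective::Allreduce appear in the diffs above.

```cpp
#include <cstddef>

namespace collective {
// Hypothetical wrapper: keep the split-type condition in one place so that
// objective implementations never branch on it themselves.
template <typename T>
void GlobalSum(MetaInfo const& info, T* values, std::size_t size) {
  // Under a column-wise (vertical) split, every worker already holds all
  // rows, so summing label-derived statistics across workers would
  // double-count them; only reduce when rows are sharded across workers.
  if (info.IsRowSplit()) {
    Allreduce<Operation::kSum>(values, size);
  }
}
}  // namespace collective
```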

@rongou (Contributor, Author):

The problem is that Allreduce is a low-level primitive, while column-wise splits and vertical federated learning are higher-level semantic changes, so the two don't always map onto each other. At the level of the communicator, I don't think we can determine whether we should skip a particular Allreduce call, even if we know which distributed/federated environment we are in.

Maybe we can come up with a clever mechanism to make this cleaner. Perhaps as a follow-up?
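
For illustration, with a helper like the GlobalSum() sketch above (still hypothetical), the quantile objective's call site would shrink to:

```cpp
// The split-type branch moves out of the objective and into the
// collective layer; compare with the if (info.IsRowSplit()) diff above.
collective::GlobalSum(info, &meanq, 1);
collective::GlobalSum(info, &sw, 1);
```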

```cpp
                     leaf_values.size() * sizeof(bst_float), 0);
  auto i = 0;
  auto& tree = *p_tree;
  for (auto nid = 0; nid < tree.NumNodes(); nid++) {
```
@trivialfis (Member) commented Mar 31, 2023:

The mapping between GetNodes and nid is not clear to me. There's an update-leaf-value function called by both host and device after calculating the quantiles, and it has access to the leaf values; consider moving the logic there.

Also, please consider abstracting this logic away from the main algorithm. I find it difficult to consider all possible combinations of conditions.

@rongou (Contributor, Author):

Done.

@trivialfis (Member) left a comment

> Maybe we can come up with a clever mechanism to make this cleaner. Perhaps as a follow-up?

Yep, looking forward to your solution.

@trivialfis merged commit 15e073c into dmlc:master on Apr 3, 2023
@rongou deleted the vertical-federated-objectives branch on September 25, 2023