Make objectives work with vertical distributed and federated learning #9002
Conversation
I'm not quite convinced that only one of the participants can have access to labels. We will have to maintain lots of these conditions in the future.
src/objective/quantile_obj.cu
Outdated
@@ -35,7 +35,7 @@ class QuantileRegression : public ObjFunction {
   bst_target_t Targets(MetaInfo const& info) const override {
     auto const& alpha = param_.quantile_alpha.Get();
     CHECK_EQ(alpha.size(), alpha_.Size()) << "The objective is not yet configured.";
-    CHECK_EQ(info.labels.Shape(1), 1) << "Multi-target is not yet supported by the quantile loss.";
+    CHECK_LE(info.labels.Shape(1), 1) << "Multi-target is not yet supported by the quantile loss.";
In which case can this be 0? It would be great if we could make sure it's greater than or equal to 1 and only allow the first dimension (rows) to be zero.
Fixed.
We have to make some assumptions for vertical federated learning, and assuming labels are only available on worker 0 is the least restrictive. We can look at the two scenarios: if every participant has labels, then worker 0 trivially has them; if only one participant has labels, that participant can be assigned rank 0. So assuming labels on worker 0 would support a broader set of use cases.
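To make the assumption concrete, here is a minimal sketch of how an objective could guard label access under a vertical split. The names ValidateLabels and LabelCols, and the stubbed MetaInfo and collective::GetRank, are illustrative stand-ins, not the PR's actual code:

```cpp
#include <iostream>
#include <stdexcept>

// Stubbed stand-ins for XGBoost's collective and MetaInfo APIs,
// reduced to what this single-process demo needs.
namespace collective {
int GetRank() { return 0; }  // this worker's rank in the communicator
}  // namespace collective

struct MetaInfo {
  bool is_column_split{true};  // vertical split: features are partitioned
  long label_cols{1};          // stands in for info.labels.Shape(1)
  bool IsColumnSplit() const { return is_column_split; }
  long LabelCols() const { return label_cols; }
};

// Under a vertical split only rank 0 is assumed to hold labels, so an
// empty label matrix on any other rank is valid rather than an error.
void ValidateLabels(MetaInfo const& info) {
  bool const expect_labels =
      !info.IsColumnSplit() || collective::GetRank() == 0;
  if (expect_labels && info.LabelCols() == 0) {
    throw std::runtime_error("worker 0 is expected to hold the labels");
  }
}

int main() {
  MetaInfo info;
  ValidateLabels(info);  // rank 0 with one label column: passes
  std::cout << "label check passed on rank " << collective::GetRank() << "\n";
  return 0;
}
```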
Initial review. I'm a bit concerned about the generality of this dispatching. Is there a way to hide it in the collective instead of exposing it to the main algorithms?
Looking for ideas, @RAMitchell @hcho3.
@@ -167,8 +170,10 @@ class QuantileRegression : public ObjFunction {
     common::Mean(ctx_, *base_score, &temp);
     double meanq = temp(0) * sw;

-    collective::Allreduce<collective::Operation::kSum>(&meanq, 1);
-    collective::Allreduce<collective::Operation::kSum>(&sw, 1);
+    if (info.IsRowSplit()) {
Maybe we should make a wrapper function for Allreduce so that newcomers to the code base can understand these conditions? I think we need some separation between communication logic and machine learning algorithm implementation. I'm open to any ideas; at the moment it's quite difficult for a "machine learning person" to add a new algorithm and make sure it works correctly under different distributed system requirements. I can see that these conditions are necessary for metrics as well, since they require labels. Is there a way to make this less intrusive by abstracting the logic under collective instead of exposing it to the machine learning part?
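For discussion's sake, a minimal sketch of what such a wrapper might look like. The name GlobalSum and its signature are hypothetical; the sketch only reuses the Allreduce and info.IsRowSplit() calls that already appear in the diff above, and it assumes the surrounding XGBoost headers:

```cpp
// Hypothetical helper, not XGBoost's actual API: centralizes the
// "skip the reduction under a column-wise split" condition so each
// objective does not have to repeat it.
template <typename T>
void GlobalSum(MetaInfo const& info, T* values, std::size_t n) {
  if (info.IsRowSplit()) {
    // Row-wise split: each worker holds a subset of rows, so the
    // partial sums must be combined across workers.
    collective::Allreduce<collective::Operation::kSum>(values, n);
  }
  // Column-wise split: every worker already sees all rows, so the
  // local value is already the global value and no communication
  // is needed.
}
```

The two call sites in the diff above would then collapse to GlobalSum(info, &meanq, 1) and GlobalSum(info, &sw, 1), keeping the distributed-mode condition out of the objective implementation.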
The problem is that Allreduce is a low-level primitive, but column-wise split and vertical federated learning are higher-level semantic changes, so they don't always map to each other. At the level of the communicator, I don't think we can determine whether we should skip a particular Allreduce call, even if we know what distributed/federated environment we are in.
Maybe we can come up with a clever mechanism to make this cleaner. Perhaps as a follow-up?
src/objective/adaptive.h
Outdated
                       leaf_values.size() * sizeof(bst_float), 0);
   auto i = 0;
   auto& tree = *p_tree;
   for (auto nid = 0; nid < tree.NumNodes(); nid++) {
The mapping between GetNodes and nid is not clear to me. There's an update-leaf-value function called by both the host and the device after calculating the quantiles, and it has access to the leaf values; consider moving the logic there.
Also, please consider abstracting this logic away from the main algorithm. I find it difficult to consider all possible combinations of conditions.
Done.
> Maybe we can come up with a clever mechanism to make this cleaner. Perhaps as a follow-up?
Yep, looking forward to your solution.
Went through all the objectives and made sure they work with both column-split distributed training and vertical federated learning.