
add dataset 'adult' for vertical_fl #423

Merged: 13 commits into alibaba:master on Nov 10, 2022

Conversation

@qbc2016 (Collaborator) commented on Nov 8, 2022

Add one dataset, 'adult', and rewrite a general dataloader for vertical FL.

@xieyxclack added the enhancement (New feature or request) label on Nov 9, 2022
@rayrayraykk (Collaborator) left a comment


Looks good to me. Please see the inline comments. Thanks!

@@ -28,7 +28,7 @@
'subreddit', 'synthetic', 'ciao', 'epinions', '.*?vertical_fl_data.*?',
'.*?movielens.*?', '.*?cikmcup.*?', 'graph_multi_domain.*?', 'cora',
'citeseer', 'pubmed', 'dblp_conf', 'dblp_org', 'csbm.*?', 'fb15k-237',
- 'wn18'
+ 'wn18', 'adult_fl'
Collaborator

No need for '_fl'

Collaborator Author

Since this dataset will also be used for the XGB model, I wonder if it will cause a conflict after the XGB PR is merged.

Collaborator

IMO, we can keep one adult dataset but use different preprocessors/splitters depending on the method.

Collaborator Author

Yep, I'll add some options to the .yaml file to decide whether or not to preprocess the data.

logger = logging.getLogger(__name__)


class Adult:
Collaborator

Please provide a docstring here, thx!

Collaborator

We should add the source of Adult dataset (e.g., link, reference) here.
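
A minimal sketch of what such a docstring could look like; the argument names are illustrative and not taken from this PR, while the UCI page and Kohavi (1996) are the usual source and reference for Adult:

class Adult:
    """Adult (Census Income) dataset for vertical FL.

    Source: UCI Machine Learning Repository,
        https://archive.ics.uci.edu/ml/datasets/adult
    Reference: Kohavi, R., "Scaling Up the Accuracy of Naive-Bayes
        Classifiers: a Decision-Tree Hybrid", KDD 1996.

    Args:
        root (str): directory where the raw 'adult.data'/'adult.test'
            files are stored (illustrative name).
        args (dict): extra options, e.g. whether to normalize the
            features (illustrative name).
    """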

@@ -101,7 +101,7 @@ def callback_funcs_for_encryped_gradient_v(self, message: Message):
en_u, en_v_B = message.content
input_x = self.sample_data(index=self.batch_index)
en_v_A = en_u * input_x
- en_v = np.concatenate([en_v_A, en_v_B], axis=-1)
+ en_v = np.concatenate([en_v_B, en_v_A], axis=-1)
Collaborator

@xieyxclack Please help to check this change, thx!

Collaborator Author

Client A used to hold the label; for convenience, I now let client B hold the label, so these two values should be switched here.

Collaborator

It is a little tricky for me... maybe we should add a cfg item to specify which client owns the labels? (Add a TODO later.)

Collaborator Author

Since for the XGB model we can set the number of clients and partition the data, for convenience I always let the last client hold the label in the data partition program. Maybe later, for the final version, we can add a cfg item as you suggest? (I will add a TODO.)
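
A rough sketch of what such a cfg-driven switch could look like; the label_owner option and the helper name are hypothetical, and only the concatenation order with client B holding the labels matches this PR:

import numpy as np

def concat_encrypted_parts(en_v_A, en_v_B, label_owner='B'):
    # 'label_owner' is the hypothetical cfg item discussed above ('A' or 'B');
    # the label holder's block goes first, matching the order used in this PR
    if label_owner == 'B':
        return np.concatenate([en_v_B, en_v_A], axis=-1)
    elif label_owner == 'A':
        return np.concatenate([en_v_A, en_v_B], axis=-1)
    raise ValueError('unknown label owner: ' + str(label_owner))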

for file in self.raw_file:
if self._check_existence(file):
logger.info(file + " files already exist")
# print("Files already downloaded and verified")
Collaborator

Remove this line.

test_set = combined_set[train_set.shape[0]:]
return train_set, test_set

# standardization
Collaborator

This comment might be wrong. I guess it should be #normalization.

Collaborator Author

Yes, I'll fix it. By the way, as I said above, this dataset may also be used for the XGB model, which does not need normalization or standardization, so I wonder whether it is suitable to process the data here.

Collaborator

You could use an arg to control the processing of the data:

if cfg.data.args['is_norm']:
    self.norm()
    ...
else:
    ...

Collaborator Author

Good idea!

_range = np.max(data) - np.min(data)
return (data - np.min(data)) / _range

# normalization
Collaborator

See above (same naming issue).
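
For reference, a short sketch contrasting the two transforms the comments mix up, plus the arg-controlled switch suggested above; the is_norm / is_standardize keys are illustrative, following the reviewer's snippet rather than the actual code:

import numpy as np

def min_max_normalize(data):
    # normalization: rescale values into [0, 1]
    _range = np.max(data) - np.min(data)
    return (data - np.min(data)) / _range

def standardize(data):
    # standardization: zero mean, unit variance (z-score)
    return (data - np.mean(data)) / np.std(data)

def maybe_preprocess(data, args):
    # 'args' would come from cfg.data.args; the key names are illustrative
    if args.get('is_norm', False):
        return min_max_normalize(data)
    elif args.get('is_standardize', False):
        return standardize(data)
    return data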


self._get_data()

base_folder = 'adult'
Collaborator

Move the class attribute behind __init__.

@xieyxclack (Collaborator) left a comment


LGTM, please refer to the inline comments, thx!

return data, config
else:
raise ValueError('You must provide the data file')
if generate:
Collaborator

Maybe use if ... elif ... else here.
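
A rough sketch of the suggested control flow; the function and parameter names are placeholders, and only the if/elif/else shape follows the suggestion:

def load_or_generate(data_file, generate, load_fn, generate_fn):
    # load_fn / generate_fn stand in for the real loading and generation
    # code, which is not shown in this excerpt
    if data_file is not None:
        return load_fn(data_file)    # returns (data, config)
    elif generate:
        return generate_fn()         # returns (data, config)
    else:
        raise ValueError('You must provide the data file')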

@@ -0,0 +1,39 @@
use_gpu: False

Collaborator

Maybe remove the blank line to make it consistent with other yaml files.

@xieyxclack (Collaborator) left a comment


LGTM

@rayrayraykk merged commit 16b5c25 into alibaba:master on Nov 10, 2022
@qbc2016 deleted the dev_vertical_data branch on November 10, 2022 08:43