Refactor splitter&transform; Modify some data related config; Add external dataset. #33

rayrayraykk · 2022-04-20T12:31:11Z

Refactor splitter;
Modify some data-related config, like transforms and args;
Add some external datasets from torchvision and torchtext (datasets from torchaudio are coming soon).
Remove old config.py which is not used.
Refactor transforms.
Add a unit test with MNIST and IMDB for the external dataset.

federatedscope/attack/example_attack_config/CRA_fedavg_convnet2_on_femnist.yaml

federatedscope/core/auxiliaries/splitter_builder.py

federatedscope/cv/dataset/leaf_cv.py

joneswong

We need a discussion about this pr

DavdGao

please see the detailed comments

federatedscope/core/auxiliaries/transform_builder.py

federatedscope/core/configs/cfg_data.py


federatedscope/core/auxiliaries/data_builder.py

joneswong · 2022-04-25T03:15:25Z

federatedscope/core/auxiliaries/transform_builder.py

+
+
+def get_transform(config, package):
+    transform_funcs = {


We should not override the default values by None
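One way to act on this is to pass along only the transforms that are actually configured, so that a dataset constructor keeps its own defaults. The following is a minimal sketch, not the PR's code: `build_transform_kwargs` and `make_dataset` are hypothetical names, and a plain dict stands in for `config.data`.

```python
def build_transform_kwargs(data_cfg):
    """Collect only the transform entries that are actually set.

    `data_cfg` is a plain dict standing in for `config.data`; the key names
    mirror the config fields discussed in this PR.
    """
    kwargs = {}
    for name in ("transform", "target_transform", "pre_transform"):
        value = data_cfg.get(name)
        if value:  # skip None and empty lists so defaults stay untouched
            kwargs[name] = value
    return kwargs


def make_dataset(root, transform="default_transform", target_transform=None):
    """Hypothetical dataset constructor with a non-None default."""
    return {
        "root": root,
        "transform": transform,
        "target_transform": target_transform,
    }


cfg = {"transform": [], "target_transform": [["ToTensor"]]}
dataset = make_dataset("data/", **build_transform_kwargs(cfg))
print(dataset["transform"])  # default_transform
```

Because the empty `transform` entry is dropped before the call, the constructor's default is used instead of being replaced by `None`.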

tests/test_external_dataset.py

joneswong

looks good but some minor issues need to be discussed/resolved

yxdyc

Good job. Plz see the inline comments. After merging this PR, I will integrate more audio datasets from wenet based on the new dataset customization style.

yxdyc · 2022-04-25T03:30:26Z

federatedscope/core/auxiliaries/data_builder.py

@@ -118,6 +118,238 @@ def _generate_data(client_num=5,
    return data, config


+def load_external_data(config=None):


This function appears to do a variety of things (with several internal sub-functions). It would be nice to have a docstring at the beginning that explains its input, its output, and the logical workflow.

yxdyc · 2022-04-25T03:35:30Z

federatedscope/core/auxiliaries/data_builder.py

+                targets = label_to_index(targets)
+            data = pad_sequence(data).transpose(0,
+                                                1)[:, :raw_args['max_len'], :]
+            data_list.append([(x, y) for x, y in zip(data, targets)])


For the IterableDataset, we may need a more general and principled load function. I will add and test more speech data based on IterableDataset and we can improve this in future discussions.

Thanks, I will add a TODO tag here.

yxdyc · 2022-04-25T03:37:22Z

federatedscope/core/auxiliaries/data_builder.py

+        dataset_func = getattr(import_module('torch_geometric.datasets'), name)
+        raise NotImplementedError
+
+    load_data = {


It can be a global variable named DATA_LOAD_FUNCS, defined earlier in the module.
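A minimal sketch of that layout, assuming placeholder loaders: the loader bodies, the `dispatch_load` helper, and the package keys below are illustrative only and are not taken from the PR.

```python
def load_torchvision_data(name, **kwargs):
    # Placeholder body: a real loader would import torchvision lazily here.
    return ("torchvision", name, kwargs)


def load_torchtext_data(name, **kwargs):
    # Placeholder body: a real loader would import torchtext lazily here.
    return ("torchtext", name, kwargs)


# Module-level dispatch table, defined once near the top of the module
# instead of being rebuilt inside the loading function on every call.
DATA_LOAD_FUNCS = {
    "torchvision": load_torchvision_data,
    "torchtext": load_torchtext_data,
}


def dispatch_load(package, name, **kwargs):
    """Look up the loader for `package` and call it (hypothetical helper)."""
    try:
        loader = DATA_LOAD_FUNCS[package.lower()]
    except KeyError:
        raise NotImplementedError(f"Unsupported package: {package}") from None
    return loader(name, **kwargs)


print(dispatch_load("torchvision", "MNIST", root="data/"))
```

Keeping the table at module level also lets other modules register or inspect the supported packages without calling the loader.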

yxdyc · 2022-04-25T03:40:43Z

federatedscope/core/auxiliaries/splitter_builder.py

+    if config.data.splitter == 'lda':
+        from federatedscope.core.splitters.generic import LDASplitter
+        splitter = LDASplitter(client_num, **args)
+    # graph splitter


Is it possible to make these graph splitters applicable to other data formats? That is, they could use the same splitting algorithms with slightly different pre- and post-processing.

Sure, I will add and modify these splitters in the next PR.
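As a rough illustration of that separation, the sketch below keeps the splitting algorithm format-agnostic (it only sees labels and returns index lists) and puts the format-specific work in a thin wrapper. It uses a Dirichlet (LDA-style) label split as the example algorithm rather than a graph splitter, and it is not the PR's `LDASplitter`; `lda_split_indices` and `split_labeled_pairs` are hypothetical names.

```python
import random
from collections import defaultdict


def lda_split_indices(labels, client_num, alpha, seed=0):
    """Partition sample indices across clients with per-class Dirichlet weights."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    client_indices = [[] for _ in range(client_num)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # A Dirichlet(alpha) sample is a set of normalized Gamma draws.
        weights = [rng.gammavariate(alpha, 1.0) for _ in range(client_num)]
        total = sum(weights)
        start, cumulative = 0, 0.0
        for client_id, weight in enumerate(weights):
            cumulative += weight
            if client_id == client_num - 1:
                end = len(indices)  # last client takes the remainder
            else:
                end = round(len(indices) * cumulative / total)
            client_indices[client_id].extend(indices[start:end])
            start = end
    return client_indices


def split_labeled_pairs(dataset, client_num, alpha):
    """Pre-process (x, y) pairs into labels, split, then rebuild the subsets."""
    labels = [y for _, y in dataset]
    parts = lda_split_indices(labels, client_num, alpha)
    return [[dataset[i] for i in part] for part in parts]


toy_dataset = [(i, i % 3) for i in range(30)]
subsets = split_labeled_pairs(toy_dataset, client_num=4, alpha=0.5)
print([len(subset) for subset in subsets])
```

A wrapper for another data format would only need to change how labels are extracted and how the index lists are turned back into subsets.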

yxdyc · 2022-04-25T03:42:35Z

federatedscope/core/auxiliaries/transform_builder.py

+import federatedscope.register as register
+
+
+def get_transform(config, package):


Simple documentation can be added to tell users the input/output of the transform.

yxdyc · 2022-04-25T03:45:10Z

federatedscope/core/configs/cfg_data.py

+    cfg.data.splitter_args = []  # args for splitter, eg. [{'alpha': 0.5}]
+    cfg.data.transform = []  # transform for x, eg. [['ToTensor'], ['Normalize', {'mean': [0.1307], 'std': [0.3081]}]]
+    cfg.data.target_transform = []  # target_transform for y, use as above
+    cfg.data.pre_transform = []  # pre_transform for `torch_geometric` dataset, use as above


for the three args, how about naming them as feat_transform, label_transform, and graph_pre_transform?

I can't agree. For the torchvision and torch_geometric dataset, we will use **transform_funcs as their args directly.
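To make the config format concrete, here is a minimal sketch of how entries such as `[['ToTensor'], ['Normalize', {...}]]` could be resolved against a package and collected under the same key names that the dataset constructors accept. It is an assumption about the general approach, not the PR's `get_transform`; `build_transform`, `Scale`, `Shift`, and `fake_transforms` are stand-ins so the example runs without torchvision.

```python
from types import SimpleNamespace


def build_transform(entries, package):
    """Turn [['Name'], ['Name', {kwargs}]] into one composed callable."""
    funcs = []
    for entry in entries:
        name, *rest = entry
        kwargs = rest[0] if rest else {}
        funcs.append(getattr(package, name)(**kwargs))

    def composed(x):
        for func in funcs:
            x = func(x)
        return x

    return composed


class Scale:
    """Stand-in transform: multiply the input by a factor."""

    def __init__(self, factor=1.0):
        self.factor = factor

    def __call__(self, x):
        return x * self.factor


class Shift:
    """Stand-in transform: add an offset to the input."""

    def __init__(self, offset=0.0):
        self.offset = offset

    def __call__(self, x):
        return x + self.offset


# Stand-in for a transforms module such as torchvision.transforms.
fake_transforms = SimpleNamespace(Scale=Scale, Shift=Shift)

entries = [["Scale", {"factor": 2.0}], ["Shift", {"offset": 1.0}]]
transform_funcs = {"transform": build_transform(entries, fake_transforms)}
print(transform_funcs["transform"](3.0))  # 7.0
```

Since the dict keys match the constructor's keyword names, the result can be passed as `**transform_funcs` without any renaming step, which is the point made in the reply above.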

yxdyc · 2022-04-25T03:46:57Z

federatedscope/core/splitters/__init__.py

@@ -0,0 +1,3 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division


No newline here. Check whether the linter is used.


yxdyc · 2022-04-25T07:05:56Z

federatedscope/core/auxiliaries/data_builder.py

+                                    'val': DataLoader()
+                                }
+                            }
+        config: `CN` from `federatedscope/core/configs/config.py`


It is confusing for users that we take a config and return a config. It would be better to name the returned one modified_cfg and add a more informative explanation.
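A small sketch of what such a signature and docstring might look like. The function, the dict-based config, and the `client_num` adjustment are all hypothetical and only illustrate why a loader might hand back a changed config.

```python
import copy


def load_toy_data(config):
    """Build per-client data and report any config values changed on the way.

    Args:
        config (dict): stand-in for the `CN` config object.

    Returns:
        tuple: `(data, modified_config)`, where `data` maps a client ID to
        its splits and `modified_config` is a copy of `config` updated to
        match the data that was actually loaded.
    """
    samples = list(range(10))
    client_num = min(config["client_num"], len(samples))
    data = {
        client_id: {"train": samples[client_id - 1::client_num]}
        for client_id in range(1, client_num + 1)
    }
    modified_config = copy.deepcopy(config)
    modified_config["client_num"] = client_num  # hypothetical adjustment
    return data, modified_config


original = {"client_num": 20}
data, modified_config = load_toy_data(original)
print(len(data), modified_config["client_num"], original["client_num"])
```

Returning a copy under a distinct name makes it explicit that the caller should continue with the returned config rather than the one passed in.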

Other modifications look good to me

joneswong

approved


rayrayraykk requested review from joneswong, xieyxclack, yxdyc and DavdGao April 20, 2022 12:31

rayrayraykk added 15 commits April 21, 2022 13:35

Re-orgnized splitter.

b06b5c1

Add external datasets

803ee51

Refactoring transforms and add external datasets.

7b4ced0

Fix missing transform

0cb5462

Refactoring graph splitter

306c24f

Refactoring graph splitter and fix minor bugs.

5e5aed0

Refactoring splitter builder and add register for splitter

b514830

Fix minor bug caused by yacs

c4347ac

Fix minor bug caused by yacs

3b5c6a0

Fix minor bug when transform is NOT tuple.

44ef320

Fix minor bug when transform is NOT tuple.

bfd94ee

Del bare except.

584b752

Fix logging.

ac4a139

Refactor transforms and add unitest for external dataset.

ff45328

Merge branch 'transform'

9e15b86

rayrayraykk added the enhancement label Apr 21, 2022

rayrayraykk added 2 commits April 21, 2022 17:03

minor change to restart uni-test

1336be0

Fix minor bug.

05ae251

rayrayraykk changed the title ~~Refactor splitter; Modify some data related config; Add external dataset.~~ Refactor splitter&transform; Modify some data related config; Add external dataset. Apr 21, 2022

rayrayraykk added 2 commits April 21, 2022 18:14

Make external dataset test faster.

3e01bd9

Fix dependency and delay import.

188d769

joneswong reviewed Apr 22, 2022

federatedscope/attack/example_attack_config/CRA_fedavg_convnet2_on_femnist.yaml (outdated, resolved)

joneswong reviewed Apr 22, 2022

federatedscope/core/auxiliaries/splitter_builder.py (outdated, resolved)

joneswong reviewed Apr 22, 2022

federatedscope/cv/dataset/leaf_cv.py (resolved)

joneswong reviewed Apr 22, 2022

DavdGao reviewed Apr 22, 2022

federatedscope/core/auxiliaries/transform_builder.py (resolved)

federatedscope/core/configs/cfg_data.py (outdated, resolved)

rayrayraykk added 7 commits April 24, 2022 14:47

Specify the input of transform_builder and splitter_builder and their args.

229ed1a

merge upstream

967340f

Minor change to keep consistent with PR #29.

4909232

Add some comments for usage of args of config.data.

b1945fd

Fix minor bugs and add the datasets from torchtext.

c0f0ef9

Update the application env and add transformer_register.

e845878

Lower the threshold of external dataset.

bce3b50

joneswong reviewed Apr 25, 2022

federatedscope/core/auxiliaries/data_builder.py (resolved)

joneswong reviewed Apr 25, 2022

tests/test_external_dataset.py (resolved)

joneswong reviewed Apr 25, 2022

yxdyc reviewed Apr 25, 2022

Add guide to some functions and fix minor issues according to the conversations above.

0f8b1c8

yxdyc reviewed Apr 25, 2022

Add description from modified_config.

30e6945

joneswong approved these changes Apr 25, 2022

joneswong merged commit 2bc22c0 into alibaba:master Apr 25, 2022

AnthonyXuan pushed a commit to AnthonyXuan/FederatedScope that referenced this pull request Aug 10, 2023

Refactor splitter&transform; Modify some data related config; Add external dataset. (alibaba#33)

91e9155
