-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor splitter&transform; Modify some data related config; Add external dataset. #33
Conversation
federatedscope/attack/example_attack_config/CRA_fedavg_convnet2_on_femnist.yaml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a discussion about this pr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please see the detailed comments
|
||
|
||
def get_transform(config, package): | ||
transform_funcs = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not override the default values by None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good but some minor issues need to be discussed/resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job. Plz see the inline comments. After merging this PR, I will integrate more audio datasets from wenet based on the new dataset customization style.
@@ -118,6 +118,238 @@ def _generate_data(client_num=5, | |||
return data, config | |||
|
|||
|
|||
def load_external_data(config=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function appears to do a variety of things (with several internal sub-functions). It would be nice to have a document at the beginning that explains its input and output and the logical work flow.
targets = label_to_index(targets) | ||
data = pad_sequence(data).transpose(0, | ||
1)[:, :raw_args['max_len'], :] | ||
data_list.append([(x, y) for x, y in zip(data, targets)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the IterableDataset, we may need a more general and principled load function. I will add and test more speech data based on IterableDataset
and we can improve this in future discussions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I will add a TODO
tag here.
dataset_func = getattr(import_module('torch_geometric.datasets'), name) | ||
raise NotImplementedError | ||
|
||
load_data = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be a global variable with the name DATA_LOAD_FUNCS
, and be put in a more forward position.
if config.data.splitter == 'lda': | ||
from federatedscope.core.splitters.generic import LDASplitter | ||
splitter = LDASplitter(client_num, **args) | ||
# graph splitter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to make these graph splitters applicable to other data formats? That is, they can use the same splitting algorithms but with a bit different pre/post processes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will add and modify these splitters in the next PR.
import federatedscope.register as register | ||
|
||
|
||
def get_transform(config, package): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simple documentation can be added to tell users the input/output of the transform
.
cfg.data.splitter_args = [] # args for splitter, eg. [{'alpha': 0.5}] | ||
cfg.data.transform = [] # transform for x, eg. [['ToTensor'], ['Normalize', {'mean': [0.1307], 'std': [0.3081]}]] | ||
cfg.data.target_transform = [] # target_transform for y, use as above | ||
cfg.data.pre_transform = [] # pre_transform for `torch_geometric` dataset, use as above |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the three args, how about naming them as feat_transform
, label_transform
, and graph_pre_transform
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't agree. For the torchvision
and torch_geometric
dataset, we will use **transform_funcs
as their args directly.
@@ -0,0 +1,3 @@ | |||
from __future__ import absolute_import | |||
from __future__ import print_function | |||
from __future__ import division |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No newline here. Check whether the linter is used.
…versations above.
'val': DataLoader() | ||
} | ||
} | ||
config: `CN` from `federatedscope/core/configs/config.py` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is confusing for users why we take a config
and return a config
. It would be better by naming it modifed_cfg
with more informative explanation.
Other modifications look good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approved
Co-authored-by: yuexiang.xyx <[email protected]> Update `tests/run.py` for Jenkins server (alibaba#4) just a workaround Feature/synchronize (alibaba#3) sync with the master branch of our original gitlab Feature/config refactor (alibaba#5) refactored configuration-related code modify README; minor fix (alibaba#6) Updated README fix gan cra loss_batch-> loss_task bug improved the environments set-up guidance improved the environments set-up guidance improved the environments set-up guidance Fix setup requirements. Update required python version to 3.9. updated auto-doc component according to the latest changes [Feature] Add dropout and log training metric. (alibaba#11) * Add dropout option for CNN and NLP model; Add training metric to logs. * allow users to determine whether to conduct evaluation on a specific split * Enable metric in global eval for users to determine whether to conduct evaluation on a specific split. * fix minor bug when importing nlp loss * Replace and remove `validate` with `evaluate(target_data_split_name=split)` to keep code clean. enabled the log file name valid in windows environment (alibaba#13) * enabled the log file name valid in windows environment update readme (alibaba#15) * update README added a demo for black-box optimization (alibaba#14) - added a demo for black-box optimization - enabled installation with cuda10 [Bugfix] fixed the invalid logger set-up if the logging is used before we call setup_logger (alibaba#17) * fixed the invalid logger set-up if the `logging` is used before we call `setup_logger` Change source of `download_url` from our own and fix `README` (alibaba#20) * Change source of `download_url` from our own `utils.py` and fix `README.md`. add logo (alibaba#26) - add logo - add more icons modify grpc_comm according to official tutorial (alibaba#25) fix path issue fix wrong logger usage reformatted Communication efficiency optimization (alibaba#19) * minor fixed for distributed mode * For the communication efficiency: dynamic type selection in gRPC servicer; transformer & parser Refactored the logger by reducing its redundancy and fixed some minor issues (alibaba#29) * Reducing the redundancy of the logger Update test_mf.py modify the unit test of mf task Refactor splitter&transform; Modify some data related config; Add external dataset. (alibaba#33) [Feature] FedEx (alibaba#37) [Feature] FedEx (alibaba#37) [Hotfix] print the missing ``Final`` results (alibaba#41) * hotfix for the missing ``Final`` results print Add pre-trained transformers as NLP model. TODO:@ZHEN, please fix online aggregator when the device is not specific. Add a example for transformers. Fix url. (alibaba#46) - added the local training baseline - enabled each client has its own early-stopper formatted by linter formatted by linter not use early_stopper in non-local mode bugfix for the cast "sample_client_num = -1" added global training mode via a proxy client that holds all data Fix un-consistent device for the PIA test added local fine-tuning before local evaluation linter format bugfix for fedex update README (alibaba#49) Feature/attack doc (alibaba#50) * improved the doc for attack module added API comments (alibaba#52) Fix docs about graph. (alibaba#51) Add api ref for mf task and context (alibaba#53) * add mf api reference and modify README.md typos fix Fix minor bugs Timeout strategy and minimal received number (alibaba#36) * For async: timeout strategy and minimal received number modify api reference (alibaba#56) update doc of core (alibaba#57) Add datasets from hugging face. Formatted and fix minor bugs. Add datasets and scripts for openml. Modify the example `yaml` of openml datasets. Add materials (paper lists, tutorials) (alibaba#60) * add FL paper list Add paper lists (alibaba#61) * add FL paper list fixed some missing API reference in fs.core (alibaba#54) As the title says. update release version (alibaba#64) Update graph paper list. (alibaba#65) Add paper list for FedHPO (alibaba#67) * added paper list for fedhpo rename and modify some val Add paper list for FedRec (alibaba#68) add paper list for FedRec added pfl paper list (alibaba#72) added pfl paper list hotfix for transformers to avoid import error updated pfl paper list (alibaba#73) updated pfl paper list fix url in dblp_new.py (alibaba#76) update README update debug squad model update update update
torchvision
and torchtext(datasets fromtorchaudio
are coming soon).config.py
which is not used.