Refactored the logger by reducing its redundancy and fixed some minor issues #29
- save the raw results in another file and compress it
- in standalone mode, print the trainer meta-info once
- in standalone mode, do not print the local eval results from client-side
- fix the missing global eval results
- minor fix for the client id print and logger setup
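As a rough illustration of the first item (saving the raw results to a separate file and compressing it), here is a minimal sketch using only the Python standard library. The function name `save_and_compress`, the file name `eval_results.raw`, and the JSON-lines layout are assumptions made for this example, not the code in this PR.

```python
import gzip
import json
import os
import shutil


def save_and_compress(raw_results, outdir, filename="eval_results.raw"):
    """Write one JSON record per line, gzip the file, and return the .gz path.

    `filename` and the JSON-lines layout are hypothetical choices for this sketch.
    """
    os.makedirs(outdir, exist_ok=True)
    raw_path = os.path.join(outdir, filename)

    # Keep the raw results apart from the human-readable log.
    with open(raw_path, "w", encoding="utf-8") as f:
        for record in raw_results:
            f.write(json.dumps(record) + "\n")

    # Compress the raw file and drop the uncompressed copy.
    gz_path = raw_path + ".gz"
    with open(raw_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(raw_path)
    return gz_path
```

The compressed file can be read back with `gzip.open(path, "rt")` and parsed line by line with `json.loads`.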
…rove
# Conflicts:
# federatedscope/core/auxiliaries/utils.py
please see the detailed comments.
) # e.g., sub_exp_20220411030524
outdir = os.path.join(cfg.outdir, "sub_exp" +
datetime.now().strftime('_%Y%m%d%H%M%S')
) # e.g., sub_exp_20220411030524
while os.path.exists(cfg.outdir):
    time.sleep(1)
Why do we need `time.sleep(1)` here?
It covers the case where several programs running in parallel create a directory with the same name within a very short time.
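For reference, the same goal (a unique timestamped sub-directory even when parallel runs start within the same second) can be sketched as below. This is not the code in the PR: it swaps the check-then-sleep loop for an atomic `os.makedirs(..., exist_ok=False)` attempt, and the helper name `make_unique_outdir` and its parameters are hypothetical.

```python
import os
import time
from datetime import datetime


def make_unique_outdir(base_dir, prefix="sub_exp", wait_seconds=1.0):
    """Create and return a fresh timestamped sub-directory of base_dir.

    Hypothetical helper: retries with a new timestamp whenever another
    process has already created a directory with the same name.
    """
    while True:
        stamp = datetime.now().strftime("_%Y%m%d%H%M%S")
        candidate = os.path.join(base_dir, prefix + stamp)
        try:
            # Creation itself is the existence check, so two processes
            # cannot both succeed for the same name.
            os.makedirs(candidate, exist_ok=False)
            return candidate
        except FileExistsError:
            # Name already taken: wait, then try again with a later timestamp.
            time.sleep(wait_seconds)
```

Because the directory is created inside the `try`, there is no window between checking for the name and claiming it.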
cfg.outdir,
"sub_exp" + datetime.now().strftime('_%Y%m%d%H%M%S'))
cfg.outdir = outdir
It seems that `cfg.outdir` is initialized here as `exp/sub_exp_%Y%m%d%H%M%S`. Please make sure the config file is stored in the same directory (cfg is saved in the `freeze` function in `config.py`).
We call `logger_setup` at the beginning, and all subsequent usages rely on the same `outdir`; see the new PR.
    local_updated_models (list): each element is ooxx.
Returns:
    b_local_dissimilarity (dict): the measurements.
'''
Please elaborate on the dissimilarity metric.
Reference added
…rove
# Conflicts:
# federatedscope/config.py
# federatedscope/core/auxiliaries/utils.py
# federatedscope/core/fed_runner.py
# federatedscope/core/worker/client.py
# federatedscope/core/worker/server.py
# federatedscope/gfl/fedsageplus/worker.py
# federatedscope/gfl/gcflplus/worker.py
# federatedscope/vertical_fl/worker/vertical_server.py
federatedscope/main.py
@@ -1,7 +1,7 @@
 import os
 import sys

-DEV_MODE = False # simplify the federatedscope re-setup everytime we change the source codes of federatedscope
+DEV_MODE = True # simplify the federatedscope re-setup everytime we change the source codes of federatedscope
We'd better keep it `False`.
tests/test_rec_IG_opt_attack.py
 self.assertIsNotNone(data)

 Fed_runner = FedRunner(data=data,
-server_class=get_server_cls(global_cfg),
+server_class=get_server_cls(init_cfg),
 client_class=get_client_cls(global_cfg),
Should be `init_cfg`.
…deratedScope into feature/logging_improve
Minor fix for the unit test
@@ -30,6 +30,10 @@ def __init__(self,
 if self.mode == 'standalone':
     self.shared_comm_queue = deque()
     self._setup_for_standalone()
     # in standalone mode, by default, we print the trainer info only once for better logs readability
     trainer_representative = self.client[1].trainer
Why `client[1]`? Is there a concept similar to the "chief_worker" in the PS-worker paradigm?
Here `[1]` just refers to the first client, since at least one client is used in all cases.
Arguably, we can modify the `trainer_representative` access accordingly once we introduce a concept similar to "chief_worker" in a future PR, which may be required by asynchronous simulation.
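To make the idea concrete, here is a minimal sketch of choosing a representative client without hard-coding the index. It assumes the clients are held in a dict keyed by integer client ID and that each entry exposes a `trainer`; the plain-dict layout and the names `pick_representative` and `describe_trainer_once` are hypothetical and are not part of FederatedScope.

```python
def pick_representative(clients):
    """Return (client_id, client) for the smallest client ID in the mapping."""
    if not clients:
        raise ValueError("at least one client is required")
    rep_id = min(clients)
    return rep_id, clients[rep_id]


def describe_trainer_once(clients):
    """Build a single trainer meta-info line from the representative client.

    Clients are plain dicts with a "trainer" entry here, which is an
    assumption made only for this sketch.
    """
    rep_id, rep = pick_representative(clients)
    return f"trainer info (from client #{rep_id}): {rep['trainer']}"
```

With IDs starting at 1 this selects the same client as `self.client[1]`, and the selection rule could later be replaced by a "chief_worker"-style choice.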
approved. please have a look at my inline comments.
Co-authored-by: yuexiang.xyx <[email protected]>
Update `tests/run.py` for Jenkins server (alibaba#4)
just a workaround
Feature/synchronize (alibaba#3)
sync with the master branch of our original gitlab
Feature/config refactor (alibaba#5)
refactored configuration-related code
modify README; minor fix (alibaba#6)
Updated README
fix gan cra loss_batch-> loss_task bug
improved the environments set-up guidance
improved the environments set-up guidance
improved the environments set-up guidance
Fix setup requirements.
Update required python version to 3.9.
updated auto-doc component according to the latest changes
[Feature] Add dropout and log training metric. (alibaba#11)
* Add dropout option for CNN and NLP model; Add training metric to logs.
* allow users to determine whether to conduct evaluation on a specific split
* Enable metric in global eval for users to determine whether to conduct evaluation on a specific split.
* fix minor bug when importing nlp loss
* Replace and remove `validate` with `evaluate(target_data_split_name=split)` to keep code clean.
enabled the log file name valid in windows environment (alibaba#13)
* enabled the log file name valid in windows environment
update readme (alibaba#15)
* update README
added a demo for black-box optimization (alibaba#14)
- added a demo for black-box optimization
- enabled installation with cuda10
[Bugfix] fixed the invalid logger set-up if the logging is used before we call setup_logger (alibaba#17)
* fixed the invalid logger set-up if the `logging` is used before we call `setup_logger`
Change source of `download_url` from our own and fix `README` (alibaba#20)
* Change source of `download_url` from our own `utils.py` and fix `README.md`.
add logo (alibaba#26)
- add logo
- add more icons
modify grpc_comm according to official tutorial (alibaba#25)
fix path issue
fix wrong logger usage
reformatted
Communication efficiency optimization (alibaba#19)
* minor fixed for distributed mode
* For the communication efficiency: dynamic type selection in gRPC servicer; transformer & parser
Refactored the logger by reducing its redundancy and fixed some minor issues (alibaba#29)
* Reducing the redundancy of the logger
Update test_mf.py
modify the unit test of mf task
Refactor splitter&transform; Modify some data related config; Add external dataset. (alibaba#33)
[Feature] FedEx (alibaba#37)
[Feature] FedEx (alibaba#37)
[Hotfix] print the missing ``Final`` results (alibaba#41)
* hotfix for the missing ``Final`` results print
Add pre-trained transformers as NLP model.
TODO:@ZHEN, please fix online aggregator when the device is not specific.
Add a example for transformers.
Fix url. (alibaba#46)
- added the local training baseline
- enabled each client has its own early-stopper
formatted by linter
formatted by linter
not use early_stopper in non-local mode
bugfix for the cast "sample_client_num = -1"
added global training mode via a proxy client that holds all data
Fix un-consistent device for the PIA test
added local fine-tuning before local evaluation
linter format
bugfix for fedex
update README (alibaba#49)
Feature/attack doc (alibaba#50)
* improved the doc for attack module
added API comments (alibaba#52)
Fix docs about graph. (alibaba#51)
Add api ref for mf task and context (alibaba#53)
* add mf api reference and modify README.md
typos fix
Fix minor bugs
Timeout strategy and minimal received number (alibaba#36)
* For async: timeout strategy and minimal received number
modify api reference (alibaba#56)
update doc of core (alibaba#57)
Add datasets from hugging face.
Formatted and fix minor bugs.
Add datasets and scripts for openml.
Modify the example `yaml` of openml datasets.
Add materials (paper lists, tutorials) (alibaba#60)
* add FL paper list
Add paper lists (alibaba#61)
* add FL paper list
fixed some missing API reference in fs.core (alibaba#54)
As the title says.
update release version (alibaba#64)
Update graph paper list. (alibaba#65)
Add paper list for FedHPO (alibaba#67)
* added paper list for fedhpo
rename and modify some val
Add paper list for FedRec (alibaba#68)
add paper list for FedRec
added pfl paper list (alibaba#72)
added pfl paper list
hotfix for transformers to avoid import error
updated pfl paper list (alibaba#73)
updated pfl paper list
fix url in dblp_new.py (alibaba#76)
update README
update
debug squad model
update
update
update
… issues (alibaba#29) * Reducing the redundancy of the logger
Reducing the redundancy of the logger according to #22. Specifically, the changes are:
Besides, this PR fixes some bugs in the unit tests and in the `outdir` setup.