Refactored the logger by reducing its redundancy and fixed some minor issues #29

Merged
merged 21 commits into from
Apr 24, 2022

@yxdyc (Collaborator) commented Apr 20, 2022

Reduce the redundancy of the logger according to #22. Specifically, the changes are:

  • save the raw results to a separate file and compress it
  • in standalone mode, print the trainer meta-info only once by default
  • in standalone mode, do not print the client-side local eval results
  • fix the missing global eval results

In addition, fix some bugs in the unit tests and in the outdir setup.
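The first bullet (keeping raw results out of the main log and compressing them) could look roughly like the sketch below. It is a minimal illustration under assumptions, not the code from this PR: the helper `save_raw_results`, the file name `eval_results.raw`, and the record fields are hypothetical. Records are written one JSON object per line, the finished file is gzip-compressed, and only the `.gz` copy is kept.

```python
import gzip
import json
import os
import shutil
from typing import Iterable


def save_raw_results(records: Iterable[dict], outdir: str,
                     name: str = "eval_results.raw") -> str:
    """Hypothetical helper: dump raw results to their own file, then gzip it.

    Each record becomes one JSON line, so the main log only has to carry
    the human-readable summaries.
    """
    raw_path = os.path.join(outdir, name)
    with open(raw_path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

    # Compress the finished file and keep only the compressed copy.
    gz_path = raw_path + ".gz"
    with open(raw_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(raw_path)
    return gz_path
```

Reading the results back only needs `gzip.open(path, "rt")` and one `json.loads` per line. Compressing once at the end, rather than writing through `gzip.open` from the start, keeps the raw file as plain text while training is still running.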

yxdyc added 3 commits April 20, 2022 18:08
- save the raw results in another file and compress it
- in standalone mode, print the trainer meta-info once
- in standalone mode, do not print the local eval results from client-side
- fix the missing global eval results
- minor fix for the client id print and logger setup
…rove

# Conflicts:
#	federatedscope/core/auxiliaries/utils.py
@yxdyc added the bug (Something isn't working) and enhancement (New feature or request) labels on Apr 20, 2022

@DavdGao (Collaborator) left a comment

Please see the detailed comments.

) # e.g., sub_exp_20220411030524
outdir = os.path.join(cfg.outdir, "sub_exp" +
                      datetime.now().strftime('_%Y%m%d%H%M%S')
                      ) # e.g., sub_exp_20220411030524
while os.path.exists(cfg.outdir):
    time.sleep(1)
Collaborator

Why do we need time.sleep(1) here?

Collaborator Author

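It handles the case where several programs running in parallel try to create a directory with the same name within a very short time: the timestamp only has one-second resolution, so waiting one second yields a new name.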
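The collision described in this reply can be handled as in the sketch below, assuming the same `sub_exp_%Y%m%d%H%M%S` naming. `make_timestamped_subdir` is a hypothetical helper, not FederatedScope code. Instead of checking `os.path.exists` and creating the directory in a separate step, it lets `os.makedirs(..., exist_ok=False)` fail on a collision, so two processes cannot both claim the same name; on failure it sleeps one second and regenerates the candidate path.

```python
import os
import time
from datetime import datetime


def make_timestamped_subdir(parent: str, prefix: str = "sub_exp") -> str:
    """Hypothetical helper: create and return e.g. parent/sub_exp_20220411030524.

    The timestamp has one-second resolution, so processes starting within
    the same second would pick the same name; on a collision we wait one
    second and retry with a fresh timestamp.
    """
    while True:
        candidate = os.path.join(
            parent, prefix + datetime.now().strftime('_%Y%m%d%H%M%S'))
        try:
            # exist_ok=False raises if the directory already exists, so the
            # existence check and the creation happen in a single step.
            os.makedirs(candidate, exist_ok=False)
            return candidate
        except FileExistsError:
            time.sleep(1)  # let the timestamp change, then try again
```

Note that the loop re-tests the newly generated candidate on every iteration, not the parent directory, which would still exist and never let the loop end.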

        cfg.outdir,
        "sub_exp" + datetime.now().strftime('_%Y%m%d%H%M%S'))
cfg.outdir = outdir
Collaborator

It seems that cfg.outdir is initialized here as exp/sub_exp_%Y%m%d%H%M%S. Please make sure the config file is stored in the same directory (cfg is saved by the freeze function in config.py).

Collaborator Author

We will call logger_setup at the beginning, so all subsequent usages are based on the same outdir; see the new PR.
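A minimal sketch of the approach in this reply, assuming a plain dict in place of the real config object: the output directory is resolved once, the log file handler is attached to it, and the config is later saved next to the log by reusing the same value instead of recomputing a path. `setup_run_dir`, `save_config`, the logger name `fl_run`, and the file names are all hypothetical.

```python
import json
import logging
import os


def setup_run_dir(cfg: dict) -> logging.Logger:
    """Hypothetical helper: create cfg["outdir"] once and log into it."""
    os.makedirs(cfg["outdir"], exist_ok=True)
    logger = logging.getLogger("fl_run")
    logger.setLevel(logging.INFO)
    logger.addHandler(
        logging.FileHandler(os.path.join(cfg["outdir"], "run.log")))
    return logger


def save_config(cfg: dict) -> str:
    """Hypothetical helper: store the config in the already-resolved outdir."""
    path = os.path.join(cfg["outdir"], "config.json")
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)
    return path
```

Calling `setup_run_dir` twice in one process would attach a second file handler to the same logger, so a real setup function would need to guard against repeated calls.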

    local_updated_models (list): each element is ooxx.
Returns:
    b_local_dissimilarity (dict): the measurements.
'''
Collaborator

Please elaborate on the dissimilarity metric.

Collaborator Author

Reference added
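Assuming the metric is the B-local dissimilarity from the FedProx paper (Li et al., "Federated Optimization in Heterogeneous Networks"), as the variable name suggests, it is B = sqrt(E_k ||g_k||^2 / ||E_k g_k||^2), where g_k is client k's local gradient and E_k averages over clients. The sketch below computes it per parameter, using each client's update (local weights minus last round's global weights) as a stand-in for its gradient and weighting clients by sample count. The function name, the `(sample_size, params)` tuple layout, and the flat-list parameters are assumptions for illustration, not the PR's code.

```python
import math
from typing import Dict, List, Tuple

Params = Dict[str, List[float]]  # parameter name -> flattened values


def b_local_dissimilarity(last_model: Params,
                          local_updates: List[Tuple[int, Params]]
                          ) -> Dict[str, float]:
    """Per-parameter B-local dissimilarity (illustrative sketch).

    local_updates holds (sample_size, params) pairs; each client's update
    (local params minus last_model) stands in for its local gradient.
    """
    total = sum(n for n, _ in local_updates)
    weights = [n / total for n, _ in local_updates]
    result = {}
    for name, w_global in last_model.items():
        deltas = [[a - b for a, b in zip(params[name], w_global)]
                  for _, params in local_updates]
        # E_k ||g_k||^2: sample-weighted mean of the squared update norms.
        avg_sq_norm = sum(p * sum(x * x for x in d)
                          for p, d in zip(weights, deltas))
        # ||E_k g_k||^2: squared norm of the sample-weighted mean update.
        mean_update = [sum(p * d[i] for p, d in zip(weights, deltas))
                       for i in range(len(w_global))]
        mean_sq_norm = sum(x * x for x in mean_update)
        # Undefined when the averaged update vanishes; report NaN then.
        result[name] = (math.sqrt(avg_sq_norm / mean_sq_norm)
                        if mean_sq_norm > 0 else float("nan"))
    return result
```

By Jensen's inequality B >= 1, with B = 1 exactly when all clients send the same update. For example, with equal sample sizes and updates of 1.0 and 3.0 on a single weight, E_k ||g_k||^2 = 5 and ||E_k g_k||^2 = 4, so B = sqrt(1.25) ≈ 1.118.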

@yxdyc mentioned this pull request on Apr 21, 2022
yxdyc added 5 commits April 21, 2022 11:26
…rove

# Conflicts:
#	federatedscope/config.py
#	federatedscope/core/auxiliaries/utils.py
#	federatedscope/core/fed_runner.py
#	federatedscope/core/worker/client.py
#	federatedscope/core/worker/server.py
#	federatedscope/gfl/fedsageplus/worker.py
#	federatedscope/gfl/gcflplus/worker.py
#	federatedscope/vertical_fl/worker/vertical_server.py
@@ -1,7 +1,7 @@
import os
import sys

DEV_MODE = False # simplify the federatedscope re-setup everytime we change the source codes of federatedscope
DEV_MODE = True # simplify the federatedscope re-setup everytime we change the source codes of federatedscope
Collaborator

We should keep it False.

self.assertIsNotNone(data)

Fed_runner = FedRunner(data=data,
                       server_class=get_server_cls(global_cfg),
                       server_class=get_server_cls(init_cfg),
                       client_class=get_client_cls(global_cfg),
Collaborator

Should be init_cfg.
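The reason for this comment can be illustrated with a small, self-contained sketch: when a test customizes its own copy of the default config, every class lookup must read that copy, otherwise the server and client may be chosen from different settings. Everything here is hypothetical: a dict stands in for the config object, `copy.deepcopy` for cloning it, and the lookup rule is made up; it only mirrors the shape of the test above.

```python
import copy

# Hypothetical default config shared by all tests.
GLOBAL_CFG = {"federate": {"mode": "standalone"}, "model": {"type": "lr"}}


def get_server_cls(cfg: dict) -> str:
    """Made-up lookup: pick a server class name from the model type."""
    return "MFServer" if cfg["model"]["type"] == "mf" else "Server"


def get_client_cls(cfg: dict) -> str:
    """Made-up lookup: pick a client class name from the model type."""
    return "MFClient" if cfg["model"]["type"] == "mf" else "Client"


def build_test_cfg() -> dict:
    # Work on a private copy so the test's changes never leak into GLOBAL_CFG.
    init_cfg = copy.deepcopy(GLOBAL_CFG)
    init_cfg["model"]["type"] = "mf"
    return init_cfg
```

With `init_cfg = build_test_cfg()`, the two lookups return the matching pair `MFServer`/`MFClient`, whereas mixing in `get_client_cls(GLOBAL_CFG)` would pair `MFServer` with the plain `Client`.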

@@ -30,6 +30,10 @@ def __init__(self,
if self.mode == 'standalone':
    self.shared_comm_queue = deque()
    self._setup_for_standalone()
    # in standalone mode, by default, we print the trainer info only once for better logs readability
    trainer_representative = self.client[1].trainer
Collaborator

Why client[1]? Is there a concept similar to "chief_worker" in the ps-worker paradigm?

Collaborator Author

Here "[1]" simply refers to the first client (client IDs start from 1), since there is at least one client in all cases.

Arguably, we could change how trainer_representative is accessed once we introduce a concept similar to "chief_worker" in a future PR, which may be required for asynchronous simulation.
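The behaviour discussed in this thread, logging the trainer meta-info once through a representative client instead of once per client, can be sketched as follows. `Trainer`, `meta_info`, and `log_trainer_meta_info` are hypothetical stand-ins; only the convention that client IDs start at 1 (so `trainers[1]` is the first client) is taken from the discussion.

```python
import logging
from dataclasses import dataclass
from typing import Dict, List

logger = logging.getLogger("standalone_runner")


@dataclass
class Trainer:
    """Hypothetical stand-in for a client trainer; only its meta-info matters."""
    model: str
    optimizer: str
    local_update_steps: int

    def meta_info(self) -> str:
        return (f"model={self.model}, optimizer={self.optimizer}, "
                f"local_update_steps={self.local_update_steps}")


def log_trainer_meta_info(trainers: Dict[int, Trainer],
                          print_all: bool = False) -> List[str]:
    """Log the trainer meta-info once (default) or once per client.

    Client IDs start at 1, so trainers[1] is the first client and serves as
    the representative; in standalone mode the clients typically share the
    same trainer configuration, so per-client lines would mostly repeat.
    """
    if print_all:
        lines = [f"Client #{cid}: {t.meta_info()}"
                 for cid, t in sorted(trainers.items())]
    else:
        lines = [f"Trainer (representative: client #1): "
                 f"{trainers[1].meta_info()}"]
    for line in lines:
        logger.info(line)
    return lines
```

Keeping the representative behind a single lookup means that, if a "chief_worker"-like concept is introduced later, only that one access needs to change.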

@joneswong (Collaborator) left a comment

Approved. Please have a look at my inline comments.

@joneswong merged commit b64b048 into master on Apr 24, 2022
@joneswong deleted the Feature/logging_improve branch on April 24, 2022 03:11
cheneydon pushed a commit to cheneydon/FederatedScope that referenced this pull request Jun 14, 2022
Co-authored-by: yuexiang.xyx <[email protected]>

Update `tests/run.py` for Jenkins server (alibaba#4)

just a workaround

Feature/synchronize (alibaba#3)

sync with the master branch of our original gitlab

Feature/config refactor (alibaba#5)

refactored configuration-related code

modify README; minor fix (alibaba#6)

Updated README

fix gan cra loss_batch-> loss_task bug

improved the environments set-up guidance

improved the environments set-up guidance

improved the environments set-up guidance

Fix setup requirements.

Update required python version to 3.9.

updated auto-doc component according to the latest changes

[Feature] Add dropout and log training metric. (alibaba#11)

* Add dropout option for CNN and NLP model; Add training metric to logs.

* allow users to determine whether to conduct evaluation on a specific split

* Enable metric in global eval for users to determine whether to conduct evaluation on a specific split.

* fix minor bug when importing nlp loss

* Replace and remove `validate` with `evaluate(target_data_split_name=split)` to keep code clean.

enabled the log file name valid in windows environment (alibaba#13)

* enabled the log file name valid in windows environment

update readme (alibaba#15)

* update README

added a demo for black-box optimization (alibaba#14)

- added a demo for black-box optimization
- enabled installation with cuda10

[Bugfix] fixed the invalid logger set-up if the logging is used before we call setup_logger (alibaba#17)

* fixed the invalid logger set-up if the `logging` is used before we call `setup_logger`

Change source of `download_url` from our own and fix `README` (alibaba#20)

* Change source of `download_url` from our own `utils.py` and fix `README.md`.

add logo (alibaba#26)

- add logo
- add more icons

modify grpc_comm according to official tutorial (alibaba#25)

fix path issue

fix wrong logger usage

reformatted

Communication efficiency optimization (alibaba#19)

* minor fixed for distributed mode

* For the communication efficiency: dynamic type selection in gRPC servicer; transformer & parser

Refactored the logger by reducing its redundancy and fixed some minor issues (alibaba#29)

* Reducing the redundancy of the logger

Update test_mf.py

modify the unit test of mf task

Refactor splitter&transform; Modify some data related config; Add external dataset. (alibaba#33)

[Feature] FedEx (alibaba#37)

[Feature] FedEx (alibaba#37)

[Hotfix] print the missing ``Final`` results (alibaba#41)

* hotfix for the missing ``Final`` results print

Add pre-trained transformers as NLP model.

TODO: @ZHEN, please fix the online aggregator when the device is not specified.

Add an example for transformers.

Fix url. (alibaba#46)

- added the local training baseline
- enabled each client has its own early-stopper

formatted by linter

formatted by linter

not use early_stopper in non-local mode

bugfix for the case "sample_client_num = -1"

added global training mode via a proxy client that holds all data

Fix inconsistent device for the PIA test

added local fine-tuning before local evaluation

linter format

bugfix for fedex

update README (alibaba#49)

Feature/attack doc (alibaba#50)

* improved the doc for attack module

added API comments (alibaba#52)

Fix docs about graph. (alibaba#51)

Add api ref for mf task and context (alibaba#53)

* add mf api reference and modify README.md

typos fix

Fix minor bugs

Timeout strategy and minimal received number (alibaba#36)

* For async: timeout strategy and minimal received number

modify api reference (alibaba#56)

update doc of core (alibaba#57)

Add datasets from hugging face.

Formatted and fix minor bugs.

Add datasets and scripts for openml.

Modify the example `yaml` of openml datasets.

Add materials (paper lists, tutorials) (alibaba#60)

* add FL paper list

Add paper lists (alibaba#61)

* add FL paper list

fixed some missing API reference in fs.core (alibaba#54)

As the title says.

update release version (alibaba#64)

Update graph paper list. (alibaba#65)

Add paper list for FedHPO (alibaba#67)

* added paper list for fedhpo

rename and modify some val

Add paper list for FedRec (alibaba#68)

add paper list for FedRec

added pfl paper list (alibaba#72)

added pfl paper list

hotfix for transformers to avoid import error

updated pfl paper list (alibaba#73)

updated pfl paper list

fix url in dblp_new.py (alibaba#76)

update README

update

debug squad model

update

update

update
AnthonyXuan pushed a commit to AnthonyXuan/FederatedScope that referenced this pull request Aug 10, 2023
… issues (alibaba#29)

* Reducing the redundancy of the logger

4 participants