Add more models and datasets from external packages. #42

Merged
merged 12 commits into from
May 7, 2022

rayrayraykk
Collaborator

@rayrayraykk rayrayraykk commented Apr 27, 2022

  1. Add pre-trained transformers to the NLP models.
  2. Add datasets from Hugging Face.
  3. Add datasets from OpenML.

@rayrayraykk rayrayraykk requested review from joneswong and yxdyc April 27, 2022 12:04
@rayrayraykk rayrayraykk added the Feature New feature label Apr 27, 2022
@rayrayraykk rayrayraykk changed the title Add pre-trained transformers to NLP model. Add more models and datasets from external packages. May 6, 2022
joneswong
joneswong previously approved these changes May 6, 2022
Collaborator

@joneswong joneswong left a comment

LGTM. @yxdyc, could you please help review the usage of the Transformer and the datasets (e.g., SST-2)? Thanks!

Collaborator

@yxdyc yxdyc left a comment

Good job, please see the inline comments.

@@ -42,11 +42,13 @@ RUN conda install -y pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoo
# for graph
RUN conda install -y pyg==2.0.1 -c pyg \
&& conda install -y rdkit=2021.09.4 -c conda-forge \
&& conda install -y nltk \
Collaborator

Should nltk be moved further down, to the NLP part below?

Collaborator Author

NLTK is used for generating features of some graph datasets.

local_update_steps: 1
total_round_num: 400
batch_or_epoch: 'epoch'
client_num: 1
Collaborator

Why does this baseline use only 1 client?

DATA_LOAD_FUNCS = {
'torchvision': load_torchvision_data,
'torchtext': load_torchtext_data,
'torchaudio': load_torchaudio_data,
- 'torch_geometric': load_torch_geometric_data
+ 'torch_geometric': load_torch_geometric_data,
+ 'datasets': load_datasets_data,
Collaborator

The name "datasets" is too general. How about huggingface_datasets?
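
To make the naming suggestion concrete, here is a minimal sketch of a loader registry keyed by package name, in the spirit of `DATA_LOAD_FUNCS` above. The loader functions and the `load_external_data` helper are hypothetical stand-ins (the real loaders are not shown in this thread), and `huggingface_datasets` is the key proposed in the comment, not necessarily the one that was merged.

```python
# Hypothetical stand-ins for the loader functions referenced in the diff.
def load_torchvision_data(name):
    return f"torchvision:{name}"


def load_huggingface_datasets_data(name):
    return f"huggingface_datasets:{name}"


# A specific key makes it obvious which external package is meant.
DATA_LOAD_FUNCS = {
    'torchvision': load_torchvision_data,
    'huggingface_datasets': load_huggingface_datasets_data,
}


def load_external_data(package, name):
    """Dispatch to the loader registered for `package` (illustrative helper)."""
    try:
        loader = DATA_LOAD_FUNCS[package]
    except KeyError:
        raise ValueError(f"Unknown data package: {package!r}") from None
    return loader(name)
```

With this layout, a config that still says `datasets` fails loudly with a `ValueError` instead of silently colliding with some other package of a similar name.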

federate:
mode: standalone
local_update_steps: 1
total_round_num: 400
Collaborator

In centralized mode, fine-tuning often takes only a few epochs. Maybe we can set total_round_num to be on the order of dozens, e.g., 40?
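
As a rough illustration of this point, the sketch below counts how many local epochs a configuration implies. It assumes that with `batch_or_epoch: 'epoch'` each round runs `local_update_steps` epochs on the client; that reading of the config, and the helper name `epochs_per_client`, are assumptions made for illustration and are not confirmed in this thread.

```python
def epochs_per_client(total_round_num, local_update_steps, batch_or_epoch):
    """Epochs one client trains over the whole course, under the stated assumption."""
    if batch_or_epoch != 'epoch':
        raise ValueError("this sketch only handles batch_or_epoch == 'epoch'")
    return total_round_num * local_update_steps


# Values from the config under review versus the suggested round count.
current = epochs_per_client(400, 1, 'epoch')
proposed = epochs_per_client(40, 1, 'epoch')
print(current, proposed)  # prints "400 40"
```

Under that assumption, the reviewed setting amounts to 400 epochs of fine-tuning on a single client, while the suggested value brings it down to 40.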

@yxdyc yxdyc merged commit 9828965 into alibaba:master May 7, 2022
@rayrayraykk rayrayraykk deleted the transformers branch May 13, 2022 07:04
AnthonyXuan pushed a commit to AnthonyXuan/FederatedScope that referenced this pull request Aug 10, 2023
Add more models and datasets from external packages.