
add federated contrastive learning baseline SimCLR and its linear probe evaluation #278

Merged Aug 21, 2022

Conversation

@xkxxfyf (Contributor) commented Aug 1, 2022

No description provided.

@CLAassistant commented Aug 1, 2022

CLA assistant check
All committers have signed the CLA.

@rayrayraykk rayrayraykk self-requested a review August 2, 2022 02:55
@rayrayraykk rayrayraykk added the Feature New feature label Aug 2, 2022


class SimCLRTransform():
    def __init__(self, is_sup, image_size=32):
Collaborator: Please provide a Python-style docstring for the newly added classes and functions.
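As an illustration of the requested style, a minimal docstring sketch; the argument descriptions below are assumptions inferred from the names, not taken from the PR:

    class SimCLRTransform():
        """Builds the augmentation pipeline used by SimCLR.

        Args:
            is_sup (bool): presumably, whether to return a single supervised
                view instead of the two augmented views used for contrastive
                training.
            image_size (int): side length of the augmented crops.
        """
        def __init__(self, is_sup, image_size=32):
            ...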

transform_train = SimCLRTransform(is_sup=False, image_size=32)
transform_test = T.Compose([
    T.ToTensor(),
    T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
Collaborator: Is it conventional to use 0.5 rather than the sample mean?

Contributor (author): Setting both parameters to 0.5 and using them with T.ToTensor() forces the data to be scaled to the [-1, 1] interval.

Collaborator: Here the first argument is the mean of the signals; to my knowledge, it is usually calculated from the available examples.

Contributor (author): T.ToTensor() maps image values from [0, 255] to [0, 1], and T.Normalize(0.5, 0.5) maps [0, 1] to [-1, 1] via (x - mean) / std.
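A minimal standalone check of this claim (the single-channel image is used purely for illustration):

    import numpy as np
    from PIL import Image
    from torchvision import transforms as T

    # ToTensor() maps uint8 pixels [0, 255] -> floats [0, 1].
    img = Image.fromarray(np.array([[0, 128, 255]], dtype=np.uint8))
    x = T.ToTensor()(img)
    # Normalize(mean=0.5, std=0.5) maps [0, 1] -> [-1, 1] via (x - mean) / std.
    y = T.Normalize((0.5,), (0.5,))(x)
    print(x.flatten().tolist())  # [0.0, ~0.502, 1.0]
    print(y.flatten().tolist())  # [-1.0, ~0.004, 1.0]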



class Bottleneck(nn.Module):
    expansion = 4
Collaborator: Is it necessary to use a class attribute?

representations = torch.cat([z1, z2], dim=0)
similarity_matrix = F.cosine_similarity(representations.unsqueeze(1), representations.unsqueeze(0), dim=-1)

l_pos = torch.diag(similarity_matrix, N)
Collaborator: I think, up to now, similarity_matrix is a 2N-by-2N matrix. Am I wrong? Why do we need to take the diagonals above and below the main diagonal?
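For reference, a minimal sketch of the standard SimCLR NT-Xent construction, which suggests why the offset-N diagonals are taken: with z1 stacked on top of z2, row i and column i+N correspond to the two views of the same sample, so those diagonals hold exactly the positive pairs:

    import torch
    import torch.nn.functional as F

    N = 4                                      # per-view batch size
    z1, z2 = torch.randn(N, 8), torch.randn(N, 8)

    # Rows 0..N-1 are z1, rows N..2N-1 are z2 -> similarity matrix is 2N x 2N.
    representations = torch.cat([z1, z2], dim=0)
    similarity_matrix = F.cosine_similarity(
        representations.unsqueeze(1), representations.unsqueeze(0), dim=-1)
    assert similarity_matrix.shape == (2 * N, 2 * N)

    # The diagonal at offset +N pairs row i with column i+N, i.e. sim(z1[i], z2[i]);
    # offset -N gives the symmetric counterpart sim(z2[i], z1[i]).
    l_pos = torch.diag(similarity_matrix, N)
    assert torch.allclose(l_pos, F.cosine_similarity(z1, z2, dim=-1), atol=1e-6)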

# print(len(x), x[0].size(), x[1].size(), label.size())
x1, x2 = x[0], x[1]
z1, z2 = ctx.model(x1, x2)
if len(label.size()) == 0:
Collaborator: When will we enter such a branch?

Contributor (author): We enter this branch in contrastive learning with two augmented views of the data.

Collaborator: I mean, when does the length of the size of label become zero?

Contributor (author): I followed the torch_trainer and added this branch. Should I remove it?
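For context, len(label.size()) == 0 holds exactly when the label is a 0-dim (scalar) tensor, which is presumably the case the torch_trainer branch guards against:

    import torch

    # A scalar label tensor (e.g. a squeezed batch of size 1) has an empty size:
    label = torch.tensor(3)
    print(label.size(), len(label.size()))  # torch.Size([]) 0

    # unsqueeze(0) restores the batch dimension expected downstream:
    print(label.unsqueeze(0).size())        # torch.Size([1])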

@@ -10,7 +10,7 @@ def get_optimizer(model, type, lr, **kwargs):
     if isinstance(type, str):
         if hasattr(torch.optim, type):
             if isinstance(model, torch.nn.Module):
-                return getattr(torch.optim, type)(model.parameters(), lr,
+                return getattr(torch.optim, type)(filter(lambda p: p.requires_grad, model.parameters()), lr,
Collaborator: Is any PFL algorithm affected by such a change? @yxdyc
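For context, a sketch of what the change does; presumably the motivation is linear-probe evaluation, where the backbone is frozen and only trainable parameters should be handed to the optimizer (the model layout here is illustrative):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 2))
    for p in model[0].parameters():  # freeze the "backbone"
        p.requires_grad = False

    # The patched get_optimizer passes only the trainable parameters:
    trainable = filter(lambda p: p.requires_grad, model.parameters())
    optimizer = torch.optim.SGD(trainable, lr=0.1)
    print(len(optimizer.param_groups[0]['params']))  # 2 (the head's weight and bias)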

@joneswong (Collaborator) left a comment:
For any code snippet copied from or somewhat inspired by another place, please provide a copyright notice in your files.

@joneswong joneswong self-assigned this Aug 2, 2022
@joneswong (Collaborator) commented:
This PR includes a new trainer, which is designed for conducting contrastive learning. @DavdGao, could you have a look at that part for us? Thanks!

@joneswong (Collaborator) left a comment:
It seems that no readily available splitter has been adopted in your experiments, right? So how do you construct the non-iidness? We have planned to start with the LDA splitter; please conduct the experiments accordingly.
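For reference, a minimal sketch of LDA (Dirichlet) label splitting as it is commonly implemented; FederatedScope ships its own splitters, so the function name and alpha below are illustrative, not the project's API:

    import numpy as np

    def lda_split(labels, n_clients, alpha=0.5, seed=0):
        # Smaller alpha -> more skewed per-client label proportions (more non-iid).
        rng = np.random.default_rng(seed)
        idx_per_client = [[] for _ in range(n_clients)]
        for k in np.unique(labels):
            idx_k = rng.permutation(np.where(labels == k)[0])
            props = rng.dirichlet([alpha] * n_clients)
            cuts = (np.cumsum(props)[:-1] * len(idx_k)).astype(int)
            for cid, part in enumerate(np.split(idx_k, cuts)):
                idx_per_client[cid].extend(part.tolist())
        return idx_per_client

    labels = np.random.randint(0, 10, size=1000)
    print([len(p) for p in lda_split(labels, n_clients=5)])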

@rayrayraykk (Collaborator) left a comment:
Please see the inline comments and follow the contributing guidelines to format your code.

    config = config
    return data_dict, config

def Cifar4LP(config):
Collaborator: Duplicated code from line 118 to line 156.



# Model class
class ResNet(nn.Module):
Collaborator: In PR #267, a ResNet model was already added in federatedscope/contrib/model/resnet.py. Please check whether we still need to add a new ResNet.

@joneswong (Collaborator) left a comment:
Shell scripts for reproducing the results of standalone and FedAvg should be provided.

@DavdGao (Collaborator) left a comment:
Please see the inline comments and keep the code consistent with the master branch.

@DavdGao (Collaborator) left a comment:
Please add a new unit test for the new trainer and dataset.
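As a sketch of what such a test could look like, a minimal self-contained example covering the NT-Xent positive-pair logic discussed above; the real test should go through FederatedScope's test harness and exercise the new trainer and dataset directly:

    import unittest
    import torch
    import torch.nn.functional as F

    class TestNTXentPositives(unittest.TestCase):
        def test_offset_diagonal_holds_positive_pairs(self):
            N = 4
            z1, z2 = torch.randn(N, 8), torch.randn(N, 8)
            reps = torch.cat([z1, z2], dim=0)
            sim = F.cosine_similarity(reps.unsqueeze(1), reps.unsqueeze(0), dim=-1)
            l_pos = torch.diag(sim, N)
            expected = F.cosine_similarity(z1, z2, dim=-1)
            self.assertTrue(torch.allclose(l_pos, expected, atol=1e-6))

    if __name__ == '__main__':
        unittest.main()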

@@ -104,12 +104,37 @@ def _para_weighted_avg(self, models, recover_fun=None):
         return avg_model


-class NoCommunicationAggregator(Aggregator):
+class NoCommunicationAggregator(ClientsAvgAggregator):
Collaborator: @yxdyc Please have a look at this change to local mode, thanks.

# Split data into dict
data_dict = dict()

# Splitter
Collaborator: Please refer to the splitter and make it consistent.

@@ -0,0 +1,235 @@
import torch
Collaborator: This file looks like a copy from https://github.com/akhilmathurs/orchestra/blob/main/models.py. Please consider the copyright issues.



@joneswong (Collaborator) commented:
1. Please answer our questions about the code. 2. Do not push changes that obviously cannot pass the unit tests.

T.RandomResizedCrop(32, scale=(0.5, 1.0), interpolation=T.InterpolationMode.BICUBIC),
T.RandomHorizontalFlip(p=0.5),
T.ToTensor(),
T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
Collaborator: I am still very curious about why we have to use such a mean and std. If it is conventional usage in CL, please explain it to us. The ultimate image classification task does not use this transformation, right?

Contributor (author): T.ToTensor() maps image values from [0, 255] to [0, 1], and T.Normalize(0.5, 0.5) maps [0, 1] to [-1, 1] via (x - mean) / std. The data augmentation is already time-consuming, and computing the sample mean and std would take even more time.

splitter = get_splitter(config)
data_train = splitter(data_train)
data_val = data_train
data_test = splitter(data_test)
Collaborator: Although the original train and test data of CIFAR10 are iid, how do we ensure that splitting them separately with our splitter keeps the train and test data of a specific client identically distributed?
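A small numeric illustration of this concern, assuming a Dirichlet-style splitter: two independent draws give a client different label distributions at train and test time (alpha and sizes below are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    n_classes, n_clients = 10, 5
    # Independent per-class client proportions for the train and test splits:
    train_props = rng.dirichlet([0.5] * n_clients, size=n_classes)
    test_props = rng.dirichlet([0.5] * n_clients, size=n_classes)

    # Client 0's label distribution under each split:
    p_train = train_props[:, 0] / train_props[:, 0].sum()
    p_test = test_props[:, 0] / test_props[:, 0].sum()
    print(np.abs(p_train - p_test).sum())  # L1 gap between the two, typically large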

@xkxxfyf xkxxfyf merged commit 34dba76 into alibaba:master Aug 21, 2022
Labels
Feature New feature
5 participants