Question on Dataset Usage Across Clients in Federated Learning Implementation #4
Comments
Hi, thank you for your attention to our work. Following the standard FL setting, we set the class distributions of the local clients to be non-IID, and each local client performs local training on its own dataset; more implementation details are given in the paper. To implement this in our code, we first initialize the whole dataset for each local client as follows: Lines 96 to 99 in 867988c

Then, we redistribute the data for each local client based on the client's category (e.g., possesses only old-task data, possesses both old- and new-task data, or possesses only new-task data) as follows: Lines 83 to 91 in 867988c

Hope this answer helps.
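The category-based redistribution described above can be sketched roughly as follows. All names here (`redistribute`, the `"old"`/`"both"`/`"new"` category labels) are hypothetical illustrations, not the repository's actual identifiers; the real logic lives at the referenced lines.

```python
def redistribute(labels, old_classes, new_classes, category):
    """Return the sample indices a client keeps, based on its category.

    `category` is one of "old", "both", "new" (hypothetical names for the
    three client types: old-task data only, both tasks, new-task data only).
    """
    if category == "old":
        allowed = set(old_classes)
    elif category == "new":
        allowed = set(new_classes)
    else:  # "both": the client holds a mix of old- and new-task classes
        allowed = set(old_classes) | set(new_classes)
    # Keep only the samples whose label belongs to the allowed classes.
    return [i for i, y in enumerate(labels) if y in allowed]

# Toy example: classes 0-1 belong to the old task, 2-3 to the new task.
labels = [0, 1, 2, 3, 0, 2]
print(redistribute(labels, old_classes=[0, 1], new_classes=[2, 3],
                   category="old"))  # → [0, 1, 4]
```

Each client thus ends up with a different index subset of the same underlying pool, which is what makes the local datasets non-IID.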
Thank you very much for your detailed explanation of the dataset usage in federated learning. Your response has greatly improved my understanding of the project's implementation and addressed my questions effectively. I appreciate the time and effort you took to guide me through this aspect of your work. Best regards
Hello, thank you for your guidance and insights into this project. I've been closely examining the implementation details and have a question about the assignment of classes to clients, specifically the segment of code that randomly selects classes for each client:

Given this approach, different clients might be assigned the same class. My concern is what this implies for federated learning, where distinct data across clients is paramount. Could clients, by chance, end up with identical datasets, contrary to the decentralized and privacy-preserving principles of federated learning? Could you please clarify how the implementation ensures diversity in dataset usage among clients, especially given the potential for overlapping class assignments? Thank you for your time and assistance. Best regards, Minkyoon Yoo
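For reference, the overlap this question raises is easy to reproduce with a minimal, hypothetical sketch of independent per-client class sampling (the function name and parameters below are illustrative, not the repository's code):

```python
import random

def assign_classes(num_clients, num_classes, classes_per_client, seed=0):
    """Independently sample a class subset for each client.

    Because the draws are independent across clients, two clients can
    receive overlapping (or even identical) class sets.
    """
    rng = random.Random(seed)
    return [rng.sample(range(num_classes), classes_per_client)
            for _ in range(num_clients)]

assignments = assign_classes(num_clients=5, num_classes=10, classes_per_client=4)
overlap = any(set(a) & set(b)
              for i, a in enumerate(assignments)
              for b in assignments[i + 1:])
print(overlap)  # → True (5 clients × 4 classes > 10, so overlap is forced)
```

Note that overlapping *classes* does not by itself mean overlapping *samples*: clients can share a class label while holding disjoint sets of examples, depending on how the sample indices are partitioned afterwards.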
Hello,
First off, I'd like to express my admiration for the work you've done on this federated learning project. It's truly insightful and has been a great resource for me as I delve into federated learning.
I have a question regarding a specific part of the code, particularly about the dataset usage across different clients. From my understanding, one of the fundamental principles of federated learning is that each client trains the model locally using its own distinct dataset. However, as I was reviewing the code, I noticed that it seems every client might be using the same train_dataset:
This observation leads me to wonder if the implementation deviates from federated learning's goal of having each client use a unique dataset. Could you please provide some insights into whether this implementation detail is intentional? Perhaps there's an aspect of the code or the federated learning approach I'm misunderstanding, and I'd be eager to learn more about your design choices in this regard.
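One common FL pattern that would reconcile the two observations above (a single shared dataset object, yet distinct data per client) is index-based subsetting: the samples are stored once, and each client only ever reads its own indices. A hypothetical sketch of that pattern (not this repository's code):

```python
class ClientView:
    """A per-client view over one shared dataset.

    The underlying samples are stored only once, but each client
    iterates exclusively over its own index list, so the effective
    local datasets are distinct even though `dataset` is shared.
    """
    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = indices

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, k):
        return self.dataset[self.indices[k]]

# Shared pool of (sample, label) pairs with labels 0, 1, 2.
shared = [("img%d" % i, i % 3) for i in range(9)]
client_a = ClientView(shared, [0, 3, 6])  # only label-0 samples
client_b = ClientView(shared, [1, 4, 7])  # only label-1 samples
print(len(client_a), client_a[0])  # → 3 ('img0', 0)
```

In PyTorch codebases the same idea is usually expressed with `torch.utils.data.Subset`, which wraps a dataset together with a list of indices in exactly this way.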
Thank you so much for taking the time to address my question amidst your busy schedule. I look forward to your response and learning more about this fascinating project.
Best regards,
Minkyoon Yoo