Question about the testing dataset #5
Hello, thank you for reading our code carefully. Designating a malicious dataset to serve as both the training and the test set is not unfounded. In Section 3.2 of the SparseFed paper ("SparseFed: Mitigating Model Poisoning Attacks in Federated Learning with Sparsification"), under "Semantic backdoor via model poisoning", the authors mention that they use 3000 images of the number 7 from the validation set as backdoors; this is also described in Appendix B.5, "Stealth of Attack". In their code (https://github.com/kiddyboots216/CommEfficient/blob/ca4d44098b4251d598fbd99edfe5c6f5e60fa6ad/CommEfficient/data_utils/fed_emnist.py) the malicious training and test sets are specified in the same way. The main reason is that backdoor tasks usually focus on the neural network's memorization of some poisoned data. The code of the earliest work on federated learning backdoor attacks (https://github.com/ebagdasa/backdoor_federated_learning/blob/master/utils/params_runner.yaml) takes a similar approach, specifying that the malicious training samples are identical to the malicious test samples. However, we still encourage researchers to study both settings, with the designated malicious test and training sets either the same or different, as they reflect the model's pure memorization of the poisoned data and its generalization beyond it, respectively.
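For readers unfamiliar with this convention, here is a minimal sketch of what such a designation can look like, assuming an EMNIST-style dataset already loaded as image/label arrays; the function and variable names are illustrative and are not the repository's actual API.

```python
# Minimal sketch (illustrative, not the repository's actual code): designate a
# fixed set of samples as the attacker's poisoned data and reuse the same set
# for both backdoor training and backdoor evaluation, as in the SparseFed setup.
import numpy as np

def build_poison_sets(images, labels, source_class=7, target_label=1, num_poison=3000):
    """Select `num_poison` images of `source_class`, relabel them to
    `target_label`, and return the same set as both train and test."""
    idx = np.where(labels == source_class)[0][:num_poison]
    poison_images = images[idx]
    poison_labels = np.full(len(idx), target_label, dtype=labels.dtype)

    poison_train = (poison_images, poison_labels)
    poison_test = (poison_images, poison_labels)  # same designated samples
    return poison_train, poison_test
```

The design choice here is that the attacker fixes a specific set of samples and asks whether the global model memorizes their attacker-chosen labels, which is exactly what reusing the same set measures.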
Thanks for your response.
[1] Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses
To be honest, if the results on FEMNIST in the paper come from this line of code, I think this is cheating.
For the EMNIST task you mentioned: in the base case, the attacker's goal is to get the final model to misclassify certain datapoints, and their training dataset is the same as their test dataset. The trigger in this case is not a pixel pattern. For the base case mentioned in the paper, we process the image dataset in the same way as SparseFed.
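As a rough illustration of what "misclassify certain datapoints" means here, the sketch below measures base-case backdoor accuracy on the designated samples, assuming a PyTorch model and a DataLoader over the attacker's set; no pixel trigger is applied, and the helper name is hypothetical.

```python
# Sketch (assumed PyTorch interface): base-case backdoor accuracy. The inputs are
# unmodified images (no pixel trigger); success means the model outputs the
# attacker's target label on the designated datapoints.
import torch

@torch.no_grad()
def backdoor_accuracy(model, poison_loader, device="cpu"):
    model.eval()
    correct, total = 0, 0
    for images, target_labels in poison_loader:  # labels already flipped to the target class
        images = images.to(device)
        preds = model(images).argmax(dim=1).cpu()
        correct += (preds == target_labels).sum().item()
        total += target_labels.size(0)
    return correct / max(total, 1)
```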
For the EMNIST task, as shown in [2], the authors separate the Ardis dataset into training and test sets, so their training dataset is not the same as their test dataset. I think you should clarify this point in the paper.
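A sketch of this edge-case setup, under the assumption that the Ardis digit-7 images are available as arrays; the split fraction and names are illustrative and not the code's actual interface.

```python
# Sketch (illustrative): edge-case split in which the attacker's Ardis "7" images
# are divided into disjoint training and test portions, so backdoor accuracy
# measures generalization rather than pure memorization.
import numpy as np

def split_ardis_sevens(ardis_images, ardis_labels, target_label=1, train_frac=0.8, seed=0):
    idx = np.where(ardis_labels == 7)[0]
    rng = np.random.default_rng(seed)
    rng.shuffle(idx)
    cut = int(train_frac * len(idx))
    train_idx, test_idx = idx[:cut], idx[cut:]

    poison_train = (ardis_images[train_idx],
                    np.full(len(train_idx), target_label, dtype=ardis_labels.dtype))
    poison_test = (ardis_images[test_idx],
                   np.full(len(test_idx), target_label, dtype=ardis_labels.dtype))  # disjoint from train
    return poison_train, poison_test
```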
Yeah, thanks. For the EMNIST task in the edge case, the training data is not the same as the test data; we will highlight this when we update our paper. We have also updated the code to eliminate bugs caused by merging different versions of the code.
For the EMNIST dataset we also ran the following additional experiment: the poisoned data are pictures of the number 7 from the Ardis dataset (the edge-case dataset), the target label set by the attacker is 1, AttackNum = 200 (the attacker participates in 200 rounds of federated learning), and the server uses gradient norm clipping along with a differential privacy defense; the results are shown in the figure. In the figure, "train=test" means that the attacker's test set is the same as its training set, and "train≠test" means that the test set and the training set are different. From the results we can see that:
1. Whether the test set is the same as or different from the training set, we reach the same conclusion as in our paper: Neurotoxin is better than the baseline.
2. There is no significant difference in backdoor accuracy between the two settings.
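For concreteness, here is a sketch of the server-side defense described above (per-update norm clipping followed by Gaussian noise before aggregation); the clipping bound and noise scale are illustrative placeholders, not the values used in the experiment.

```python
# Sketch (illustrative parameters): gradient norm clipping plus additive Gaussian
# noise (differential privacy style) applied by the server before averaging.
import torch

def clip_and_aggregate(client_updates, clip_norm=1.0, noise_std=0.001):
    """client_updates: list of flattened update tensors, one per client."""
    clipped = []
    for upd in client_updates:
        norm = upd.norm().item()
        scale = min(1.0, clip_norm / (norm + 1e-12))
        clipped.append(upd * scale)            # norm clipping per client update
    agg = torch.stack(clipped).mean(dim=0)     # simple FedAvg-style aggregation
    agg += noise_std * torch.randn_like(agg)   # additive Gaussian noise (DP defense)
    return agg
```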
Federated-Learning-Backdoor/FL_Backdoor_CV/image_helper.py, line 271 (commit a7ef36a)
I have a question about the testing dataset used to measure backdoor attack accuracy on FEMNIST. From this line of code, the author seems to use the training dataset directly as the test dataset.