Concerns regarding data sampling #3

Open · Miopha opened this issue Nov 29, 2024 · 6 comments

Comments

@Miopha commented Nov 29, 2024

Hi, I really appreciate your work, and I have the following concerns.
I have observed that when I turned off federated learning (i.e., disabled downloading the item embeddings from the server), specifically by modifying engine.py as follows:

# user_param_dict['embedding_item.weight'] = copy.deepcopy(self.server_model_param['embedding_item.weight'].data).cuda()
user_param_dict['embedding_item.weight'] = user_param_dict['embedding_item.weight'].cuda()

I then got remarkably good performance (HR@10 = 1.0000, NDCG@10 = 0.9775), which is abnormally better than the proposed method.
So I dug deeper and found that the negative items used in the training process are never sampled from a user's test (or validation) items, because the sampling is performed over all of the user's ratings:

self.negatives = self._sample_negative(self.ratings)

def _sample_negative(self, ratings):
    interact_status = ratings.groupby('userId')['itemId'].apply(set).reset_index().rename(columns={'itemId': 'interacted_items'})
    interact_status['negative_items'] = interact_status['interacted_items'].apply(lambda x: self.item_pool - x)
    interact_status['negative_samples'] = interact_status['negative_items'].apply(lambda x: random.sample(x, 198))
    return interact_status[['userId', 'negative_items', 'negative_samples']]

Here self.ratings contains all of the client's ratings (train, validation, and test). Since a client should not have access to its test (or validation) data during the training process, doesn't this potentially lead to data leakage? The correct code, I believe, should be

self.negatives = self._sample_negative(self.train_ratings)

I'm not sure whether I have missed something, so I would appreciate any advice.
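For concreteness, here is a minimal sketch of the sampling I have in mind. It reuses the names from the snippet above (userId, itemId, self.item_pool, the 198-sample count); train_ratings and num_negatives are my own assumptions rather than the repository's actual code:

import random

def _sample_negative(self, train_ratings, num_negatives=198):
    # train_ratings is assumed to be a pandas DataFrame holding only the
    # training interactions, so validation/test items remain in the negative
    # pool and are never implicitly revealed to the client during training.
    interact_status = (
        train_ratings.groupby('userId')['itemId']
        .apply(set)
        .reset_index()
        .rename(columns={'itemId': 'interacted_items'})
    )
    interact_status['negative_items'] = interact_status['interacted_items'].apply(
        lambda items: self.item_pool - items
    )
    # list(...) because random.sample no longer accepts sets in recent Python versions
    interact_status['negative_samples'] = interact_status['negative_items'].apply(
        lambda items: random.sample(list(items), num_negatives)
    )
    return interact_status[['userId', 'negative_items', 'negative_samples']]

# called with training ratings only:
# self.negatives = self._sample_negative(self.train_ratings)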

@Miopha (Author) commented Nov 29, 2024

[1] and [2] seem to have the same issue, as they share very similar data-sampling code, which may also involve data leakage.
[1] GPFedRec: Graph-Guided Personalization for Federated Recommendation. KDD 2024
[2] Federated recommendation with additive personalization. ICLR 2024

@Zhangcx19 (Owner)

Thank you for your interest in our work! I am glad to clarify this.

In your experimental setting, users train the recommendation model independently using their personal training data. As the number of training iterations increases, the model observes an increasing number of negative data samples. This may lead to overfitting, where observed negative samples are inferred with low scores and unseen test items end up being ranked relatively high, which yields abnormally good performance. This issue is more likely to occur when the total number of items is small (e.g., ml-100k), and less likely to arise when the dataset is larger and more representative of real-world recommendation scenarios (e.g., lastfm-2k).

I would like to clarify that this overfitting phenomenon does not constitute data leakage. It can be alleviated by incorporating larger, more realistic recommendation datasets.

Additionally, in your proposed setup, the test item will be treated as a negative sample during training, which can lead to unstable training because the same item has conflicting labels (positive in the test set, negative during training).
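To make that conflict concrete, here is a small toy example (made-up numbers, not code from the repository): when negatives are drawn only from items outside the training interactions, a held-out test item can itself be sampled as a training negative.

import random

item_pool = set(range(10))   # toy catalogue of 10 items
train_items = {1, 3, 5}      # a user's training positives
test_item = 7                # the user's held-out positive

negative_pool = item_pool - train_items          # sampling ignores the test item
print(test_item in negative_pool)                # True: item 7 may be labeled 0 in training
print(random.sample(sorted(negative_pool), 4))   # can contain 7, which is labeled 1 at test time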

I hope this explanation helps address your concerns!

@1180301026

I observed the same issue and believe there is significant data leakage, stemming from both the experimental setup and the model design in the paper. In this study, an item is represented solely by its ID embedding, which results in test-set items being present in the training set. This leakage leads to inflated model performance, with the model performing even better as more negative samples are leaked; this is evident in the 100% HR achieved on the ML-100K dataset when federated sharing is removed. From the model-design perspective, the paper essentially reduces the recommendation problem to a binary classification task, where using only ID embeddings for items inevitably causes data leakage.

@Miopha (Author) commented Dec 4, 2024

I've evaluated the performance on both the test and validation sets, and based on this I don't believe the model is overfitting. What I argue is that the test set should not be accessible during the training process; if it is, you are effectively recommending it to the client. Consequently, in my opinion, conflicting labels can legitimately occur in recommender systems.

@gsy901 commented Dec 16, 2024

I've observed the same problem. Is there any solution?

@boring-orange

I have encountered the same issue. How does the negative sampling strategy guarantee that the negative samples used in the test phase have not been included in the training phase?🧐
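One possible way to make that guarantee explicit (a rough sketch under assumed names, not the repository's code) is to build the evaluation negatives first and then remove them from the training negative pool, so the two phases never share a sample:

import random

def build_negative_pools(item_pool, train_items, heldout_items, num_eval_negatives=99, seed=0):
    rng = random.Random(seed)
    # Evaluation negatives: exclude everything the user interacted with (standard protocol).
    eval_candidates = sorted(item_pool - train_items - heldout_items)
    eval_negatives = rng.sample(eval_candidates, num_eval_negatives)
    # Training negatives: exclude only training positives (see the leakage discussion above),
    # then drop the evaluation negatives so train-phase and test-phase samples stay disjoint.
    train_negative_pool = (item_pool - train_items) - set(eval_negatives)
    return eval_negatives, train_negative_pool

eval_negs, train_pool = build_negative_pools(set(range(1000)), {1, 2, 3}, {4})
assert not set(eval_negs) & train_pool  # disjoint by construction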
