Concerns regarding data sampling #3
Comments
And [1][2] seem to have the same issue, as they share very similar code for data sampling, which may also involve data leakage.
Thank you for your interest in our work! I am glad to clarify this. In your experimental setting, users train the recommendation model independently using their personal training data. As the number of training iterations increases, the model observes an increasing number of negative data samples. This may lead to overfitting, where observed negative samples are inferred with low scores, and unseen test items end up being ranked relatively high and achieving abnormally good performance. This issue is more likely to occur when the total number of items is small (e.g., ml-100k), but it is less likely to arise when the dataset is larger and more representative of real-world recommendation scenarios (e.g., lastfm-2k). I would like to clarify that this overfitting phenomenon does not constitute data leakage. It can be alleviated by incorporating larger, more realistic recommendation datasets. Additionally, in your proposed setup, the test item will be treated as a negative sample during training, which can lead to unstable training because the same item has conflicting labels (positive in the test set, negative during training). I hope this explanation helps address your concerns!
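For illustration of the conflicting-label point, here is a toy sketch (made-up IDs, not the repository's code): if negatives are drawn only against a user's training interactions, the held-out test item stays eligible to be sampled as a training negative, i.e., it can carry label 0 during training while being the label-1 target at evaluation.

import random

item_pool = set(range(10))
train_positives = {1, 3}   # the user's training interactions
test_positive = 5          # held-out test item (label 1 at evaluation time)

# Negatives sampled against the training interactions only:
candidates = sorted(item_pool - train_positives)
train_negatives = random.sample(candidates, 4)

# The test item remains in the candidate pool, so it may be drawn as a
# training negative (label 0), conflicting with its test-time label of 1.
print(test_positive in candidates)   # True
print(train_negatives)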
I observed the same issue and believe there is a significant data leakage, stemming from both the experimental setup and model design in the paper. In this study, the item is represented solely by its ID embedding, which results in test set items being present in the training set. This leakage leads to inflated model performance, with the model performing even better as more negative samples are leaked. This is evident in the 100% HR achieved on the ML-100K dataset when federated sharing is removed. From the model design perspective, in this paper the recommendation problem is essentially reduced to a binary classification task, where using only ID embeddings for items inevitably causes data leakage.
I've evaluated the performance on both the test and validation sets, and based on this, I don't believe the model was overfitting. What I argue is that the test set should not be accessible during the training process; if it were, you would effectively be recommending it to the client. Consequently, in my opinion, conflicting labels can occur in RS.
I've observed the same problem. Is there any solution?
I have encountered the same issue. How does the negative sampling strategy guarantee that the negative samples used in the test phase have not been included in the training phase?🧐 |
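One way to read this question concretely: such a guarantee would require the evaluation negatives and the training negatives to be drawn from disjoint pools. Below is a hedged toy sketch of one possible split (a hypothetical helper, not the repository's code):

import random

def split_negative_pool(item_pool, interacted_items, n_eval_negatives, seed=0):
    # Partition a user's non-interacted items into disjoint evaluation
    # and training negative pools (illustrative only).
    rng = random.Random(seed)
    non_interacted = sorted(item_pool - interacted_items)
    eval_negatives = set(rng.sample(non_interacted, n_eval_negatives))
    train_negatives = set(non_interacted) - eval_negatives
    return train_negatives, eval_negatives

# Toy usage: 100 items, the user interacted with items 0-4.
train_neg, eval_neg = split_negative_pool(set(range(100)), {0, 1, 2, 3, 4}, 10)
assert train_neg.isdisjoint(eval_neg)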
Hi, I really appreciate your work and have the following issues:
I have observed that when I turned off federated learning (i.e., disabled downloading item embeddings from the server), specifically when I modified
engine.py
as follows:
# user_param_dict['embedding_item.weight'] = copy.deepcopy(self.server_model_param['embedding_item.weight'].data).cuda()
user_param_dict['embedding_item.weight'] = user_param_dict['embedding_item.weight'].cuda()
and then I got remarkably good performance (HR@10 = 1.0000, NDCG@10 = 0.9775), which is abnormally better than the proposed method.
So I dug deeper into the issue and found that the negative items (used in the training process) are never sampled from the test set, code:
self.negatives = self._sample_negative(self.ratings)

def _sample_negative(self, ratings):
    interact_status = ratings.groupby('userId')['itemId'].apply(set).reset_index().rename(columns={'itemId': 'interacted_items'})
    interact_status['negative_items'] = interact_status['interacted_items'].apply(lambda x: self.item_pool - x)
    interact_status['negative_samples'] = interact_status['negative_items'].apply(lambda x: random.sample(x, 198))
    return interact_status[['userId', 'negative_items', 'negative_samples']]
where self.ratings contains all ratings (train, validation, and test) of the clients. Since a client should not have access to its test (or validation) data during training, this could potentially lead to data leakage. The correct code should be:
self.negatives = self._sample_negative(self.train_ratings)
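To make the difference visible, here is a toy sketch (made-up data and column names mirroring the snippet above, not a verified patch of the repository): sampling against all ratings removes the test item from the negative pool, while sampling against the training ratings only leaves it eligible as a negative.

import pandas as pd

item_pool = set(range(20))

# One toy user: items 0-3 are training interactions, item 4 is the held-out test item.
all_ratings = pd.DataFrame({'userId': [0] * 5, 'itemId': [0, 1, 2, 3, 4]})
train_ratings = all_ratings.iloc[:4]

def negative_candidates(ratings):
    # Non-interacted items per user, mirroring the grouping in _sample_negative above.
    interacted = ratings.groupby('userId')['itemId'].apply(set)
    return interacted.apply(lambda x: item_pool - x)

# Against all ratings, the test item (4) is excluded from the negative pool:
print(4 in negative_candidates(all_ratings).loc[0])    # False
# Against the training ratings only, the test item can be drawn as a negative:
print(4 in negative_candidates(train_ratings).loc[0])  # True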
I'm not sure whether I have missed something, so I would appreciate any advice.