This is our PyTorch implementation for the paper:
Joo-yeong Song, Bongwon Suh (2022). Data Augmentation Strategies for Improving Sequential Recommender Systems. (arXiv preprint)
Please cite our paper if you use the code.
We evaluate our methods on three benchmark datasets: MovieLens-1M (ML-1M), Amazon Games, and Gowalla.
For the original ML-1M and Amazon datasets, please refer to here (in the data folder).
The Gowalla dataset can be found here.
We also evaluate our methods using a restricted fraction of the available training data for each dataset: {10%, 20%, 30%, 40%, 50%, 100% (full)}.
Each sub-dataset is generated by random sampling from the original full dataset, so characteristics other than size, such as sparsity and average sequence length, remain almost the same.
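As a rough illustration of how such a sub-dataset could be drawn, here is a minimal sketch. It assumes the data is held as a dictionary mapping user ids to item sequences and that sampling happens at the user level; both are assumptions for illustration, and the function name `sample_subset` is hypothetical rather than part of this repository.

```python
import random


def sample_subset(user_sequences, fraction, seed=0):
    """Keep a random `fraction` of users together with their full sequences.

    Assumption: sampling is done per user, so each kept sequence is unchanged
    and per-user statistics such as sequence length are preserved on average.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)            # fixed seed for reproducible subsets
    users = sorted(user_sequences)       # stable order before sampling
    k = max(1, round(len(users) * fraction))
    chosen = rng.sample(users, k)        # sampling without replacement
    return {user: user_sequences[user] for user in chosen}


# Toy data: 100 users, each with a short item sequence.
toy_data = {user: [user, user + 1, user + 2] for user in range(100)}
subset_20 = sample_subset(toy_data, 0.20)
```

Because whole sequences are kept or dropped, only the number of sequences changes between the sub-datasets in this sketch.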
Several input parameters are required for training our model: dataset, train_type, aug_type, and aug_size.
dataset: ml-1m / Video / gowalla
train_type: dime (10%) / cinq (20%) / tri (30%) / quad (40%) / half (50%) / all (100% of the original dataset)
aug_type: subset (subset split, default) / slide (sliding window) / noise (noise injection) / redund (redundancy injection) / pad (item masking) / subst (synonym replacement)
aug_size: integer (default: 10)
For example, to train our model on the half-size ML-1M dataset with the noise injection strategy applied:
python main.py --dataset='ml-1m' --train_type='half' --aug_type='noise'
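To make the accepted values concrete, the following sketch shows one way these flags could be declared with `argparse`. It is not the actual contents of main.py: the helper `build_parser` and the table `TRAIN_FRACTIONS` are hypothetical, and making `--dataset` and `--train_type` required is an assumption, since no defaults are documented for them.

```python
import argparse

# Fraction of the training data selected by each train_type value.
TRAIN_FRACTIONS = {
    "dime": 0.10, "cinq": 0.20, "tri": 0.30,
    "quad": 0.40, "half": 0.50, "all": 1.00,
}

AUG_TYPES = ["subset", "slide", "noise", "redund", "pad", "subst"]


def build_parser():
    """Declare the four documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Train with data augmentation")
    # Assumption: no documented default, so these two are marked required.
    parser.add_argument("--dataset", required=True,
                        choices=["ml-1m", "Video", "gowalla"])
    parser.add_argument("--train_type", required=True,
                        choices=sorted(TRAIN_FRACTIONS))
    # Defaults below are the ones documented in this README.
    parser.add_argument("--aug_type", default="subset", choices=AUG_TYPES)
    parser.add_argument("--aug_size", type=int, default=10)
    return parser


# Equivalent to the command line shown above (the shell strips the quotes).
args = build_parser().parse_args(
    ["--dataset=ml-1m", "--train_type=half", "--aug_type=noise"]
)
fraction = TRAIN_FRACTIONS[args.train_type]
```

With this declaration, an unknown value such as `--aug_type=foo` is rejected by `argparse` before any training starts.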
In this paper, we propose a set of data augmentation strategies for sequential recommendation. Note that we focus only on data augmentation applied in the preprocessing step, leaving the network architecture intact.
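To show what preprocessing-level augmentation can look like, here is a small sketch of two of the named strategies in their generic textbook form: sliding window and item masking. The exact definitions used in the paper and in this repository may differ. The function names, the `window` and `ratio` parameters, and the use of 0 as the mask id are all assumptions made for illustration.

```python
import random

PAD_ID = 0  # assumption: id 0 is reserved for padding/masked positions


def sliding_windows(sequence, window):
    """Split one sequence into all contiguous sub-sequences of length `window`."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(sequence) <= window:
        return [list(sequence)]          # too short to split: keep as-is
    return [list(sequence[i:i + window])
            for i in range(len(sequence) - window + 1)]


def mask_items(sequence, ratio, rng):
    """Replace a random `ratio` of positions with PAD_ID, keeping the length."""
    n_masked = int(len(sequence) * ratio)
    positions = set(rng.sample(range(len(sequence)), n_masked))
    return [PAD_ID if i in positions else item
            for i, item in enumerate(sequence)]


windows = sliding_windows([1, 2, 3, 4, 5], window=3)
masked = mask_items([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], ratio=0.5,
                    rng=random.Random(0))
```

Both functions only transform the training sequences and return new lists, so the model that consumes them does not need to change.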
As the vanilla network architecture and baseline model, we use SASRec: Self-Attentive Sequential Recommendation (Kang and McAuley, 2018), the first model to apply self-attention mechanisms to sequential recommendation. In other words, the overall architecture of the recommender system follows the original design of the SASRec model.