Source codes for Data Augmentation Strategies for Improving Sequential Recommender Systems

saladsong/DataAugForSeqRec

Data Aug for Seq Rec.

This is our PyTorch implementation for the paper:

Joo-yeong Song, Bongwon Suh (2022). Data Augmentation Strategies for Improving Sequential Recommender Systems. (arXiv preprint)

Please cite our paper if you use the code.

Datasets.

We evaluate our methods on three benchmark datasets: MovieLens-1M (ML-1M), Amazon Games, and Gowalla.

For the original ML-1M and Amazon datasets, please refer to here (in the data folder). For the Gowalla dataset, you can find it here.

We also evaluate our methods using a restricted fraction of the available training data for each dataset: {10%, 20%, 30%, 40%, 50%, 100% (full)}.

Each sub-dataset is generated by random sampling from the original full dataset, so characteristics other than size, such as sparsity and average sequence length, remain almost the same.
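A minimal sketch of how such a sub-dataset could be drawn is shown below. It assumes the data is held as a dict mapping each user id to that user's item sequence and that sampling is done per user; the README does not state the sampling unit, and `sample_fraction` is a hypothetical helper, not a function from this repository.

```python
import random


def sample_fraction(user_sequences, fraction, seed=0):
    """Keep a random `fraction` of users together with their full sequences.

    Assumption: sampling is per user, so each kept sequence is unchanged and
    the average sequence length stays roughly the same.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)           # fixed seed for reproducible subsets
    users = sorted(user_sequences)      # stable order before sampling
    n_keep = max(1, round(len(users) * fraction))
    kept = rng.sample(users, n_keep)
    return {u: user_sequences[u] for u in kept}
```

For example, `sample_fraction(data, 0.5)` would correspond to the 50% setting under these assumptions.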

Model Training.

Several specific input parameters are required for training our model: dataset, train_type, aug_type and aug_size.

  • dataset : ml-1m / Video / gowalla
  • train_type : dime (10%) / cinq (20%) / tri (30%) / quad (40%) / half (50%) / all (100% size of original dataset)
  • aug_type : subset (subset split, default) / slide (sliding window) / noise (noise injection) / redund (redundancy injection) / pad (item masking) / subst (synonym replacement)
  • aug_size : integer (default: 10)
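The options above could be parsed with `argparse` roughly as follows. This is a sketch built only from the list above, not the repository's actual `main.py`: `build_parser` and `TRAIN_FRACTIONS` are hypothetical names, and treating `dataset` and `train_type` as required is an assumption, since no defaults are listed for them.

```python
import argparse

# Mapping from train_type names to the fraction of training data they denote.
TRAIN_FRACTIONS = {
    "dime": 0.1, "cinq": 0.2, "tri": 0.3,
    "quad": 0.4, "half": 0.5, "all": 1.0,
}
AUG_TYPES = ["subset", "slide", "noise", "redund", "pad", "subst"]


def build_parser():
    """Build a parser for the four documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Train with data augmentation")
    # Assumption: dataset and train_type have no default, so they are required.
    parser.add_argument("--dataset", required=True,
                        choices=["ml-1m", "Video", "gowalla"])
    parser.add_argument("--train_type", required=True,
                        choices=sorted(TRAIN_FRACTIONS))
    # Defaults taken from the option list: subset and 10.
    parser.add_argument("--aug_type", default="subset", choices=AUG_TYPES)
    parser.add_argument("--aug_size", type=int, default=10)
    return parser
```

With this sketch, omitting `--aug_type` and `--aug_size` falls back to `subset` and `10`, matching the documented defaults.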

For example, to train our model on the half-size ML-1M dataset with the noise injection strategy applied:

python main.py --dataset='ml-1m' --train_type='half' --aug_type='noise'

Notes.

In this paper, we propose a set of data augmentation strategies for sequential recommendation. Note that we focus only on data augmentation applied in the preprocessing step, leaving the network architecture intact.
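To illustrate what preprocessing-time augmentation of an item sequence can look like, here is a generic sketch of three of the named strategies (sliding window, item masking, noise injection). These are common textbook versions written for illustration; the exact definitions used in the paper and in this code may differ. All function names, the `PAD_ID` value, and the parameters are assumptions.

```python
import random

PAD_ID = 0  # assumption: item id 0 is reserved for padding/masking


def sliding_windows(seq, window):
    """Return every contiguous window of length `window` (whole seq if shorter)."""
    if len(seq) <= window:
        return [list(seq)]
    return [list(seq[i:i + window]) for i in range(len(seq) - window + 1)]


def mask_items(seq, n_mask, rng):
    """Replace `n_mask` randomly chosen positions with PAD_ID."""
    out = list(seq)
    for pos in rng.sample(range(len(out)), min(n_mask, len(out))):
        out[pos] = PAD_ID
    return out


def inject_noise(seq, n_noise, num_items, rng):
    """Insert `n_noise` random item ids (1..num_items) into the sequence.

    Sketch-level choice: noise is inserted before an existing item, so the
    last item of a non-empty sequence (often the prediction target) stays last.
    """
    out = list(seq)
    for _ in range(n_noise):
        pos = rng.randint(0, max(len(out) - 1, 0))
        out.insert(pos, rng.randint(1, num_items))
    return out
```

Because each function takes a sequence and returns new sequences without touching the model, augmentations of this kind can run entirely before training.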

As the vanilla network architecture and baseline model, we use SASRec: Self-Attentive Sequential Recommendation (Kang and McAuley, 2018), the first model to apply self-attention mechanisms to sequential recommendation. In other words, the overall architecture of the recommender system follows the original design of the SASRec model.
