diff --git a/asset/time_test_result/Context-aware_recommendation.md b/asset/time_test_result/Context-aware_recommendation.md index 5c1167e3a..979e14d01 100644 --- a/asset/time_test_result/Context-aware_recommendation.md +++ b/asset/time_test_result/Context-aware_recommendation.md @@ -2,21 +2,21 @@ ### Datasets information: -| Dataset | #Interaction | #Feature Field | #Feature | -| ------- | ------------: | --------------: | --------: | -| ml-1m | 1,000,209 | 5 | 134 | -| Criteo | 2,292,530 | 39 | 2,572,192 | -| Avazu | 4,218,938 | 21 | 1,326,631 | +| Dataset | #Interaction | #Feature Field | #Feature | +| ------- | -----------: | -------------: | ---------: | +| ml-1m | 1,000,209 | 5 | 134 | +| Criteo | 1,000,000 | 39 | 2,572,192 | +| Avazu | 4,218,938 | 21 | 1,326,631 | ### Device information ``` OS: Linux -Python Version: 3.8.3 -PyTorch Version: 1.7.0 +Python Version: 3.8.10 +PyTorch Version: 1.8.1 cudatoolkit Version: 10.1 -GPU: TITAN RTX(24GB) -Machine Specs: 32 CPU machine, 64GB RAM +GPU: TITAN V(12GB) +Machine Specs: 14 CPU machine, 256GB RAM ``` ### 1) ml-1m dataset: @@ -24,22 +24,22 @@ Machine Specs: 32 CPU machine, 64GB RAM #### Time and memory cost on ml-1m dataset: | Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) | -| --------- | -----------------: | -----------------: | -----------: | -| LR | 18.34 | 2.18 | 0.82 | -| DIN | 20.37 | 2.26 | 1.16 | -| DSSM | 21.93 | 2.24 | 0.95 | -| FM | 19.33 | 2.34 | 0.83 | -| DeepFM | 20.42 | 2.27 | 0.91 | -| Wide&Deep | 26.13 | 2.95 | 0.89 | -| NFM | 23.36 | 2.26 | 0.89 | -| AFM | 20.08 | 2.26 | 0.92 | -| AutoInt | 22.41 | 2.34 | 0.94 | -| DCN | 28.33 | 2.97 | 0.93 | -| FNN(DNN) | 19.51 | 2.21 | 0.91 | -| PNN | 22.29 | 2.23 | 0.91 | -| FFM | 22.98 | 2.47 | 0.87 | -| FwFM | 23.38 | 2.50 | 0.85 | -| xDeepFM | 24.40 | 2.30 | 1.06 | +| --------- | ------------------------: | --------------------------: | --------------: | +| LR | 1.02 | 1.38 | 0.03 | +| DIN | 24.26 | 0.87 | 4.61 | +| DSSM | 5.69 | 1.17 | 0.19 | +| FM | 1.08 | 1.34 | 0.03 | +| DeepFM | 2.08 | 1.50 | 0.06 | +| Wide&Deep | 2.12 | 1.25 | 0.03 | +| NFM | 3.79 | 1.12 | 0.05 | +| AFM | 1.77 | 1.36 | 0.15 | +| AutoInt | 3.84 | 1.44 | 0.17 | +| DCN | 4.98 | 1.12 | 0.16 | +| FNN(DNN) | 1.95 | 1.32 | 0.10 | +| PNN | 2.45 | 1.50 | 0.13 | +| FFM | 2.39 | 1.17 | 0.13 | +| FwFM | 2.25 | 1.22 | 0.10 | +| xDeepFM | 7.20 | 1.17 | 0.87 | #### Config file of ml-1m dataset: @@ -49,24 +49,30 @@ field_separator: "\t" seq_separator: " " USER_ID_FIELD: user_id ITEM_ID_FIELD: item_id +RATING_FIELD: rating +NEG_PREFIX: neg_ LABEL_FIELD: label threshold: - rating: 4.0 -unused_col: - inter: [rating] + rating: 4 load_col: - inter: [user_id, item_id, rating] - item: [item_id, release_year, genre] - user: [user_id, age, gender, occupation] + inter: [user_id, item_id, rating] + user: [user_id, age, gender, occupation] + item: [item_id, genre] # training and evaluation epochs: 500 -train_batch_size: 2048 -eval_batch_size: 2048 -eval_setting: RO_RS -group_by_user: False +train_batch_size: 4096 +eval_batch_size: 25600 +eval_args: + split: {'RS':[0.8, 0.1, 0.1]} + group_by: ~ + mode: labeled + order: RO valid_metric: AUC metrics: ['AUC', 'LogLoss'] + +# model +embedding_size: 10 ``` Other parameters (including model parameters) are default value. @@ -76,22 +82,22 @@ Other parameters (including model parameters) are default value. 
#### Time and memory cost on Criteo dataset: | Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) | -| --------- | -------------------------: | ---------------------------: | ---------------: | -| LR | 7.65 | 0.61 | 1.11 | -| DIN | - | - | - | -| DSSM | - | - | - | -| FM | 9.77 | 0.73 | 1.45 | -| DeepFM | 13.64 | 0.83 | 1.72 | -| Wide&Deep | 13.58 | 0.80 | 1.72 | -| NFM | 13.36 | 0.75 | 1.72 | -| AFM | 19.40 | 1.02 | 2.34 | -| AutoInt | 19.40 | 0.98 | 2.06 | -| DCN | 16.25 | 0.78 | 1.67 | -| FNN(DNN) | 10.03 | 0.64 | 1.63 | -| PNN | 12.92 | 0.72 | 1.85 | -| FFM | - | - | Out of Memory | -| FwFM | 1175.24 | 8.90 | 2.12 | -| xDeepFM | 32.27 | 1.34 | 2.25 | +| --------- | ------------------------: | --------------------------: | --------------: | +| LR | 1.16 | 0.10 | 0.10 | +| DIN | - | - | - | +| DSSM | - | - | - | +| FM | 1.67 | 0.13 | 0.34 | +| DeepFM | 3.55 | 0.13 | 0.34 | +| Wide&Deep | 3.41 | 0.13 | 0.34 | +| NFM | 3.58 | 0.14 | 0.35 | +| AFM | 5.69 | 0.27 | 2.13 | +| AutoInt | 5.42 | 0.22 | 1.14 | +| DCN | 4.20 | 0.15 | 0.42 | +| FNN(DNN) | 2.16 | 0.11 | 0.36 | +| PNN | 3.32 | 0.14 | 0.77 | +| FFM | 57.66 | 0.71 | 8.60 | +| FwFM | 482.04 | 3.21 | 1.59 | +| xDeepFM | 10.55 | 0.34 | 1.91 | Note: Criteo dataset is not suitable for DIN model and DSSM model. #### Config file of Criteo dataset: @@ -107,26 +113,21 @@ LABEL_FIELD: label load_col: inter: '*' -highest_val: - index: 2292530 - fill_nan: True normalize_all: True -min_item_inter_num: 0 -min_user_inter_num: 0 - -unused_col: - inter: [index] - # training and evaluation epochs: 500 -train_batch_size: 2048 -eval_batch_size: 2048 -eval_setting: RO_RS -group_by_user: False +train_batch_size: 4096 +eval_batch_size: 4096 +eval_args: + mode: labeled + group_by: ~ valid_metric: AUC metrics: ['AUC', 'LogLoss'] + +# model +embedding_size: 10 ``` Other parameters (including model parameters) are default value. @@ -135,23 +136,23 @@ Other parameters (including model parameters) are default value. #### Time and memory cost on Avazu dataset: -| Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) | -| --------- | -------------------------: | ---------------------------: | ---------------: | -| LR | 9.30 | 0.76 | 1.42 | -| DIN | - | - | - | -| DSSM | - | - | - | -| FM | 25.68 | 0.94 | 2.60 | -| DeepFM | 28.41 | 1.19 | 2.66 | -| Wide&Deep | 27.58 | 0.97 | 2.66 | -| NFM | 30.46 | 1.06 | 2.66 | -| AFM | 31.03 | 1.06 | 2.69 | -| AutoInt | 38.11 | 1.41 | 2.84 | -| DCN | 30.78 | 0.96 | 2.64 | -| FNN(DNN) | 23.53 | 0.84 | 2.60 | -| PNN | 25.86 | 0.90 | 2.68 | -| FFM | - | - | Out of Memory | -| FwFM | 336.75 | 7.49 | 2.63 | -| xDeepFM | 54.88 | 1.45 | 2.89 | +| Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) | +| --------- | ------------------------: | --------------------------: | -------------------: | +| LR | 4.01 | 0.37 | 0.36 | +| DIN | - | - | - | +| DSSM | - | - | - | +| FM | 13.14 | 0.40 | 1.35 | +| DeepFM | 14.69 | 0.48 | 1.38 | +| Wide&Deep | 14.20 | 0.41 | 1.36 | +| NFM | 17.40 | 0.48 | 1.36 | +| AFM | 18.25 | 0.55 | 1.89 | +| AutoInt | 21.42 | 0.68 | 1.67 | +| DCN | 18.95 | 0.44 | 1.37 | +| FNN(DNN) | 12.13 | 0.40 | 1.31 | +| PNN | 14.19 | 0.41 | 1.45 | +| FFM | - | - | CUDA out of memory | +| FwFM | 292.43 | 3.83 | 1.74 | +| xDeepFM | 35.60 | 0.93 | 2.20 | Note: Avazu dataset is not suitable for DIN model and DSSM model. #### Config file of Avazu dataset: @@ -160,28 +161,37 @@ Note: Avazu dataset is not suitable for DIN model and DSSM model. 
# dataset config field_separator: "\t" seq_separator: " " -USER_ID_FIELD: ~ -ITEM_ID_FIELD: ~ LABEL_FIELD: label -fill_nan: True -normalize_all: True -load_col: +load_col: inter: '*' - -lowest_val: - timestamp: 14102931 -unused_col: - inter: [timestamp] + +fill_nan: True +USER_ID_FIELD: ~ +ITEM_ID_FIELD: ~ +normalize_all: True +val_interval: + timestamp: "[14102931, inf)" # training and evaluation epochs: 500 -train_batch_size: 2048 -eval_batch_size: 2048 -eval_setting: RO_RS +train_batch_size: 4096 +eval_batch_size: 4096 +eval_args: + group_by: ~ + split: {'RS':[0.8, 0.1, 0.1]} + mode: labeled + order: RO group_by_user: False valid_metric: AUC metrics: ['AUC', 'LogLoss'] + +# model +embedding_size: 10 +attention_size: 30 +dropout_prob: 0.1 +learning_rate: 5e-5 +reg_weight: 5 ``` Other parameters (including model parameters) are default value. diff --git a/asset/time_test_result/General_recommendation.md b/asset/time_test_result/General_recommendation.md index 851df2a2a..e3c7a7740 100644 --- a/asset/time_test_result/General_recommendation.md +++ b/asset/time_test_result/General_recommendation.md @@ -2,21 +2,21 @@ ### Datasets information: -| Dataset | #User | #Item | #Interaction | Sparsity | -| ------- | -------: | ------: | ------------: | --------: | -| ml-1m | 6,041 | 3,707 | 1,000,209 | 0.9553 | -| Netflix | 80,476 | 16,821 | 1,977,844 | 0.9985 | -| Yelp | 102,046 | 98,408 | 2,903,648 | 0.9997 | +| Dataset | #User | #Item | #Interaction | Sparsity | +| ------- | ------: | ------: | -----------: | -------: | +| ml-1m | 6,040 | 3,629 | 836,478 | 0.9618 | +| Netflix | 40,227 | 8,727 | 1,752,648 | 0.9950 | +| Yelp | 45,478 | 30,709 | 1,777,765 | 0.9987 | ### Device information ``` OS: Linux -Python Version: 3.8.3 -PyTorch Version: 1.7.0 +Python Version: 3.8.10 +PyTorch Version: 1.8.1 cudatoolkit Version: 10.1 -GPU: TITAN RTX(24GB) -Machine Specs: 32 CPU machine, 64GB RAM +GPU: TITAN V(12GB) +Machine Specs: 14 CPU machine, 256GB RAM ``` ### 1) ml-1m dataset: @@ -25,19 +25,19 @@ Machine Specs: 32 CPU machine, 64GB RAM | Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) | | ---------- | ------------------------: | --------------------------: | --------------: | -| Popularity | 2.11 | 8.08 | 0.82 | -| ItemKNN | 2.00 | 11.76 | 0.82 | -| BPRMF | 1.93 | 7.43 | 0.91 | -| NeuMF | 4.94 | 13.12 | 0.94 | -| DMF | 4.47 | 12.63 | 1.52 | -| NAIS | 59.27 | 24.41 | 21.83 | -| NGCF | 12.09 | 7.12 | 1.20 | -| GCMC | 9.04 | 54.15 | 1.32 | -| LightGCN | 7.83 | 7.47 | 1.15 | -| DGCF | 181.66 | 8.06 | 6.59 | -| ConvNCF | 8.46 | 19.60 | 1.31 | -| FISM | 19.30 | 10.92 | 6.94 | -| SpectralCF | 13.87 | 6.97 | 1.19 | +| Popularity | 0.62 | 0.41 | 0.00 | +| ItemKNN | 0.65 | 4.87 | 0.00 | +| BPRMF | 0.89 | 0.71 | 0.03 | +| NeuMF | 3.63 | 0.83 | 0.33 | +| DMF | 3.70 | 1.34 | 0.87 | +| NAIS | 44.94 | 13.73 | 8.12 | +| NGCF | 6.19 | 0.40 | 0.19 | +| GCMC | 4.46 | 1.74 | 0.26 | +| LightGCN | 3.76 | 0.76 | 0.16 | +| DGCF | 63.83 | 0.57 | 4.15 | +| ConvNCF | 8.43 | 10.04 | 8.58 | +| FISM | 17.54 | 3.46 | 3.35 | +| SpectralCF | 8.02 | 0.43 | 0.18 | #### Config file of ml-1m dataset: @@ -48,20 +48,23 @@ seq_separator: " " USER_ID_FIELD: user_id ITEM_ID_FIELD: item_id RATING_FIELD: rating -TIME_FIELD: timestamp -LABEL_FIELD: label NEG_PREFIX: neg_ +LABEL_FIELD: label load_col: - inter: [user_id, item_id, rating, timestamp] -min_user_inter_num: 0 -min_item_inter_num: 0 - + inter: [user_id, item_id, rating] +val_interval: + rating: "[3,inf)" +unused_col: + inter: [rating] # training and 
evaluation epochs: 500 -train_batch_size: 2048 -eval_batch_size: 2048 +train_batch_size: 4096 +eval_batch_size: 102400 valid_metric: MRR@10 + +# model +embedding_size: 64 ``` Other parameters (including model parameters) are default value. @@ -70,21 +73,21 @@ Other parameters (including model parameters) are default value. #### Time and memory cost on Netflix dataset: -| Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) | -| ---------- | ----------------: | -----------------: | -----------: | -| Popularity | 3.98 | 58.86 | 0.86 | -| ItemKNN | 5.42 | 69.64 | 0.86 | -| BPRMF | 4.42 | 52.81 | 1.08 | -| NeuMF | 11.33 | 238.92 | 1.26 | -| DMF | 20.62 | 68.89 | 7.12 | -| NAIS | - | - | Out of Memory | -| NGCF | 52.50 | 51.60 | 2.00 | -| GCMC | 93.15 | 1810.43 | 3.17 | -| LightGCN | 30.21 | 47.12 | 1.58 | -| DGCF | 750.74 | 47.23 | 12.52 | -| ConvNCF | 17.02 | 402.65 | 1.44 | -| FISM | 86.52 | 83.26 | 20.54 | -| SpectralCF | 59.92 | 46.94 | 1.88 | +| Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) | +| ---------- | ------------------------: | --------------------------: | -------------------: | +| Popularity | 1.55 | 6.62 | 0.00 | +| ItemKNN | 2.48 | 25.71 | 0.00 | +| BPRMF | 1.92 | 5.82 | 0.09 | +| NeuMF | 7.54 | 12.61 | 0.40 | +| DMF | 10.66 | 8.14 | 3.54 | +| NAIS | - | - | CUDA out of memory | +| NGCF | 18.26 | 5.70 | 0.58 | +| GCMC | 22.07 | 86.32 | 1.17 | +| LightGCN | 10.85 | 6.31 | 0.41 | +| DGCF | 269.08 | 5.39 | 8.80 | +| ConvNCF | 15.66 | 168.54 | 8.29 | +| FISM | 57.58 | 23.10 | 8.51 | +| SpectralCF | 20.67 | 5.52 | 0.51 | #### Config file of Netflix dataset: @@ -100,19 +103,22 @@ LABEL_FIELD: label NEG_PREFIX: neg_ load_col: inter: [user_id, item_id, rating, timestamp] -min_user_inter_num: 3 -min_item_inter_num: 0 -lowest_val: - timestamp: 1133366400 - rating: 3 -unused_col: - inter: [timestamp, rating] + +user_inter_num_interval: "[10,inf)" +item_inter_num_interval: "[10,inf)" +val_interval: + rating: "[3,inf)" + timestamp: "[1133366400, inf)" + # training and evaluation epochs: 500 -train_batch_size: 2048 -eval_batch_size: 2048 +train_batch_size: 4096 +eval_batch_size: 102400 valid_metric: MRR@10 + +# model +embedding_size: 64 ``` Other parameters (including model parameters) are default value. @@ -121,21 +127,21 @@ Other parameters (including model parameters) are default value. 
#### Time and memory cost on Yelp dataset: -| Method | Training Time (sec/epoch) | Evaluate Time (sec/epoch) | GPU Memory (GB) | -| ---------- | -------------------------: | -------------------------: | ---------------: | -| Popularity | 5.69 | 134.23 | 0.89 | -| ItemKNN | 8.44 | 194.24 | 0.90 | -| BPRMF | 6.31 | 120.03 | 1.29 | -| NeuMF | 17.38 | 2069.53 | 1.67 | -| DMF | 43.96 | 173.13 | 9.22 | -| NAIS | - | - | Out of Memory | -| NGCF | 122.90 | 129.59 | 3.28 | -| GCMC | 299.36 | 9833.24 | 5.96 | -| LightGCN | 67.91 | 116.16 | 2.02 | -| DGCF | 1542.00 | 119.00 | 17.17 | -| ConvNCF | 87.56 | 11155.31 | 1.62 | -| FISM | - | - | Out of Memory | -| SpectralCF | 138.99 | 133.37 | 3.10 | +| Method | Training Time (sec/epoch) | Evaluate Time (sec/epoch) | GPU Memory (GB) | +| ---------- | ------------------------: | ------------------------: | -------------------: | +| Popularity | 1.71 | 6.45 | 0.02 | +| ItemKNN | 5.67 | 37.37 | 0.02 | +| BPRMF | 2.86 | 5.96 | 0.13 | +| NeuMF | 7.75 | 32.75 | 1.27 | +| DMF | 12.82 | 9.27 | 2.90 | +| NAIS | - | - | CUDA out of memory | +| NGCF | 23.17 | 5.62 | 0.79 | +| GCMC | 32.20 | 110.34 | 1.65 | +| LightGCN | 13.06 | 5.85 | 0.47 | +| DGCF | 270.31 | 5.92 | 8.62 | +| ConvNCF | - | - | CUDA out of memory | +| FISM | - | - | CUDA out of memory | +| SpectralCF | 24.44 | 5.73 | 0.62 | #### Config file of Yelp dataset: @@ -151,18 +157,21 @@ LABEL_FIELD: label NEG_PREFIX: neg_ load_col: inter: [user_id, business_id, stars] -min_user_inter_num: 10 -min_item_inter_num: 4 -lowest_val: - stars: 3 -unused_col: - inter: [stars] + +user_inter_num_interval: "[15,inf)" +item_inter_num_interval: "[15,inf)" +val_interval: + stars: "[3,inf)" + # training and evaluation epochs: 500 -train_batch_size: 2048 -eval_batch_size: 2048 +train_batch_size: 4096 +eval_batch_size: 102400 valid_metric: MRR@10 + +# model +embedding_size: 64 ``` Other parameters (including model parameters) are default value. 
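The YAML blocks above are ordinary RecBole config files. As a rough sketch of how they are consumed (assuming the config is saved as `ml-1m.yaml`, a hypothetical filename, and using `LightGCN` as a stand-in for any model listed in the tables), they can be passed to RecBole's quick-start entry point:

```
# Rough sketch: run a general recommender with the updated ml-1m settings.
# "ml-1m.yaml" is an assumed filename containing the YAML config shown above.
from recbole.quick_start import run_recbole

run_recbole(
    model='LightGCN',                 # any model from the tables above
    dataset='ml-1m',
    config_file_list=['ml-1m.yaml'],  # dataset, training and evaluation settings
)
```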
diff --git a/asset/time_test_result/Knowledge-based_recommendation.md b/asset/time_test_result/Knowledge-based_recommendation.md index 53b6dca69..3f123b4b4 100644 --- a/asset/time_test_result/Knowledge-based_recommendation.md +++ b/asset/time_test_result/Knowledge-based_recommendation.md @@ -2,21 +2,21 @@ ### Datasets information: -| Dataset | #User | #Item | #Interaction | Sparsity | #Entity | #Relation | #Triple | -| ------- | ------: | -------: | ------------: | --------: | ---------: | ---------: | ---------: | -| ml-1m | 6,040 | 3,629 | 836,478 | 0.9618 | 79,388 | 51 | 385,923 | -| ml-10m | 69,864 | 10,599 | 8,242,124 | 0.9889 | 181,941 | 51 | 1,051,385 | -| LFM-1b | 64,536 | 156,343 | 6,544,312 | 0.9994 | 1,751,586 | 10 | 3,054,516 | +| Dataset | #User | #Item | #Interaction | Sparsity | #Entity | #Relation | #Triple | +| ---------- | ------: | ------: | -----------: | -------: | -------: | --------: | ---------: | +| ml-1m | 6,040 | 3,629 | 836,478 | 0.9618 | 79,388 | 51 | 385,923 | +| ml-10m | 69,864 | 10,599 | 8,242,124 | 0.9889 | 181,941 | 51 | 1,051,385 | +| LFM-1b2013 | 28,150 | 64,583 | 1,907,900 | 0.9990 | 181,112 | 7 | 281,900 | ### Device information ``` OS: Linux -Python Version: 3.8.3 -PyTorch Version: 1.7.0 +Python Version: 3.8.10 +PyTorch Version: 1.8.1 cudatoolkit Version: 10.1 -GPU: TITAN RTX(24GB) -Machine Specs: 32 CPU machine, 64GB RAM +GPU: TITAN V(12GB) +Machine Specs: 14 CPU machine, 256GB RAM ``` ### 1) ml-1m dataset: @@ -24,15 +24,15 @@ Machine Specs: 32 CPU machine, 64GB RAM #### Time and memory cost on ml-1m dataset: | Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) | -| --------- | -------------------------: | ---------------------------: | ---------------: | -| CKE | 3.76 | 8.73 | 1.16 | -| KTUP | 3.82 | 17.68 | 1.04 | -| RippleNet | 9.39 | 13.13 | 4.57 | -| KGAT | 9.59 | 8.63 | 3.52 | -| KGNN-LS | 4.78 | 15.09 | 1.04 | -| KGCN | 2.25 | 13.71 | 1.04 | -| MKR | 6.25 | 14.89 | 1.29 | -| CFKG | 1.49 | 9.76 | 0.97 | +| --------- | ------------------------: | --------------------------: | --------------: | +| CKE | 4.88 | 0.44 | 0.38 | +| KTUP | 3.76 | 1.70 | 0.47 | +| RippleNet | 35.85 | 0.84 | 7.26 | +| KGAT | 6.68 | 0.37 | 2.10 | +| KGNN-LS | 8.20 | 1.14 | 0.57 | +| KGCN | 3.56 | 1.14 | 0.56 | +| MKR | 4.36 | 5.57 | 3.68 | +| CFKG | 1.60 | 0.57 | 0.27 | #### Config file of ml-1m dataset: @@ -53,16 +53,17 @@ load_col: inter: [user_id, item_id, rating] kg: [head_id, relation_id, tail_id] link: [item_id, entity_id] -lowest_val: - rating: 3 -unused_col: - inter: [rating] +val_interval: + rating: "[3,inf)" # training and evaluation epochs: 500 -train_batch_size: 2048 -eval_batch_size: 2048 +train_batch_size: 4096 +eval_batch_size: 102400 valid_metric: MRR@10 + +# model +embedding_size: 64 ``` Other parameters (including model parameters) are default value. @@ -71,16 +72,16 @@ Other parameters (including model parameters) are default value. 
#### Time and memory cost on ml-10m dataset: -| Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) | -| --------- | -------------------------: | ---------------------------: | ---------------: | -| CKE | 8.65 | 85.53 | 1.46 | -| KTUP | 40.71 | 507.56 | 1.43 | -| RippleNet | 32.01 | 152.40 | 4.71 | -| KGAT | 298.22 | 80.94 | 22.44 | -| KGNN-LS | 15.47 | 241.57 | 1.42 | -| KGCN | 7.73 | 244.93 | 1.42 | -| MKR | 61.05 | 383.29 | 1.80 | -| CFKG | 5.99 | 140.74 | 1.35 | +| Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) | +| --------- | ------------------------: | --------------------------: | -----------------: | +| CKE | 43.41 | 10.72 | 0.70 | +| KTUP | 33.82 | 38.87 | 0.66 | +| RippleNet | 360.16 | 23.35 | 7.38 | +| KGAT | - | - | CUDA out of memory | +| KGNN-LS | 84.51 | 47.31 | 0.73 | +| KGCN | 20.13 | 53.33 | 0.74 | +| MKR | 31.74 | 207.12 | 3.85 | +| CFKG | 16.33 | 16.88 | 0.46 | #### Config file of ml-10m dataset: @@ -101,16 +102,17 @@ load_col: inter: [user_id, item_id, rating] kg: [head_id, relation_id, tail_id] link: [item_id, entity_id] -lowest_val: - rating: 3 -unused_col: - inter: [rating] +val_interval: + rating: "[3,inf)" # training and evaluation epochs: 500 -train_batch_size: 2048 -eval_batch_size: 2048 +train_batch_size: 4096 +eval_batch_size: 102400 valid_metric: MRR@10 + +# model +embedding_size: 64 ``` Other parameters (including model parameters) are default value. @@ -120,15 +122,15 @@ Other parameters (including model parameters) are default value. #### Time and memory cost on LFM-1b dataset: | Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) | -| --------- | -------------------------: | ---------------------------: | ---------------: | -| CKE | 62.99 | 82.93 | 4.45 | -| KTUP | 91.79 | 3218.69 | 4.36 | -| RippleNet | 126.26 | 188.38 | 6.49 | -| KGAT | 626.07 | 75.70 | 23.28 | -| KGNN-LS | 62.55 | 1709.10 | 4.73 | -| KGCN | 52.54 | 1763.03 | 4.71 | -| MKR | 290.01 | 2341.91 | 6.96 | -| CFKG | 53.35 | 553.58 | 4.22 | +| --------- | ------------------------: | --------------------------: | --------------: | +| CKE | 8.73 | 41.84 | 0.69 | +| KTUP | 4.56 | 87.67 | 0.48 | +| RippleNet | 82.53 | 69.37 | 7.32 | +| KGAT | 15.69 | 40.75 | 4.19 | +| KGNN-LS | 15.65 | 436.84 | 0.61 | +| KGCN | 8.04 | 443.80 | 0.60 | +| MKR | 9.06 | 456.11 | 2.87 | +| CFKG | 3.81 | 50.48 | 0.45 | #### Config file of LFM-1b dataset: @@ -137,7 +139,7 @@ Other parameters (including model parameters) are default value. field_separator: "\t" seq_separator: " " USER_ID_FIELD: user_id -ITEM_ID_FIELD: tracks_id +ITEM_ID_FIELD: item_id RATING_FIELD: rating HEAD_ENTITY_ID_FIELD: head_id TAIL_ENTITY_ID_FIELD: tail_id @@ -146,24 +148,22 @@ ENTITY_ID_FIELD: entity_id NEG_PREFIX: neg_ LABEL_FIELD: label load_col: - inter: [user_id, tracks_id, timestamp] + inter: [user_id, item_id, rating] kg: [head_id, relation_id, tail_id] - link: [tracks_id, entity_id] -lowest_val: - timestamp: 1356969600 - -highest_val: - timestamp: 1362067200 -unused_col: - inter: [timestamp] -min_user_inter_num: 2 -min_item_inter_num: 15 + link: [item_id, entity_id] +val_interval: + rating: "[10,inf)" +user_inter_num_interval: "[10,inf)" +item_inter_num_interval: "[10,inf)" # training and evaluation epochs: 500 -train_batch_size: 2048 -eval_batch_size: 2048 +train_batch_size: 4096 +eval_batch_size: 102400 valid_metric: MRR@10 + +# model +embedding_size: 64 ``` Other parameters (including model parameters) are default value. 
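The knowledge-aware configs above can also be supplied in code rather than as a YAML file. The sketch below mirrors the ml-1m settings as a `config_dict`; the model name `CKE` and the override keys are assumptions taken from the tables and YAML above, not something prescribed by this patch.

```
# Sketch under assumptions: the knowledge-aware ml-1m settings as a config_dict.
from recbole.quick_start import run_recbole

config_dict = {
    'train_batch_size': 4096,
    'eval_batch_size': 102400,
    'valid_metric': 'MRR@10',
    'embedding_size': 64,
    'val_interval': {'rating': "[3,inf)"},  # keep interactions rated 3 or higher
}
run_recbole(model='CKE', dataset='ml-1m', config_dict=config_dict)
```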
diff --git a/asset/time_test_result/Sequential_recommendation.md b/asset/time_test_result/Sequential_recommendation.md index 5fa984797..289a5e13c 100644 --- a/asset/time_test_result/Sequential_recommendation.md +++ b/asset/time_test_result/Sequential_recommendation.md @@ -2,46 +2,46 @@ ### Datasets information: -| Dataset | #User | #Item | #Interaction | Sparsity | -| ---------- | -------: | ------: | ------------: | --------: | -| ml-1m | 6,041 | 3,707 | 1,000,209 | 0.9553 | -| DIGINETICA | 59,425 | 42,116 | 547,416 | 0.9998 | -| Yelp | 102,046 | 98,408 | 2,903,648 | 0.9997 | +| Dataset | #User | #Item | #Interaction | Sparsity | +| ---------- | ------: | ------: | -----------: | -------: | +| ml-1m | 6,040 | 3,629 | 836,478 | 0.9618 | +| DIGINETICA | 72,014 | 29,454 | 580,490 | 0.9997 | +| Yelp | 45,478 | 30,709 | 1,777,765 | 0.9987 | ### Device information ``` OS: Linux -Python Version: 3.8.3 -PyTorch Version: 1.7.0 +Python Version: 3.8.10 +PyTorch Version: 1.8.1 cudatoolkit Version: 10.1 -GPU: TITAN RTX(24GB) -Machine Specs: 32 CPU machine, 64GB RAM +GPU: TITAN V(12GB) +Machine Specs: 14 CPU machine, 256GB RAM ``` ### 1) ml-1m dataset: #### Time and memory cost on ml-1m dataset: -| Method | Training Time (sec/epoch) | Evaluate Time (sec/epoch) | GPU Memory (GB) | -| ---------------- | -----------------: | -----------------: | -----------: | -| Improved GRU-Rec | 7.78 | 0.11 | 1.27 | -| SASRec | 17.78 | 0.12 | 1.84 | -| NARM | 8.29 | 0.11 | 1.29 | -| FPMC | 7.51 | 0.11 | 1.18 | -| STAMP | 7.32 | 0.11 | 1.20 | -| Caser | 44.85 | 0.12 | 1.14 | -| NextItNet | 16433.27 | 96.31 | 1.86 | -| TransRec | 10.08 | 0.16 | 8.18 | -| S3Rec | - | - | - | -| GRU4RecF | 10.20 | 0.15 | 1.80 | -| SASRecF | 18.84 | 0.17 | 1.78 | -| BERT4Rec | 36.09 | 0.34 | 1.97 | -| FDSA | 31.86 | 0.19 | 2.32 | -| SRGNN | 327.38 | 2.19 | 1.21 | -| GCSAN | 335.27 | 0.02 | 1.58 | -| KSR | - | - | - | -| GRU4RecKG | - | - | - | +| Method | Training Time (sec/epoch) | Evaluate Time (sec/epoch) | GPU Memory (GB) | +| ---------------- | ------------------------: | --------------------------: | -----------------------: | +| Improved GRU-Rec | 6.15 | 0.06 | 1.46 | +| SASRec | 24.86 | 0.09 | 4.11 | +| NARM | 7.34 | 0.07 | 1.51 | +| FPMC | 6.07 | 0.05 | 0.30 | +| STAMP | 3.10 | 0.10 | 0.65 | +| Caser | 549.40 | 0.21 | 1.08 | +| NextItNet | 203.85 | 0.71 | 3.68 | +| TransRec | 5.81 | 0.12 | 7.21 | +| S3Rec | - | - | CUDA out of memory | +| GRU4RecF | 13.54 | 0.09 | 2.18 | +| SASRecF | 28.16 | 0.11 | 5.08 | +| BERT4Rec | 56.57 | 0.32 | 6.20 | +| FDSA | 50.18 | 0.14 | 8.12 | +| SRGNN | 631.08 | 4.58 | 1.12 | +| GCSAN | 671.66 | 4.99 | 2.96 | +| KSR | 61.18 | 0.21 | 6.94 | +| GRU4RecKG | 11.36 | 0.07 | 2.22 | #### Config file of ml-1m dataset: @@ -51,63 +51,72 @@ field_separator: "\t" seq_separator: " " USER_ID_FIELD: user_id ITEM_ID_FIELD: item_id +RATING_FIELD: rating TIME_FIELD: timestamp NEG_PREFIX: neg_ ITEM_LIST_LENGTH_FIELD: item_length LIST_SUFFIX: _list -MAX_ITEM_LIST_LENGTH: 20 +MAX_ITEM_LIST_LENGTH: 50 POSITION_FIELD: position_id load_col: - inter: [user_id, item_id, timestamp] -min_user_inter_num: 0 -min_item_inter_num: 0 + inter: [user_id, item_id, rating, timestamp] +val_interval: + rating: "[3,inf)" # training and evaluation epochs: 500 -train_batch_size: 2048 +train_batch_size: 4096 eval_batch_size: 2048 -valid_metric: MRR@10 -eval_setting: TO_LS,full -training_neg_sample_num: 0 +valid_metric: recall@10 +eval_args: + split: {'LS': 'valid_and_test'} + mode: full + order: TO +neg_sampling: ~ + +# model +embedding_size: 64 
```
Other parameters (including model parameters) are default value.

**NOTE :**

-1) For FPMC and TransRec model, `training_neg_sample_num` should be `1` .
+1) For the FPMC and TransRec models, `neg_sampling` should be `{'uniform': 1}`.

-2) For SASRecF, GRU4RecF and FDSA, `load_col` should as below:
+2) For SASRecF, GRU4RecF and FDSA, `load_col` should be as below:

```
load_col:
-   inter: [user_id, item_id, timestamp]
+   inter: [user_id, item_id, rating, timestamp]
   item: [item_id, genre]
```

+3) For KSR and GRU4RecKG, you should prepare pretrained knowledge graph embeddings.
+
### 2)DIGINETICA dataset:

#### Time and memory cost on DIGINETICA dataset:

-| Method | Training Time (sec/epoch) | Evaluate Time (sec/epoch) | GPU Memory (GB) |
-| ---------------- | -----------------: | -----------------: | -----------: |
-| Improved GRU-Rec | 4.10 | 1.05 | 4.02 |
-| SASRec | 8.36 | 1.21 | 4.43 |
-| NARM | 4.30 | 1.08 | 4.09 |
-| FPMC | 2.98 | 1.08 | 4.08 |
-| STAMP | 4.27 | 1.04 | 3.88 |
-| Caser | 17.15 | 1.18 | 3.94 |
-| NextItNet | 6150.49 | 947.66 | 4.54 |
-| TransRec | - | - | Out of Memory |
-| S3Rec | - | - | - |
-| GRU4RecF | 4.79 | 1.17 | 4.83 |
-| SASRecF | 8.66 | 1.29 | 5.11 |
-| BERT4Rec | 16.80 | 3.54 | 7.97 |
-| FDSA | 13.44 | 1.47 | 5.66 |
-| SRGNN | 88.59 | 15.37 | 4.01 |
-| GCSAN | 96.69 | 17.11 | 4.25 |
-| KSR | - | - | - |
-| GRU4RecKG | - | - | - |
+| Method | Training Time (sec/epoch) | Evaluate Time (sec/epoch) | GPU Memory (GB) |
+| ---------------- | ------------------------: | --------------------------: | -----------------------: |
+| Improved GRU-Rec | 2.22 | 0.74 | 2.70 |
+| SASRec | 6.88 | 0.88 | 3.53 |
+| NARM | 2.42 | 0.74 | 2.37 |
+| FPMC | 2.44 | 0.72 | 2.31 |
+| STAMP | 2.01 | 0.71 | 2.44 |
+| Caser | 17.51 | 0.86 | 2.49 |
+| NextItNet | 34.00 | 2.98 | 3.71 |
+| TransRec | - | - | CUDA out of memory |
+| S3Rec | 160.15 | - | 6.20 |
+| GRU4RecF | 3.00 | 0.84 | 3.02 |
+| SASRecF | 6.67 | 0.93 | 3.61 |
+| BERT4Rec | 12.97 | 3.43 | 10.26 |
+| FDSA | 11.17 | 1.10 | 4.71 |
+| SRGNN | 66.67 | 14.81 | 2.66 |
+| GCSAN | 69.03 | 14.29 | 3.13 |
+| KSR | - | - | - |
+| GRU4RecKG | - | - | - |

#### Config file of DIGINETICA dataset:

@@ -125,25 +134,31 @@ MAX_ITEM_LIST_LENGTH: 20
POSITION_FIELD: position_id
load_col:
  inter: [session_id, item_id, timestamp]
-min_user_inter_num: 6
-min_item_inter_num: 1
+user_inter_num_interval: "[5,inf)"
+item_inter_num_interval: "[5,inf)"
# training and evaluation
epochs: 500
-train_batch_size: 2048
+train_batch_size: 4096
eval_batch_size: 2048
valid_metric: MRR@10
-eval_setting: TO_LS,full
-training_neg_sample_num: 0
+eval_args:
+  split: {'LS':"valid_and_test"}
+  mode: full
+  order: TO
+neg_sampling: ~
+
+# model
+embedding_size: 64
```
Other parameters (including model parameters) are default value.

**NOTE :**

-1) For FPMC and TransRec model, `training_neg_sample_num` should be `1` .
+1) For the FPMC and TransRec models, `neg_sampling` should be `{'uniform': 1}`.

-2) For SASRecF, GRU4RecF and FDSA, `load_col` should as below:
+2) For SASRecF, GRU4RecF and FDSA, `load_col` should be as below:

```
load_col:
@@ -151,29 +166,31 @@ load_col:
   item: [item_id, item_category]
```

+3) For KSR and GRU4RecKG, you should prepare pretrained knowledge graph embeddings.
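The sequential configs above share one evaluation recipe: temporal ordering, a leave-one-out split, full ranking over all items, and no negative sampling during training. A hedged sketch of those settings as a `config_dict` follows; `GRU4Rec` and `diginetica` are assumed RecBole model and dataset names, not part of this patch.

```
# Hedged sketch: the sequential evaluation settings above as a config_dict.
from recbole.quick_start import run_recbole

config_dict = {
    'train_batch_size': 4096,
    'eval_batch_size': 2048,
    'valid_metric': 'MRR@10',
    'eval_args': {'split': {'LS': 'valid_and_test'},  # leave-one-out split
                  'mode': 'full',                     # rank against the full item set
                  'order': 'TO'},                     # temporal order
    'neg_sampling': None,                             # YAML `~` means no negative sampling
    'embedding_size': 64,
}
run_recbole(model='GRU4Rec', dataset='diginetica', config_dict=config_dict)
```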
+
### 3)Yelp dataset:

#### Time and memory cost on Yelp dataset:

-| Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) |
-| ---------------- | -----------------: | -----------------: | -----------: |
-| Improved GRU-Rec | 44.31 | 2.74 | 7.92 |
-| SASRec | 75.51 | 3.11 | 8.32 |
-| NARM | 45.65 | 2.76 | 7.98 |
-| FPMC | 21.05 | 3.05 | 8.22 |
-| STAMP | 42.08 | 2.72 | 7.77 |
-| Caser | 147.15 | 2.89 | 7.87 |
-| NextItNet | 45019.38 | 1670.76 | 8.44 |
-| TransRec | - | - | Out of Memory |
-| S3Rec | - | - | - |
-| GRU4RecF | - | - | Out of Memory |
-| SASRecF | - | - | Out of Memory |
-| BERT4Rec | 193.74 | 8.43 | 16.57 |
-| FDSA | - | - | Out of Memory |
-| SRGNN | 825.11 | 33.20 | 7.90 |
-| GCSAN | 837.23 | 33.00 | 8.14 |
-| KSR | - | - | - |
-| GRU4RecKG | - | - | - |
+| Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) |
+| ---------------- | ------------------------: | --------------------------: | -----------------------: |
+| Improved GRU-Rec | 16.52 | 0.66 | 3.34 |
+| SASRec | 54.92 | 0.90 | 5.82 |
+| NARM | 19.35 | 0.79 | 3.26 |
+| FPMC | 11.44 | 0.58 | 0.76 |
+| STAMP | 9.97 | 0.59 | 2.76 |
+| Caser | 1105.96 | 2.02 | 3.21 |
+| NextItNet | 413.43 | 5.57 | 5.73 |
+| TransRec | - | - | CUDA out of memory |
+| S3Rec | - | - | CUDA out of memory |
+| GRU4RecF | 62.41 | 1.32 | 7.36 |
+| SASRecF | 86.20 | 1.42 | 6.12 |
+| BERT4Rec | - | - | CUDA out of memory |
+| FDSA | 133.52 | 1.87 | 9.19 |
+| SRGNN | 1165.06 | 30.35 | 3.41 |
+| GCSAN | 1112.46 | 29.84 | 4.67 |
+| KSR | - | - | - |
+| GRU4RecKG | - | - | - |

#### Config file of Yelp dataset:

@@ -188,44 +205,42 @@ TIME_FIELD: date
NEG_PREFIX: neg_
ITEM_LIST_LENGTH_FIELD: item_length
LIST_SUFFIX: _list
-MAX_ITEM_LIST_LENGTH: 20
+MAX_ITEM_LIST_LENGTH: 50
POSITION_FIELD: position_id
load_col:
  inter: [user_id, business_id, stars, date]
-min_user_inter_num: 10
-min_item_inter_num: 4
-lowest_val:
-  stars: 3
-unused_col:
-  inter: [stars]
+
+user_inter_num_interval: "[15,inf)"
+item_inter_num_interval: "[15,inf)"
+val_interval:
+  stars: "[3,inf)"
# training and evaluation
epochs: 500
-train_batch_size: 2048
-eval_batch_size: 2048
+train_batch_size: 4096
+eval_batch_size: 512
valid_metric: MRR@10
-eval_setting: TO_LS,full
-training_neg_sample_num: 0
+eval_args:
+  split: {'LS':"valid_and_test"}
+  order: TO
+neg_sampling: ~
+# model
+embedding_size: 64
```
Other parameters (including model parameters) are default value.

**NOTE :**

-1) For FPMC and TransRec model, `training_neg_sample_num` should be `1` .
+1) For the FPMC and TransRec models, `neg_sampling` should be `{'uniform': 1}`.

-2) For SASRecF, GRU4RecF and FDSA, `load_col` should as below:
+2) For SASRecF, GRU4RecF and FDSA, `load_col` should be as below:

```
load_col:
-   inter: [session_id, item_id, timestamp]
-   item: [item_id, item_category]
+   inter: [user_id, business_id, stars, date]
+   item: [business_id, categories]
```
-
-
-
-
-
-
+3) For KSR and GRU4RecKG, you should prepare pretrained knowledge graph embeddings.
\ No newline at end of file diff --git a/recbole/data/dataset/kg_dataset.py b/recbole/data/dataset/kg_dataset.py index e630946fc..38542c354 100644 --- a/recbole/data/dataset/kg_dataset.py +++ b/recbole/data/dataset/kg_dataset.py @@ -118,7 +118,7 @@ def _download(self): self.logger.info( f'\nLinked KG for [{self.dataset_name}] requires additional conversion ' f'to atomic files (.kg and .link).\n' - f'Please refer to https://github.com/RUCAIBox/RecSysDatasets/conversion_tools#knowledge-aware-datasets ' + f'Please refer to https://github.com/RUCAIBox/RecSysDatasets/tree/master/conversion_tools#knowledge-aware-datasets ' f'for detailed instructions.\n' f'You can run RecBole after the conversion, see you soon.' )
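The final hunk only corrects the conversion-tools URL in a log message of the knowledge-aware dataset class in `kg_dataset.py`. For context, a hedged illustration of when that message is reached: building a knowledge-aware dataset whose atomic `.kg`/`.link` files have not been generated yet. The model and dataset names below are assumptions chosen for illustration.

```
# Illustration under assumptions: constructing a knowledge-aware dataset is what
# reaches the _download() hint whose URL is corrected in the hunk above.
from recbole.config import Config
from recbole.data import create_dataset

config = Config(model='KGAT', dataset='ml-1m')  # a KG-based model selects the KG dataset class
dataset = create_dataset(config)                # may log the conversion hint if .kg/.link files are missing
print(dataset)
```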