diff --git a/.github/ISSUE_TEMPLATE/bug_report_CN.md b/.github/ISSUE_TEMPLATE/bug_report_CN.md new file mode 100644 index 000000000..9e1de6cc1 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/bug_report_CN.md @@ -0,0 +1,33 @@ +--- +name: Bug 报告 +about: 提交一份 bug 报告,帮助 RecBole 变得更好 +title: "[\U0001F41BBUG] 用一句话描述您的问题。" +labels: bug +assignees: '' + +--- + +**描述这个 bug** +对 bug 作一个清晰简明的描述。 + +**如何复现** +复现这个 bug 的步骤: +1. 您引入的额外 yaml 文件 +2. 您的代码 +3. 您的运行脚本 + +**预期** +对您的预期作清晰简明的描述。 + +**屏幕截图** +添加屏幕截图以帮助解释您的问题。(可选) + +**链接** +添加能够复现 bug 的代码链接,如 Colab 或者其他在线 Jupyter 平台。(可选) + +**实验环境(请补全下列信息):** + - 操作系统: [如 Linux, macOS 或 Windows] +- RecBole 版本 [如 0.1.0] + - Python 版本 [如 3.79] +- PyTorch 版本 [如 1.60] +- cudatoolkit 版本 [如 9.2, none] diff --git a/.github/ISSUE_TEMPLATE/feature_request_CN.md b/.github/ISSUE_TEMPLATE/feature_request_CN.md new file mode 100644 index 000000000..861dcc82d --- /dev/null +++ b/.github/ISSUE_TEMPLATE/feature_request_CN.md @@ -0,0 +1,20 @@ +--- +name: 请求添加新功能 +about: 提出一个关于本项目新功能/新特性的建议 +title: "[\U0001F4A1SUG] 一句话描述您希望新增的功能或特性" +labels: enhancement +assignees: '' + +--- + +**您希望添加的功能是否与某个问题相关?** +关于这个问题的简洁清晰的描述,例如,当 [...] 时,我总是很沮丧。 + +**描述您希望的解决方案** +关于解决方案的简洁清晰的描述。 + +**描述您考虑的替代方案** +关于您考虑的,能实现这个功能的其他替代方案的简洁清晰的描述。 + +**其他** +您可以添加其他任何的资料、链接或者屏幕截图,以帮助我们理解这个新功能。 diff --git a/.github/workflows/python-package.yml b/.github/workflows/python-package.yml index e879511d4..45d193691 100644 --- a/.github/workflows/python-package.yml +++ b/.github/workflows/python-package.yml @@ -24,12 +24,18 @@ jobs: pip install pytest pip install dgl pip install xgboost + pip install community + pip install networkx + pip install python-louvain if [ -f requirements.txt ]; then pip install -r requirements.txt; fi # Use "python -m pytest" instead of "pytest" to fix imports - name: Test metrics run: | python -m pytest -v tests/metrics + - name: Test data + run: | + python -m pytest -v tests/data - name: Test evaluation_setting run: | python -m pytest -v tests/evaluation_setting @@ -41,4 +47,4 @@ jobs: python -m pytest -v tests/config/test_config.py export PYTHONPATH=. python tests/config/test_command_line.py --use_gpu=False --valid_metric=Recall@10 --split_ratio=[0.7,0.2,0.1] --metrics=['Recall@10'] --epochs=200 --eval_setting='LO_RS' --learning_rate=0.3 - + diff --git a/.gitignore b/.gitignore index 49788101f..660977853 100644 --- a/.gitignore +++ b/.gitignore @@ -9,3 +9,4 @@ saved/ *.lprof *.egg-info/ +docs/build/ \ No newline at end of file diff --git a/README.md b/README.md index 6c6d131b3..148e2e48e 100644 --- a/README.md +++ b/README.md @@ -11,24 +11,25 @@ [](./LICENSE) -[HomePage] | [Docs] | [Datasets] | [Paper] | [Blogs] +[HomePage] | [Docs] | [Datasets] | [Paper] | [Blogs] | [中文版] [HomePage]: https://recbole.io/ [Docs]: https://recbole.io/docs/ [Datasets]: https://github.com/RUCAIBox/RecDatasets [Paper]: https://arxiv.org/abs/2011.01731 -[Blogs]: #blogs +[Blogs]: https://blog.csdn.net/Turinger_2000/article/details/111182852 +[中文版]: README_CN.md RecBole is developed based on Python and PyTorch for reproducing and developing recommendation algorithms in a unified, comprehensive and efficient framework for research purpose. 
-Our library includes 53 recommendation algorithms, covering four major categories: +Our library includes 65 recommendation algorithms, covering four major categories: + General Recommendation + Sequential Recommendation + Context-aware Recommendation + Knowledge-based Recommendation -We design a unified and flexible data file format, and provide the support for 27 benchmark recommendation datasets. +We design a unified and flexible data file format, and provide the support for 28 benchmark recommendation datasets. A user can apply the provided script to process the original data copy, or simply download the processed datasets by our team. @@ -44,8 +45,8 @@ by our team. + **General and extensible data structure.** We design general and extensible data structures to unify the formatting and usage of various recommendation datasets. -+ **Comprehensive benchmark models and datasets.** We implement 53 commonly used recommendation algorithms, and provide -the formatted copies of 27 recommendation datasets. ++ **Comprehensive benchmark models and datasets.** We implement 65 commonly used recommendation algorithms, and provide +the formatted copies of 28 recommendation datasets. + **Efficient GPU-accelerated execution.** We optimize the efficiency of our library with a number of improved techniques oriented to the GPU environment. @@ -53,8 +54,11 @@ oriented to the GPU environment. + **Extensive and standard evaluation protocols.** We support a series of widely adopted evaluation protocols or settings for testing and comparing recommendation algorithms. + ## RecBole News -**12/10/2020**: 我们发布了RecBole小白入门系列中文博客。 +**01/15/2021**: We release RecBole [v0.2.0](https://github.com/RUCAIBox/RecBole/releases/tag/v0.2.0). + +**12/10/2020**: 我们发布了[RecBole小白入门系列中文博客(持续更新中)](https://blog.csdn.net/Turinger_2000/article/details/111182852) 。 **12/06/2020**: We release RecBole [v0.1.2](https://github.com/RUCAIBox/RecBole/releases/tag/v0.1.2). @@ -64,11 +68,6 @@ for reference. **11/03/2020**: We release the first version of RecBole **v0.1.1**. -## Blogs - -[RecBole小白入门系列博客(一)——快速安装和简单上手](https://blog.csdn.net/Turinger_2000/article/details/110414642) - -[RecBole小白入门系列博客(二) ——General类模型运行流程](https://blog.csdn.net/Turinger_2000/article/details/110395198) ## Installation RecBole works with the following operating systems: @@ -169,20 +168,23 @@ python run_recbole.py --model=[model_name] ## Time and Memory Costs -We constructed preliminary experiments to test the time and memory cost on three different-sized datasets (small, medium and large). For detailed information, you can click the following links.<br> +We constructed preliminary experiments to test the time and memory cost on three different-sized datasets +(small, medium and large). For detailed information, you can click the following links. 
-* [General recommendation models](asset/time_test_result/General_recommendation.md)<br> -* [Sequential recommendation models](asset/time_test_result/Sequential_recommendation.md)<br> -* [Context-aware recommendation models](asset/time_test_result/Context-aware_recommendation.md)<br> -* [Knowledge-based recommendation models](asset/time_test_result/Knowledge-based_recommendation.md)<br> +* [General recommendation models](asset/time_test_result/General_recommendation.md) +* [Sequential recommendation models](asset/time_test_result/Sequential_recommendation.md) +* [Context-aware recommendation models](asset/time_test_result/Context-aware_recommendation.md) +* [Knowledge-based recommendation models](asset/time_test_result/Knowledge-based_recommendation.md) -NOTE: Our test results only gave the approximate time and memory cost of our implementations in the RecBole library (based on our machine server). Any feedback or suggestions about the implementations and test are welcome. We will keep improving our implementations, and update these test results. +NOTE: Our test results only gave the approximate time and memory cost of our implementations in the RecBole library +(based on our machine server). Any feedback or suggestions about the implementations and test are welcome. +We will keep improving our implementations, and update these test results. ## RecBole Major Releases | Releases | Date | Features | |-----------|--------|-------------------------| -| v0.1.2 | 12/06/2020 | Basic RecBole | +| v0.2.0 | 01/15/2021 | RecBole | | v0.1.1 | 11/03/2020 | Basic RecBole | ## Contributing @@ -193,6 +195,9 @@ We welcome all contributions from bug fixes to new features and extensions. We expect all contributions discussed in the issue tracker and going through PRs. +We thank the insightful suggestions from [@tszumowski](https://github.com/tszumowski), [@rowedenny](https://github.com/rowedenny), [@deklanw](https://github.com/deklanw) et.al. + +We thank the nice contributions through PRs from [@rowedenny](https://github.com/rowedenny),[@deklanw](https://github.com/deklanw) et.al. 
## Cite If you find RecBole useful for your research or development, please cite the following [paper](https://arxiv.org/abs/2011.01731): diff --git a/README_CN.md b/README_CN.md new file mode 100644 index 000000000..024656bdb --- /dev/null +++ b/README_CN.md @@ -0,0 +1,210 @@ + + +-------------------------------------------------------------------------------- + +# RecBole (伯乐) + +*“世有伯乐,然后有千里马。千里马常有,而伯乐不常有。”——韩愈《马说》* + +[](https://pypi.org/project/recbole/) +[](https://anaconda.org/aibox/recbole) +[](./LICENSE) + + +[中文主页] | [文档] | [数据集] | [论文] | [博客] | [English Version] + +[中文主页]: https://recbole.io/cn +[文档]: https://recbole.io/docs/ +[数据集]: https://github.com/RUCAIBox/RecDatasets +[论文]: https://arxiv.org/abs/2011.01731 +[博客]: https://blog.csdn.net/Turinger_2000/article/details/111182852 +[English Version]: README.md + + +RecBole 是一个基于 PyTorch 实现的,面向研究者的,易于开发与复现的,统一、全面、高效的推荐系统代码库。 +我们实现了53个推荐系统模型,包含常见的推荐系统类别,如: + ++ General Recommendation ++ Sequential Recommendation ++ Context-aware Recommendation ++ Knowledge-based Recommendation + + +我们约定了一个统一、易用的数据文件格式,并已支持 27 个 benchmark dataset。 +用户可以选择使用我们的数据集预处理脚本,或直接下载已被处理好的数据集文件。 + + +<p align="center"> + <img src="asset/framework.png" alt="RecBole v0.1 架构" width="600"> + <br> + <b>图片</b>: RecBole 总体架构 +</p> + + +## 特色 ++ **通用和可扩展的数据结构** 我们设计了通用和可扩展的数据结构来支持各种推荐数据集统一化格式和使用。 + ++ **全面的基准模型和数据集** 我们实现了53个常用的推荐算法,并提供了27个推荐数据集的格式化副本。 + ++ **高效的 GPU 加速实现** 我们针对GPU环境使用了一系列的优化技术来提升代码库的效率。 + ++ **大规模的标准评测** 我们支持一系列被广泛认可的评估方式来测试和比较不同的推荐算法。 + + +## RecBole 新闻 +**12/10/2020**: 我们发布了[RecBole小白入门系列中文博客(持续更新中)](https://blog.csdn.net/Turinger_2000/article/details/111182852) 。 + +**12/06/2020**: 我们发布了 RecBole [v0.1.2](https://github.com/RUCAIBox/RecBole/releases/tag/v0.1.2). + +**11/29/2020**: 我们在三个不同大小的数据集上进行了时间和内存开销的初步测试, +并提供了 [测试结果](https://github.com/RUCAIBox/RecBole#time-and-memory-costs) 以供参考。 + +**11/03/2020**: 我们发布了第一版 RecBole **v0.1.1**. + + +## 安装 +RecBole可以在以下几种系统上运行: + +* Linux +* Windows 10 +* macOS X + +RecBole需要在python 3.6或更高的环境下运行。 + +RecBole要求torch版本在1.6.0及以上,如果你想在GPU上运行RecBole,请确保你的CUDA版本或CUDAToolkit版本在9.2及以上。 +这需要你的NVIDIA驱动版本为396.26或以上(在linux系统上)或者为397.44或以上(在Windows10系统上)。 + + +### 从Conda安装 + +```bash +conda install -c aibox recbole +``` + +### 从pip安装 + +```bash +pip install recbole +``` + +### 从源文件安装 +```bash +git clone https://github.com/RUCAIBox/RecBole.git && cd RecBole +pip install -e . --verbose +``` + +## 快速上手 +如果你从GitHub下载了RecBole的源码,你可以使用提供的脚本进行简单的使用: + +```bash +python run_recbole.py +``` + +这个例子将会在ml-100k这个数据集上进行BPR模型的训练和测试。 + +一般来说,这个例子将花费不到一分钟的时间,我们会得到一些类似下面的输出: + +``` +INFO ml-100k +The number of users: 944 +Average actions of users: 106.04453870625663 +The number of items: 1683 +Average actions of items: 59.45303210463734 +The number of inters: 100000 +The sparsity of the dataset: 93.70575143257098% + +INFO Evaluation Settings: +Group by user_id +Ordering: {'strategy': 'shuffle'} +Splitting: {'strategy': 'by_ratio', 'ratios': [0.8, 0.1, 0.1]} +Negative Sampling: {'strategy': 'full', 'distribution': 'uniform'} + +INFO BPRMF( + (user_embedding): Embedding(944, 64) + (item_embedding): Embedding(1683, 64) + (loss): BPRLoss() +) +Trainable parameters: 168128 + +INFO epoch 0 training [time: 0.27s, train loss: 27.7231] +INFO epoch 0 evaluating [time: 0.12s, valid_score: 0.021900] +INFO valid result: +recall@10: 0.0073 mrr@10: 0.0219 ndcg@10: 0.0093 hit@10: 0.0795 precision@10: 0.0088 + +... 
+ +INFO epoch 63 training [time: 0.19s, train loss: 4.7660] +INFO epoch 63 evaluating [time: 0.08s, valid_score: 0.394500] +INFO valid result: +recall@10: 0.2156 mrr@10: 0.3945 ndcg@10: 0.2332 hit@10: 0.7593 precision@10: 0.1591 + +INFO Finished training, best eval result in epoch 52 +INFO Loading model structure and parameters from saved/***.pth +INFO best valid result: +recall@10: 0.2169 mrr@10: 0.4005 ndcg@10: 0.235 hit@10: 0.7582 precision@10: 0.1598 +INFO test result: +recall@10: 0.2368 mrr@10: 0.4519 ndcg@10: 0.2768 hit@10: 0.7614 precision@10: 0.1901 +``` + +如果你要改参数,例如 ``learning_rate``, ``embedding_size``, 只需根据您的需求增加额外的参数,例如: + +```bash +python run_recbole.py --learning_rate=0.0001 --embedding_size=128 +``` + +如果你想改变运行模型,只需要在执行脚本时添加额外的设置参数即可: + +```bash +python run_recbole.py --model=[model_name] +``` + + +## 时间和内存开销 +我们构建了初步的实验来测试三个不同大小的数据集(小、中、大)的时间和内存开销。 +有关详细信息,请单击以下链接。 + +* [General recommendation models](asset/time_test_result/General_recommendation.md)<br> +* [Sequential recommendation models](asset/time_test_result/Sequential_recommendation.md)<br> +* [Context-aware recommendation models](asset/time_test_result/Context-aware_recommendation.md)<br> +* [Knowledge-based recommendation models](asset/time_test_result/Knowledge-based_recommendation.md)<br> + +NOTE: 我们的测试结果只给出了RecBole库中实现模型的大致时间和内存开销(基于我们的机器服务器)。 +我们欢迎任何关于测试、实现的建议。我们将继续改进我们的实现,并更新这些测试结果。 + + +## RecBole 重要发布 +| Releases | Date | Features | +|-----------|--------|-------------------------| +| v0.1.1 | 11/03/2020 | Basic RecBole | + + +## 贡献 + +如果您遇到错误或有任何建议,请通过 [Issue](https://github.com/RUCAIBox/RecBole/issues) 进行反馈 + +我们欢迎关于修复错误、添加新特性的任何贡献。 + +如果想贡献代码,请先在issue中提出问题,然后再提PR。 + +我们对[@tszumowski](https://github.com/tszumowski), [@rowedenny](https://github.com/rowedenny), [@deklanw](https://github.com/deklanw) 等用户提出的建议表示感谢。 + +我们也对[@rowedenny](https://github.com/rowedenny), [@deklanw](https://github.com/deklanw) 等用户做出的贡献表示感谢。 + + +## 引用 +如果你觉得RecBole对你的科研工作有帮助,请引用我们的[论文](https://arxiv.org/abs/2011.01731): + +``` +@article{recbole, + title={RecBole: Towards a Unified, Comprehensive and Efficient Framework for Recommendation Algorithms}, + author={Wayne Xin Zhao and Shanlei Mu and Yupeng Hou and Zihan Lin and Kaiyuan Li and Yushuo Chen and Yujie Lu and Hui Wang and Changxin Tian and Xingyu Pan and Yingqian Min and Zhichao Feng and Xinyan Fan and Xu Chen and Pengfei Wang and Wendi Ji and Yaliang Li and Xiaoling Wang and Ji-Rong Wen}, + year={2020}, + journal={arXiv preprint arXiv:2011.01731} +} +``` + +## 项目团队 +RecBole由 [中国人民大学, 北京邮电大学, 华东师范大学](https://www.recbole.io/cn/about.html) 的同学和老师进行开发和维护。 + +## 免责声明 +RecBole 基于 [MIT License](./LICENSE) 进行开发,本项目的所有数据和代码只能被用于学术目的。 diff --git a/asset/framework.png b/asset/framework.png index a3a1cdc22..add0b8028 100644 Binary files a/asset/framework.png and b/asset/framework.png differ diff --git a/asset/logo.png b/asset/logo.png index bd828fae0..047e61bcf 100644 Binary files a/asset/logo.png and b/asset/logo.png differ diff --git a/asset/time_test_result/Context-aware_recommendation.md b/asset/time_test_result/Context-aware_recommendation.md index 39751b0b4..fa518892a 100644 --- a/asset/time_test_result/Context-aware_recommendation.md +++ b/asset/time_test_result/Context-aware_recommendation.md @@ -1,189 +1,191 @@ -## Time and memory cost of context-aware recommendation models - -### Datasets information: - -| Dataset | #Interaction | #Feature Field | #Feature | -| ------- | ------------: | --------------: | --------: | -| ml-1m | 
1,000,209 | 5 | 134 | -| Criteo | 2,292,530 | 39 | 2,572,192 | -| Avazu | 4,218,938 | 21 | 1,326,631 | - -### Device information - -``` -OS: Linux -Python Version: 3.8.3 -PyTorch Version: 1.7.0 -cudatoolkit Version: 10.1 -GPU: TITAN RTX(24GB) -Machine Specs: 32 CPU machine, 64GB RAM -``` - -### 1) ml-1m dataset: - -#### Time and memory cost on ml-1m dataset: - -| Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) | -| --------- | -----------------: | -----------------: | -----------: | -| LR | 18.34 | 2.18 | 0.82 | -| DIN | 20.37 | 2.26 | 1.16 | -| DSSM | 21.93 | 2.24 | 0.95 | -| FM | 19.33 | 2.34 | 0.83 | -| DeepFM | 20.42 | 2.27 | 0.91 | -| Wide&Deep | 26.13 | 2.95 | 0.89 | -| NFM | 23.36 | 2.26 | 0.89 | -| AFM | 20.08 | 2.26 | 0.92 | -| AutoInt | 22.41 | 2.34 | 0.94 | -| DCN | 28.33 | 2.97 | 0.93 | -| FNN(DNN) | 19.51 | 2.21 | 0.91 | -| PNN | 22.29 | 2.23 | 0.91 | -| FFM | 22.98 | 2.47 | 0.87 | -| FwFM | 23.38 | 2.50 | 0.85 | -| xDeepFM | 24.40 | 2.30 | 1.06 | - -#### Config file of ml-1m dataset: - -``` -# dataset config -field_separator: "\t" -seq_separator: " " -USER_ID_FIELD: user_id -ITEM_ID_FIELD: item_id -LABEL_FIELD: label -threshold: - rating: 4.0 -drop_filter_field : True -load_col: - inter: [user_id, item_id, rating] - item: [item_id, release_year, genre] - user: [user_id, age, gender, occupation] - -# training and evaluation -epochs: 500 -train_batch_size: 2048 -eval_batch_size: 2048 -eval_setting: RO_RS -group_by_user: False -valid_metric: AUC -metrics: ['AUC', 'LogLoss'] -``` - -Other parameters (including model parameters) are default value. - -### 2)Criteo dataset: - -#### Time and memory cost on Criteo dataset: - -| Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) | -| --------- | -------------------------: | ---------------------------: | ---------------: | -| LR | 7.65 | 0.61 | 1.11 | -| DIN | - | - | - | -| DSSM | - | - | - | -| FM | 9.77 | 0.73 | 1.45 | -| DeepFM | 13.64 | 0.83 | 1.72 | -| Wide&Deep | 13.58 | 0.80 | 1.72 | -| NFM | 13.36 | 0.75 | 1.72 | -| AFM | 19.40 | 1.02 | 2.34 | -| AutoInt | 19.40 | 0.98 | 2.06 | -| DCN | 16.25 | 0.78 | 1.67 | -| FNN(DNN) | 10.03 | 0.64 | 1.63 | -| PNN | 12.92 | 0.72 | 1.85 | -| FFM | - | - | - | -| FwFM | 1175.24 | 8.90 | 2.12 | -| xDeepFM | 32.27 | 1.34 | 2.25 | - -#### Config file of Criteo dataset: - -``` -# dataset config -field_separator: "\t" -seq_separator: " " -USER_ID_FIELD: ~ -ITEM_ID_FIELD: ~ -LABEL_FIELD: label - -load_col: - inter: '*' - -highest_val: - index: 2292530 - -fill_nan: True -normalize_all: True -min_item_inter_num: 0 -min_user_inter_num: 0 - -drop_filter_field : True - - -# training and evaluation -epochs: 500 -train_batch_size: 2048 -eval_batch_size: 2048 -eval_setting: RO_RS -group_by_user: False -valid_metric: AUC -metrics: ['AUC', 'LogLoss'] -``` - -Other parameters (including model parameters) are default value. 
- -### 3)Avazu dataset: - -#### Time and memory cost on Avazu dataset: - -| Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) | -| --------- | -------------------------: | ---------------------------: | ---------------: | -| LR | 9.30 | 0.76 | 1.42 | -| DIN | - | - | - | -| DSSM | - | - | - | -| FM | 25.68 | 0.94 | 2.60 | -| DeepFM | 28.41 | 1.19 | 2.66 | -| Wide&Deep | 27.58 | 0.97 | 2.66 | -| NFM | 30.46 | 1.06 | 2.66 | -| AFM | 31.03 | 1.06 | 2.69 | -| AutoInt | 38.11 | 1.41 | 2.84 | -| DCN | 30.78 | 0.96 | 2.64 | -| FNN(DNN) | 23.53 | 0.84 | 2.60 | -| PNN | 25.86 | 0.90 | 2.68 | -| FFM | - | - | - | -| FwFM | 336.75 | 7.49 | 2.63 | -| xDeepFM | 54.88 | 1.45 | 2.89 | - -#### Config file of Avazu dataset: - -``` -# dataset config -field_separator: "\t" -seq_separator: " " -USER_ID_FIELD: ~ -ITEM_ID_FIELD: ~ -LABEL_FIELD: label -fill_nan: True -normalize_all: True - -load_col: - inter: '*' - -lowest_val: - timestamp: 14102931 -drop_filter_field : False - -# training and evaluation -epochs: 500 -train_batch_size: 2048 -eval_batch_size: 2048 -eval_setting: RO_RS -group_by_user: False -valid_metric: AUC -metrics: ['AUC', 'LogLoss'] -``` - -Other parameters (including model parameters) are default value. - - - - - - - +## Time and memory cost of context-aware recommendation models + +### Datasets information: + +| Dataset | #Interaction | #Feature Field | #Feature | +| ------- | ------------: | --------------: | --------: | +| ml-1m | 1,000,209 | 5 | 134 | +| Criteo | 2,292,530 | 39 | 2,572,192 | +| Avazu | 4,218,938 | 21 | 1,326,631 | + +### Device information + +``` +OS: Linux +Python Version: 3.8.3 +PyTorch Version: 1.7.0 +cudatoolkit Version: 10.1 +GPU: TITAN RTX(24GB) +Machine Specs: 32 CPU machine, 64GB RAM +``` + +### 1) ml-1m dataset: + +#### Time and memory cost on ml-1m dataset: + +| Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) | +| --------- | -----------------: | -----------------: | -----------: | +| LR | 18.34 | 2.18 | 0.82 | +| DIN | 20.37 | 2.26 | 1.16 | +| DSSM | 21.93 | 2.24 | 0.95 | +| FM | 19.33 | 2.34 | 0.83 | +| DeepFM | 20.42 | 2.27 | 0.91 | +| Wide&Deep | 26.13 | 2.95 | 0.89 | +| NFM | 23.36 | 2.26 | 0.89 | +| AFM | 20.08 | 2.26 | 0.92 | +| AutoInt | 22.41 | 2.34 | 0.94 | +| DCN | 28.33 | 2.97 | 0.93 | +| FNN(DNN) | 19.51 | 2.21 | 0.91 | +| PNN | 22.29 | 2.23 | 0.91 | +| FFM | 22.98 | 2.47 | 0.87 | +| FwFM | 23.38 | 2.50 | 0.85 | +| xDeepFM | 24.40 | 2.30 | 1.06 | + +#### Config file of ml-1m dataset: + +``` +# dataset config +field_separator: "\t" +seq_separator: " " +USER_ID_FIELD: user_id +ITEM_ID_FIELD: item_id +LABEL_FIELD: label +threshold: + rating: 4.0 +drop_filter_field : True +load_col: + inter: [user_id, item_id, rating] + item: [item_id, release_year, genre] + user: [user_id, age, gender, occupation] + +# training and evaluation +epochs: 500 +train_batch_size: 2048 +eval_batch_size: 2048 +eval_setting: RO_RS +group_by_user: False +valid_metric: AUC +metrics: ['AUC', 'LogLoss'] +``` + +Other parameters (including model parameters) are default value. 
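+
+For reference, a configuration like the one above can be driven either from the command line or from Python. The snippet below is only an illustrative sketch rather than part of the benchmark itself; it assumes the settings above are saved to a hypothetical local file named `ml-1m.yaml`:
+
+```
+from recbole.quick_start import run_recbole
+
+# Reproduce one row of the table above, e.g. FM on ml-1m.
+# 'ml-1m.yaml' is a hypothetical file holding the config shown above.
+run_recbole(model='FM', dataset='ml-1m', config_file_list=['ml-1m.yaml'])
+```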
+ +### 2)Criteo dataset: + +#### Time and memory cost on Criteo dataset: + +| Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) | +| --------- | -------------------------: | ---------------------------: | ---------------: | +| LR | 7.65 | 0.61 | 1.11 | +| DIN | - | - | - | +| DSSM | - | - | - | +| FM | 9.77 | 0.73 | 1.45 | +| DeepFM | 13.64 | 0.83 | 1.72 | +| Wide&Deep | 13.58 | 0.80 | 1.72 | +| NFM | 13.36 | 0.75 | 1.72 | +| AFM | 19.40 | 1.02 | 2.34 | +| AutoInt | 19.40 | 0.98 | 2.06 | +| DCN | 16.25 | 0.78 | 1.67 | +| FNN(DNN) | 10.03 | 0.64 | 1.63 | +| PNN | 12.92 | 0.72 | 1.85 | +| FFM | - | - | Out of Memory | +| FwFM | 1175.24 | 8.90 | 2.12 | +| xDeepFM | 32.27 | 1.34 | 2.25 | + +Note: Criteo dataset is not suitable for DIN model and DSSM model. +#### Config file of Criteo dataset: + +``` +# dataset config +field_separator: "\t" +seq_separator: " " +USER_ID_FIELD: ~ +ITEM_ID_FIELD: ~ +LABEL_FIELD: label + +load_col: + inter: '*' + +highest_val: + index: 2292530 + +fill_nan: True +normalize_all: True +min_item_inter_num: 0 +min_user_inter_num: 0 + +drop_filter_field : True + + +# training and evaluation +epochs: 500 +train_batch_size: 2048 +eval_batch_size: 2048 +eval_setting: RO_RS +group_by_user: False +valid_metric: AUC +metrics: ['AUC', 'LogLoss'] +``` + +Other parameters (including model parameters) are default value. + +### 3)Avazu dataset: + +#### Time and memory cost on Avazu dataset: + +| Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) | +| --------- | -------------------------: | ---------------------------: | ---------------: | +| LR | 9.30 | 0.76 | 1.42 | +| DIN | - | - | - | +| DSSM | - | - | - | +| FM | 25.68 | 0.94 | 2.60 | +| DeepFM | 28.41 | 1.19 | 2.66 | +| Wide&Deep | 27.58 | 0.97 | 2.66 | +| NFM | 30.46 | 1.06 | 2.66 | +| AFM | 31.03 | 1.06 | 2.69 | +| AutoInt | 38.11 | 1.41 | 2.84 | +| DCN | 30.78 | 0.96 | 2.64 | +| FNN(DNN) | 23.53 | 0.84 | 2.60 | +| PNN | 25.86 | 0.90 | 2.68 | +| FFM | - | - | Out of Memory | +| FwFM | 336.75 | 7.49 | 2.63 | +| xDeepFM | 54.88 | 1.45 | 2.89 | + +Note: Avazu dataset is not suitable for DIN model and DSSM model. +#### Config file of Avazu dataset: + +``` +# dataset config +field_separator: "\t" +seq_separator: " " +USER_ID_FIELD: ~ +ITEM_ID_FIELD: ~ +LABEL_FIELD: label +fill_nan: True +normalize_all: True + +load_col: + inter: '*' + +lowest_val: + timestamp: 14102931 +drop_filter_field : False + +# training and evaluation +epochs: 500 +train_batch_size: 2048 +eval_batch_size: 2048 +eval_setting: RO_RS +group_by_user: False +valid_metric: AUC +metrics: ['AUC', 'LogLoss'] +``` + +Other parameters (including model parameters) are default value. + + + + + + + diff --git a/asset/time_test_result/General_recommendation.md b/asset/time_test_result/General_recommendation.md index e88472078..a9442c57a 100644 --- a/asset/time_test_result/General_recommendation.md +++ b/asset/time_test_result/General_recommendation.md @@ -77,7 +77,7 @@ Other parameters (including model parameters) are default value. | BPRMF | 4.42 | 52.81 | 1.08 | | NeuMF | 11.33 | 238.92 | 1.26 | | DMF | 20.62 | 68.89 | 7.12 | -| NAIS | - | - | - | +| NAIS | - | - | Out of Memory | | NGCF | 52.50 | 51.60 | 2.00 | | GCMC | 93.15 | 1810.43 | 3.17 | | LightGCN | 30.21 | 47.12 | 1.58 | @@ -127,13 +127,13 @@ Other parameters (including model parameters) are default value. 
| BPRMF | 6.31 | 120.03 | 1.29 | | NeuMF | 17.38 | 2069.53 | 1.67 | | DMF | 43.96 | 173.13 | 9.22 | -| NAIS | - | - | - | +| NAIS | - | - | Out of Memory | | NGCF | 122.90 | 129.59 | 3.28 | | GCMC | 299.36 | 9833.24 | 5.96 | | LightGCN | 67.91 | 116.16 | 2.02 | | DGCF | 1542.00 | 119.00 | 17.17 | | ConvNCF | 87.56 | 11155.31 | 1.62 | -| FISM | - | - | - | +| FISM | - | - | Out of Memory | | SpectralCF | 138.99 | 133.37 | 3.10 | #### Config file of Yelp dataset: diff --git a/asset/time_test_result/Sequential_recommendation.md b/asset/time_test_result/Sequential_recommendation.md index 3a0fb4f6c..38290d494 100644 --- a/asset/time_test_result/Sequential_recommendation.md +++ b/asset/time_test_result/Sequential_recommendation.md @@ -1,225 +1,230 @@ -## Time and memory cost of sequential recommendation models - -### Datasets information: - -| Dataset | #User | #Item | #Interaction | Sparsity | -| ---------- | -------: | ------: | ------------: | --------: | -| ml-1m | 6,041 | 3,707 | 1,000,209 | 0.9553 | -| DIGINETICA | 59,425 | 42,116 | 547,416 | 0.9998 | -| Yelp | 102,046 | 98,408 | 2,903,648 | 0.9997 | - -### Device information - -``` -OS: Linux -Python Version: 3.8.3 -PyTorch Version: 1.7.0 -cudatoolkit Version: 10.1 -GPU: TITAN RTX(24GB) -Machine Specs: 32 CPU machine, 64GB RAM -``` - -### 1) ml-1m dataset: - -#### Time and memory cost on ml-1m dataset: - -| Method | Training Time (sec/epoch) | Evaluate Time (sec/epoch) | GPU Memory (GB) | -| ---------------- | -----------------: | -----------------: | -----------: | -| Improved GRU-Rec | 7.78 | 0.11 | 1.27 | -| SASRec | 17.78 | 0.12 | 1.84 | -| NARM | 8.29 | 0.11 | 1.29 | -| FPMC | 7.51 | 0.11 | 1.18 | -| STAMP | 7.32 | 0.11 | 1.20 | -| Caser | 44.85 | 0.12 | 1.14 | -| NextItNet | 16433.27 | 96.31 | 1.86 | -| TransRec | 10.08 | 0.16 | 8.18 | -| S3Rec | - | - | - | -| GRU4RecF | 10.20 | 0.15 | 1.80 | -| SASRecF | 18.84 | 0.17 | 1.78 | -| BERT4Rec | 36.09 | 0.34 | 1.97 | -| FDSA | 31.86 | 0.19 | 2.32 | -| SRGNN | 327.38 | 2.19 | 1.21 | -| GCSAN | 335.27 | 0.02 | 1.58 | -| KSR | - | - | - | -| GRU4RecKG | - | - | - | - -#### Config file of ml-1m dataset: - -``` -# dataset config -field_separator: "\t" -seq_separator: " " -USER_ID_FIELD: user_id -ITEM_ID_FIELD: item_id -TIME_FIELD: timestamp -NEG_PREFIX: neg_ -ITEM_LIST_LENGTH_FIELD: item_length -LIST_SUFFIX: _list -MAX_ITEM_LIST_LENGTH: 20 -POSITION_FIELD: position_id -load_col: - inter: [user_id, item_id, timestamp] -min_user_inter_num: 0 -min_item_inter_num: 0 - -# training and evaluation -epochs: 500 -train_batch_size: 2048 -eval_batch_size: 2048 -valid_metric: MRR@10 -eval_setting: TO_LS,full -training_neg_sample_num: 0 -``` - -Other parameters (including model parameters) are default value. - -**NOTE :** - -1) For FPMC and TransRec model, `training_neg_sample_num` should be `1` . 
- -2) For SASRecF, GRU4RecF and FDSA, `load_col` should as below: - -``` -load_col: - inter: [user_id, item_id, timestamp] - item: [item_id, genre] -``` - -### 2)DIGINETICA dataset: - -#### Time and memory cost on DIGINETICA dataset: - -| Method | Training Time (sec/epoch) | Evaluate Time (sec/epoch) | GPU Memory (GB) | -| ---------------- | -----------------: | -----------------: | -----------: | -| Improved GRU-Rec | 4.10 | 1.05 | 4.02 | -| SASRec | 8.36 | 1.21 | 4.43 | -| NARM | 4.30 | 1.08 | 4.09 | -| FPMC | 2.98 | 1.08 | 4.08 | -| STAMP | 4.27 | 1.04 | 3.88 | -| Caser | 17.15 | 1.18 | 3.94 | -| NextItNet | - | - | - | -| TransRec | - | - | - | -| S3Rec | - | - | - | -| GRU4RecF | 4.79 | 1.17 | 4.83 | -| SASRecF | 8.66 | 1.29 | 5.11 | -| BERT4Rec | 16.80 | 3.54 | 7.97 | -| FDSA | 13.44 | 1.47 | 5.66 | -| SRGNN | 88.59 | 15.37 | 4.01 | -| GCSAN | 96.69 | 17.11 | 4.25 | -| KSR | - | - | - | -| GRU4RecKG | - | - | - | - -#### Config file of DIGINETICA dataset: - -``` -# dataset config -field_separator: "\t" -seq_separator: " " -USER_ID_FIELD: session_id -ITEM_ID_FIELD: item_id -TIME_FIELD: timestamp -NEG_PREFIX: neg_ -ITEM_LIST_LENGTH_FIELD: item_length -LIST_SUFFIX: _list -MAX_ITEM_LIST_LENGTH: 20 -POSITION_FIELD: position_id -load_col: - inter: [session_id, item_id, timestamp] -min_user_inter_num: 6 -min_item_inter_num: 1 - -# training and evaluation -epochs: 500 -train_batch_size: 2048 -eval_batch_size: 2048 -valid_metric: MRR@10 -eval_setting: TO_LS,full -training_neg_sample_num: 0 -``` - -Other parameters (including model parameters) are default value. - -**NOTE :** - -1) For FPMC and TransRec model, `training_neg_sample_num` should be `1` . - -2) For SASRecF, GRU4RecF and FDSA, `load_col` should as below: - -``` -load_col: - inter: [session_id, item_id, timestamp] - item: [item_id, item_category] -``` - -### 3)Yelp dataset: - -#### Time and memory cost on Yelp dataset: - -| Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) | -| ---------------- | -----------------: | -----------------: | -----------: | -| Improved GRU-Rec | 44.31 | 2.74 | 7.92 | -| SASRec | 75.51 | 3.11 | 8.32 | -| NARM | 45.65 | 2.76 | 7.98 | -| FPMC | 21.05 | 3.05 | 8.22 | -| STAMP | 42.08 | 2.72 | 7.77 | -| Caser | 147.15 | 2.89 | 7.87 | -| NextItNet | 45019.38 | 1670.76 | 8.44 | -| TransRec | - | - | - | -| S3Rec | - | - | - | -| GRU4RecF | - | - | - | -| SASRecF | - | - | - | -| BERT4Rec | 193.74 | 8.43 | 16.57 | -| FDSA | - | - | - | -| SRGNN | 825.11 | 33.20 | 7.90 | -| GCSAN | 837.23 | 33.00 | 8.14 | -| KSR | - | - | - | -| GRU4RecKG | - | - | - | - -#### Config file of DIGINETICA dataset: - -``` -# dataset config -field_separator: "\t" -seq_separator: " " -USER_ID_FIELD: session_id -ITEM_ID_FIELD: item_id -TIME_FIELD: timestamp -NEG_PREFIX: neg_ -ITEM_LIST_LENGTH_FIELD: item_length -LIST_SUFFIX: _list -MAX_ITEM_LIST_LENGTH: 20 -POSITION_FIELD: position_id -load_col: - inter: [session_id, item_id, timestamp] -min_user_inter_num: 6 -min_item_inter_num: 1 - -# training and evaluation -epochs: 500 -train_batch_size: 2048 -eval_batch_size: 2048 -valid_metric: MRR@10 -eval_setting: TO_LS,full -training_neg_sample_num: 0 -``` - -Other parameters (including model parameters) are default value. - -**NOTE :** - -1) For FPMC and TransRec model, `training_neg_sample_num` should be `1` . 
- -2) For SASRecF, GRU4RecF and FDSA, `load_col` should as below: - -``` -load_col: - inter: [session_id, item_id, timestamp] - item: [item_id, item_category] -``` - - - - - - - +## Time and memory cost of sequential recommendation models + +### Datasets information: + +| Dataset | #User | #Item | #Interaction | Sparsity | +| ---------- | -------: | ------: | ------------: | --------: | +| ml-1m | 6,041 | 3,707 | 1,000,209 | 0.9553 | +| DIGINETICA | 59,425 | 42,116 | 547,416 | 0.9998 | +| Yelp | 102,046 | 98,408 | 2,903,648 | 0.9997 | + +### Device information + +``` +OS: Linux +Python Version: 3.8.3 +PyTorch Version: 1.7.0 +cudatoolkit Version: 10.1 +GPU: TITAN RTX(24GB) +Machine Specs: 32 CPU machine, 64GB RAM +``` + +### 1) ml-1m dataset: + +#### Time and memory cost on ml-1m dataset: + +| Method | Training Time (sec/epoch) | Evaluate Time (sec/epoch) | GPU Memory (GB) | +| ---------------- | -----------------: | -----------------: | -----------: | +| Improved GRU-Rec | 7.78 | 0.11 | 1.27 | +| SASRec | 17.78 | 0.12 | 1.84 | +| NARM | 8.29 | 0.11 | 1.29 | +| FPMC | 7.51 | 0.11 | 1.18 | +| STAMP | 7.32 | 0.11 | 1.20 | +| Caser | 44.85 | 0.12 | 1.14 | +| NextItNet | 16433.27 | 96.31 | 1.86 | +| TransRec | 10.08 | 0.16 | 8.18 | +| S3Rec | - | - | - | +| GRU4RecF | 10.20 | 0.15 | 1.80 | +| SASRecF | 18.84 | 0.17 | 1.78 | +| BERT4Rec | 36.09 | 0.34 | 1.97 | +| FDSA | 31.86 | 0.19 | 2.32 | +| SRGNN | 327.38 | 2.19 | 1.21 | +| GCSAN | 335.27 | 0.02 | 1.58 | +| KSR | - | - | - | +| GRU4RecKG | - | - | - | + +#### Config file of ml-1m dataset: + +``` +# dataset config +field_separator: "\t" +seq_separator: " " +USER_ID_FIELD: user_id +ITEM_ID_FIELD: item_id +TIME_FIELD: timestamp +NEG_PREFIX: neg_ +ITEM_LIST_LENGTH_FIELD: item_length +LIST_SUFFIX: _list +MAX_ITEM_LIST_LENGTH: 20 +POSITION_FIELD: position_id +load_col: + inter: [user_id, item_id, timestamp] +min_user_inter_num: 0 +min_item_inter_num: 0 + +# training and evaluation +epochs: 500 +train_batch_size: 2048 +eval_batch_size: 2048 +valid_metric: MRR@10 +eval_setting: TO_LS,full +training_neg_sample_num: 0 +``` + +Other parameters (including model parameters) are default value. + +**NOTE :** + +1) For FPMC and TransRec model, `training_neg_sample_num` should be `1` . 
+ +2) For SASRecF, GRU4RecF and FDSA, `load_col` should as below: + +``` +load_col: + inter: [user_id, item_id, timestamp] + item: [item_id, genre] +``` + +### 2)DIGINETICA dataset: + +#### Time and memory cost on DIGINETICA dataset: + +| Method | Training Time (sec/epoch) | Evaluate Time (sec/epoch) | GPU Memory (GB) | +| ---------------- | -----------------: | -----------------: | -----------: | +| Improved GRU-Rec | 4.10 | 1.05 | 4.02 | +| SASRec | 8.36 | 1.21 | 4.43 | +| NARM | 4.30 | 1.08 | 4.09 | +| FPMC | 2.98 | 1.08 | 4.08 | +| STAMP | 4.27 | 1.04 | 3.88 | +| Caser | 17.15 | 1.18 | 3.94 | +| NextItNet | 6150.49 | 947.66 | 4.54 | +| TransRec | - | - | Out of Memory | +| S3Rec | - | - | - | +| GRU4RecF | 4.79 | 1.17 | 4.83 | +| SASRecF | 8.66 | 1.29 | 5.11 | +| BERT4Rec | 16.80 | 3.54 | 7.97 | +| FDSA | 13.44 | 1.47 | 5.66 | +| SRGNN | 88.59 | 15.37 | 4.01 | +| GCSAN | 96.69 | 17.11 | 4.25 | +| KSR | - | - | - | +| GRU4RecKG | - | - | - | + +#### Config file of DIGINETICA dataset: + +``` +# dataset config +field_separator: "\t" +seq_separator: " " +USER_ID_FIELD: session_id +ITEM_ID_FIELD: item_id +TIME_FIELD: timestamp +NEG_PREFIX: neg_ +ITEM_LIST_LENGTH_FIELD: item_length +LIST_SUFFIX: _list +MAX_ITEM_LIST_LENGTH: 20 +POSITION_FIELD: position_id +load_col: + inter: [session_id, item_id, timestamp] +min_user_inter_num: 6 +min_item_inter_num: 1 + +# training and evaluation +epochs: 500 +train_batch_size: 2048 +eval_batch_size: 2048 +valid_metric: MRR@10 +eval_setting: TO_LS,full +training_neg_sample_num: 0 +``` + +Other parameters (including model parameters) are default value. + +**NOTE :** + +1) For FPMC and TransRec model, `training_neg_sample_num` should be `1` . + +2) For SASRecF, GRU4RecF and FDSA, `load_col` should as below: + +``` +load_col: + inter: [session_id, item_id, timestamp] + item: [item_id, item_category] +``` + +### 3)Yelp dataset: + +#### Time and memory cost on Yelp dataset: + +| Method | Training Time (sec/epoch) | Evaluation Time (sec/epoch) | GPU Memory (GB) | +| ---------------- | -----------------: | -----------------: | -----------: | +| Improved GRU-Rec | 44.31 | 2.74 | 7.92 | +| SASRec | 75.51 | 3.11 | 8.32 | +| NARM | 45.65 | 2.76 | 7.98 | +| FPMC | 21.05 | 3.05 | 8.22 | +| STAMP | 42.08 | 2.72 | 7.77 | +| Caser | 147.15 | 2.89 | 7.87 | +| NextItNet | 45019.38 | 1670.76 | 8.44 | +| TransRec | - | - | Out of Memory | +| S3Rec | - | - | - | +| GRU4RecF | - | - | Out of Memory | +| SASRecF | - | - | Out of Memory | +| BERT4Rec | 193.74 | 8.43 | 16.57 | +| FDSA | - | - | Out of Memory | +| SRGNN | 825.11 | 33.20 | 7.90 | +| GCSAN | 837.23 | 33.00 | 8.14 | +| KSR | - | - | - | +| GRU4RecKG | - | - | - | + +#### Config file of Yelp dataset: + +``` +# dataset config +field_separator: "\t" +seq_separator: " " +USER_ID_FIELD: user_id +ITEM_ID_FIELD: business_id +RATING_FIELD: stars +TIME_FIELD: date +NEG_PREFIX: neg_ +ITEM_LIST_LENGTH_FIELD: item_length +LIST_SUFFIX: _list +MAX_ITEM_LIST_LENGTH: 20 +POSITION_FIELD: position_id +load_col: + inter: [user_id, business_id, stars, date] +min_user_inter_num: 10 +min_item_inter_num: 4 +lowest_val: + stars: 3 +drop_filter_field: True + +# training and evaluation +epochs: 500 +train_batch_size: 2048 +eval_batch_size: 2048 +valid_metric: MRR@10 +eval_setting: TO_LS,full +training_neg_sample_num: 0 + +``` + +Other parameters (including model parameters) are default value. + +**NOTE :** + +1) For FPMC and TransRec model, `training_neg_sample_num` should be `1` . 
+ +2) For SASRecF, GRU4RecF and FDSA, `load_col` should as below: + +``` +load_col: + inter: [session_id, item_id, timestamp] + item: [item_id, item_category] +``` + + + + + + + diff --git a/conda/meta.yaml b/conda/meta.yaml index fa8d9677d..edef48fb9 100644 --- a/conda/meta.yaml +++ b/conda/meta.yaml @@ -1,6 +1,6 @@ package: name: recbole - version: 0.1.1 + version: 0.2.0 source: path: ../ diff --git a/docs/Makefile b/docs/Makefile new file mode 100644 index 000000000..d0c3cbf10 --- /dev/null +++ b/docs/Makefile @@ -0,0 +1,20 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line, and also +# from the environment for the first two. +SPHINXOPTS ?= +SPHINXBUILD ?= sphinx-build +SOURCEDIR = source +BUILDDIR = build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). +%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/docs/source/asset/afm.jpg b/docs/source/asset/afm.jpg new file mode 100644 index 000000000..61c072a0b Binary files /dev/null and b/docs/source/asset/afm.jpg differ diff --git a/docs/source/asset/autoint.png b/docs/source/asset/autoint.png new file mode 100644 index 000000000..41d242649 Binary files /dev/null and b/docs/source/asset/autoint.png differ diff --git a/docs/source/asset/bert4rec.png b/docs/source/asset/bert4rec.png new file mode 100644 index 000000000..adc826e22 Binary files /dev/null and b/docs/source/asset/bert4rec.png differ diff --git a/docs/source/asset/bpr.png b/docs/source/asset/bpr.png new file mode 100644 index 000000000..f48cb41a1 Binary files /dev/null and b/docs/source/asset/bpr.png differ diff --git a/docs/source/asset/caser.png b/docs/source/asset/caser.png new file mode 100644 index 000000000..d09eb29ae Binary files /dev/null and b/docs/source/asset/caser.png differ diff --git a/docs/source/asset/cdae.png b/docs/source/asset/cdae.png new file mode 100644 index 000000000..99859aeed Binary files /dev/null and b/docs/source/asset/cdae.png differ diff --git a/docs/source/asset/cke.png b/docs/source/asset/cke.png new file mode 100644 index 000000000..b5092f70f Binary files /dev/null and b/docs/source/asset/cke.png differ diff --git a/docs/source/asset/convncf.png b/docs/source/asset/convncf.png new file mode 100644 index 000000000..3725ba15e Binary files /dev/null and b/docs/source/asset/convncf.png differ diff --git a/docs/source/asset/data_flow_en.png b/docs/source/asset/data_flow_en.png new file mode 100644 index 000000000..cf13cfa93 Binary files /dev/null and b/docs/source/asset/data_flow_en.png differ diff --git a/docs/source/asset/dcn.png b/docs/source/asset/dcn.png new file mode 100644 index 000000000..9d2faa85b Binary files /dev/null and b/docs/source/asset/dcn.png differ diff --git a/docs/source/asset/deepfm.png b/docs/source/asset/deepfm.png new file mode 100644 index 000000000..03978c41c Binary files /dev/null and b/docs/source/asset/deepfm.png differ diff --git a/docs/source/asset/dgcf.jpg b/docs/source/asset/dgcf.jpg new file mode 100644 index 000000000..a3685742b Binary files /dev/null and b/docs/source/asset/dgcf.jpg differ diff --git a/docs/source/asset/din.png b/docs/source/asset/din.png new file mode 100644 index 000000000..a0869747a Binary files /dev/null and 
b/docs/source/asset/din.png differ diff --git a/docs/source/asset/dmf.jpg b/docs/source/asset/dmf.jpg new file mode 100644 index 000000000..75840cd06 Binary files /dev/null and b/docs/source/asset/dmf.jpg differ diff --git a/docs/source/asset/dssm.png b/docs/source/asset/dssm.png new file mode 100644 index 000000000..9b1def0ec Binary files /dev/null and b/docs/source/asset/dssm.png differ diff --git a/docs/source/asset/enmf.jpg b/docs/source/asset/enmf.jpg new file mode 100644 index 000000000..be094d09f Binary files /dev/null and b/docs/source/asset/enmf.jpg differ diff --git a/docs/source/asset/evaluation.png b/docs/source/asset/evaluation.png new file mode 100644 index 000000000..d2a468f4f Binary files /dev/null and b/docs/source/asset/evaluation.png differ diff --git a/docs/source/asset/fdsa.png b/docs/source/asset/fdsa.png new file mode 100644 index 000000000..55af42a45 Binary files /dev/null and b/docs/source/asset/fdsa.png differ diff --git a/docs/source/asset/ffm.png b/docs/source/asset/ffm.png new file mode 100644 index 000000000..fd12c2295 Binary files /dev/null and b/docs/source/asset/ffm.png differ diff --git a/docs/source/asset/fm.png b/docs/source/asset/fm.png new file mode 100644 index 000000000..4702eb61e Binary files /dev/null and b/docs/source/asset/fm.png differ diff --git a/docs/source/asset/fnn.png b/docs/source/asset/fnn.png new file mode 100644 index 000000000..edd1c7f1b Binary files /dev/null and b/docs/source/asset/fnn.png differ diff --git a/docs/source/asset/fossil.jpg b/docs/source/asset/fossil.jpg new file mode 100644 index 000000000..eb14b22b4 Binary files /dev/null and b/docs/source/asset/fossil.jpg differ diff --git a/docs/source/asset/fpmc.png b/docs/source/asset/fpmc.png new file mode 100644 index 000000000..f339cc2c3 Binary files /dev/null and b/docs/source/asset/fpmc.png differ diff --git a/docs/source/asset/fwfm.png b/docs/source/asset/fwfm.png new file mode 100644 index 000000000..01282fe59 Binary files /dev/null and b/docs/source/asset/fwfm.png differ diff --git a/docs/source/asset/gcmc.png b/docs/source/asset/gcmc.png new file mode 100644 index 000000000..99acffab4 Binary files /dev/null and b/docs/source/asset/gcmc.png differ diff --git a/docs/source/asset/gcsan.png b/docs/source/asset/gcsan.png new file mode 100644 index 000000000..0ff336b4f Binary files /dev/null and b/docs/source/asset/gcsan.png differ diff --git a/docs/source/asset/gru4rec.png b/docs/source/asset/gru4rec.png new file mode 100644 index 000000000..c5ef04d29 Binary files /dev/null and b/docs/source/asset/gru4rec.png differ diff --git a/docs/source/asset/gru4recf.png b/docs/source/asset/gru4recf.png new file mode 100644 index 000000000..b1bf6611e Binary files /dev/null and b/docs/source/asset/gru4recf.png differ diff --git a/docs/source/asset/hgn.jpg b/docs/source/asset/hgn.jpg new file mode 100644 index 000000000..9200699b5 Binary files /dev/null and b/docs/source/asset/hgn.jpg differ diff --git a/docs/source/asset/hrm.jpg b/docs/source/asset/hrm.jpg new file mode 100644 index 000000000..5ba634de6 Binary files /dev/null and b/docs/source/asset/hrm.jpg differ diff --git a/docs/source/asset/kgat.png b/docs/source/asset/kgat.png new file mode 100644 index 000000000..84d0622d0 Binary files /dev/null and b/docs/source/asset/kgat.png differ diff --git a/docs/source/asset/kgcn.png b/docs/source/asset/kgcn.png new file mode 100644 index 000000000..86c9040b9 Binary files /dev/null and b/docs/source/asset/kgcn.png differ diff --git a/docs/source/asset/kgnnls.png b/docs/source/asset/kgnnls.png 
new file mode 100644 index 000000000..664ce86e7 Binary files /dev/null and b/docs/source/asset/kgnnls.png differ diff --git a/docs/source/asset/ksr.jpg b/docs/source/asset/ksr.jpg new file mode 100644 index 000000000..6c764c642 Binary files /dev/null and b/docs/source/asset/ksr.jpg differ diff --git a/docs/source/asset/ktup.png b/docs/source/asset/ktup.png new file mode 100644 index 000000000..0bde6eadb Binary files /dev/null and b/docs/source/asset/ktup.png differ diff --git a/docs/source/asset/lightgcn.png b/docs/source/asset/lightgcn.png new file mode 100644 index 000000000..576f91130 Binary files /dev/null and b/docs/source/asset/lightgcn.png differ diff --git a/docs/source/asset/line.png b/docs/source/asset/line.png new file mode 100644 index 000000000..f9e09e9b2 Binary files /dev/null and b/docs/source/asset/line.png differ diff --git a/docs/source/asset/lr.png b/docs/source/asset/lr.png new file mode 100644 index 000000000..c2d6189a7 Binary files /dev/null and b/docs/source/asset/lr.png differ diff --git a/docs/source/asset/macridvae.png b/docs/source/asset/macridvae.png new file mode 100644 index 000000000..f45826c1b Binary files /dev/null and b/docs/source/asset/macridvae.png differ diff --git a/docs/source/asset/mkr.png b/docs/source/asset/mkr.png new file mode 100644 index 000000000..f37307188 Binary files /dev/null and b/docs/source/asset/mkr.png differ diff --git a/docs/source/asset/multidae.png b/docs/source/asset/multidae.png new file mode 100644 index 000000000..919bc67d6 Binary files /dev/null and b/docs/source/asset/multidae.png differ diff --git a/docs/source/asset/multivae.png b/docs/source/asset/multivae.png new file mode 100644 index 000000000..919bc67d6 Binary files /dev/null and b/docs/source/asset/multivae.png differ diff --git a/docs/source/asset/nais.png b/docs/source/asset/nais.png new file mode 100644 index 000000000..6bd404472 Binary files /dev/null and b/docs/source/asset/nais.png differ diff --git a/docs/source/asset/narm.png b/docs/source/asset/narm.png new file mode 100644 index 000000000..a51a52e54 Binary files /dev/null and b/docs/source/asset/narm.png differ diff --git a/docs/source/asset/neumf.png b/docs/source/asset/neumf.png new file mode 100644 index 000000000..5af976a5d Binary files /dev/null and b/docs/source/asset/neumf.png differ diff --git a/docs/source/asset/nextitnet.png b/docs/source/asset/nextitnet.png new file mode 100644 index 000000000..aa751d33a Binary files /dev/null and b/docs/source/asset/nextitnet.png differ diff --git a/docs/source/asset/nfm.jpg b/docs/source/asset/nfm.jpg new file mode 100644 index 000000000..c242cd794 Binary files /dev/null and b/docs/source/asset/nfm.jpg differ diff --git a/docs/source/asset/ngcf.jpg b/docs/source/asset/ngcf.jpg new file mode 100644 index 000000000..eae87d17f Binary files /dev/null and b/docs/source/asset/ngcf.jpg differ diff --git a/docs/source/asset/nncf.png b/docs/source/asset/nncf.png new file mode 100644 index 000000000..7655c3d94 Binary files /dev/null and b/docs/source/asset/nncf.png differ diff --git a/docs/source/asset/npe.jpg b/docs/source/asset/npe.jpg new file mode 100644 index 000000000..2bedff1d1 Binary files /dev/null and b/docs/source/asset/npe.jpg differ diff --git a/docs/source/asset/pnn.jpg b/docs/source/asset/pnn.jpg new file mode 100644 index 000000000..1c8b475a4 Binary files /dev/null and b/docs/source/asset/pnn.jpg differ diff --git a/docs/source/asset/repeatnet.jpg b/docs/source/asset/repeatnet.jpg new file mode 100644 index 000000000..e63af5f71 Binary files /dev/null and 
b/docs/source/asset/repeatnet.jpg differ diff --git a/docs/source/asset/ripplenet.jpg b/docs/source/asset/ripplenet.jpg new file mode 100644 index 000000000..21f059094 Binary files /dev/null and b/docs/source/asset/ripplenet.jpg differ diff --git a/docs/source/asset/s3rec.png b/docs/source/asset/s3rec.png new file mode 100644 index 000000000..2b80cd58d Binary files /dev/null and b/docs/source/asset/s3rec.png differ diff --git a/docs/source/asset/sasrec.png b/docs/source/asset/sasrec.png new file mode 100644 index 000000000..e317bd7d5 Binary files /dev/null and b/docs/source/asset/sasrec.png differ diff --git a/docs/source/asset/shan.jpg b/docs/source/asset/shan.jpg new file mode 100644 index 000000000..bb681b387 Binary files /dev/null and b/docs/source/asset/shan.jpg differ diff --git a/docs/source/asset/spectralcf.png b/docs/source/asset/spectralcf.png new file mode 100644 index 000000000..d1e5f59cb Binary files /dev/null and b/docs/source/asset/spectralcf.png differ diff --git a/docs/source/asset/srgnn.png b/docs/source/asset/srgnn.png new file mode 100644 index 000000000..51f7d2935 Binary files /dev/null and b/docs/source/asset/srgnn.png differ diff --git a/docs/source/asset/stamp.png b/docs/source/asset/stamp.png new file mode 100644 index 000000000..32e0a8a40 Binary files /dev/null and b/docs/source/asset/stamp.png differ diff --git a/docs/source/asset/transrec.png b/docs/source/asset/transrec.png new file mode 100644 index 000000000..cc87d45f9 Binary files /dev/null and b/docs/source/asset/transrec.png differ diff --git a/docs/source/asset/widedeep.png b/docs/source/asset/widedeep.png new file mode 100644 index 000000000..76a61d449 Binary files /dev/null and b/docs/source/asset/widedeep.png differ diff --git a/docs/source/asset/xdeepfm.png b/docs/source/asset/xdeepfm.png new file mode 100644 index 000000000..6a92b6431 Binary files /dev/null and b/docs/source/asset/xdeepfm.png differ diff --git a/docs/source/conf.py b/docs/source/conf.py new file mode 100644 index 000000000..befbeb492 --- /dev/null +++ b/docs/source/conf.py @@ -0,0 +1,74 @@ +# Configuration file for the Sphinx documentation builder. +# +# This file only contains a selection of the most common options. For a full +# list see the documentation: +# https://www.sphinx-doc.org/en/master/usage/configuration.html + +# -- Path setup -------------------------------------------------------------- + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. +# +import sphinx_rtd_theme +import os +import sys +sys.path.insert(0, os.path.abspath('../..')) + + +# -- Project information ----------------------------------------------------- + +project = 'RecBole' +copyright = '2020, RecBole Contributors' +author = 'AIBox RecBole group' + +# The full version, including alpha/beta/rc tags +release = '0.2.0' + + +# -- General configuration --------------------------------------------------- + +# Add any Sphinx extension module names here, as strings. They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. +extensions = [ + 'sphinx.ext.autodoc', + 'sphinx.ext.napoleon', + 'sphinx.ext.viewcode', + 'sphinx_copybutton', +] + +autodoc_mock_imports = ["pandas", "pyecharts"] +# autoclass_content = 'both' + +# Add any paths that contain templates here, relative to this directory. 
+templates_path = ['_templates'] + +# The language for content autogenerated by Sphinx. Refer to documentation +# for a list of supported languages. +# +# This is also used if you do content translation via gettext catalogs. +# Usually you set "language" from the command line for these cases. +language = 'en' + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +# This pattern also affects html_static_path and html_extra_path. +exclude_patterns = [] + + +# -- Options for HTML output ------------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +# +# html_theme = 'alabaster' + + +html_theme = 'sphinx_rtd_theme' +html_theme_path = [sphinx_rtd_theme.get_html_theme_path()] + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". +html_static_path = ['_static'] diff --git a/docs/source/developer_guide/customize_dataloaders.rst b/docs/source/developer_guide/customize_dataloaders.rst new file mode 100644 index 000000000..6565dcedf --- /dev/null +++ b/docs/source/developer_guide/customize_dataloaders.rst @@ -0,0 +1,201 @@ +Customize DataLoaders +====================== +Here, we present how to develop a new DataLoader, and apply it into our tool. If we have a new model, +and there is no special requirement for loading the data, then we need to design a new DataLoader. + + +Abstract DataLoader +-------------------------- +In this project, there are three abstracts: :class:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader`, +:class:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin`, :class:`~recbole.data.dataloader.neg_sample_mixin.NegSampleByMixin`. + +In general, the new dataloader should inherit from the above three abstract classes. +If one only needs to modify existing DataLoader, you can also inherit from the it. +The documentation of dataloader: :doc:`../../recbole/recbole.data.dataloader` + + +AbstractDataLoader +^^^^^^^^^^^^^^^^^^^^^^^^^^ +:class:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader` is the most basic abstract class, +which includes three functions: :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.pr_end`, +:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._shuffle` +and :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._next_batch_data`. +:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.pr_end` is the max +:attr:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.pr` plus 1. +:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._shuffle` is leverage to permute the dataset, +which will be invoked by :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.__iter__` +if the parameter :attr:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.shuffle` is True. +:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._next_batch_data` is used to +load the next batch data, and return the :class:`~recbole.data.interaction.Interaction` format, +which will be invoked in :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.__next__`. 
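+
+As a quick orientation, a subclass therefore mainly has to override these three members. The sketch below is only an
+illustration (the class name ``MyDataLoader`` is made up, and it assumes the wrapped dataset supports ``len()``,
+slicing and ``shuffle()``); the :class:`~recbole.data.dataloader.user_dataloader.UserDataLoader` example later on this
+page shows a complete, working case.
+
+.. code:: python
+
+    from recbole.data.dataloader.abstract_dataloader import AbstractDataLoader
+
+    class MyDataLoader(AbstractDataLoader):
+
+        @property
+        def pr_end(self):
+            # iteration stops once self.pr reaches this value
+            return len(self.dataset)
+
+        def _shuffle(self):
+            # permute the underlying data in place (assumed to be supported by the dataset)
+            self.dataset.shuffle()
+
+        def _next_batch_data(self):
+            # take the next slice, advance the pointer and convert the slice to Interaction
+            cur_data = self.dataset[self.pr: self.pr + self.step]
+            self.pr += self.step
+            return self._dataframe_to_interaction(cur_data)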
+
+In :class:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader`,
+there are two functions that assist the conversion in :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._next_batch_data`:
+one is :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._dataframe_to_interaction`,
+and the other is :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._dict_to_interaction`.
+They both use the functions with the same name in :attr:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.dataset`,
+which convert a :class:`pandas.DataFrame` or a :class:`dict` into an :class:`~recbole.data.interaction.Interaction`.
+
+In addition to the above three functions, two other functions can also be overridden,
+namely :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.setup`
+and :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.data_preprocess`.
+
+:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.setup` handles preparation work other than initializing the parameters,
+for example, resetting the :attr:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.batch_size`
+or examining the :attr:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.shuffle` setting.
+All of this can be customized in the subclass.
+:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.data_preprocess` is used to process the data,
+e.g., negative sampling.
+
+At the end of :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.__init__`,
+:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.setup` will be invoked,
+and then, if :attr:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.real_time` is ``True``,
+:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.data_preprocess` will be called.
+
+NegSampleMixin
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+:class:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin` inherits from
+:class:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader` and is used for negative sampling.
+It adds three functions on top of its parent class:
+:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin._batch_size_adaptation`,
+:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin._neg_sampling`
+and :meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin.get_pos_len_list`.
+
+Since the positive and negative samples should be framed in the same batch,
+the original batch size may not be appropriate.
+:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin._batch_size_adaptation` resets the batch size
+such that the positive and negative samples can be placed in the same batch.
+:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin._neg_sampling` performs the negative sampling
+and should be implemented by the subclass.
+:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin.get_pos_len_list` returns the number of positive samples for each user.
+
+In addition, :meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin.setup`
+and :meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin.data_preprocess` are also changed.
+NegSampleByMixin
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+:class:`~recbole.data.dataloader.neg_sample_mixin.NegSampleByMixin` inherits
+from :class:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin`
+and is used for negative sampling by ratio.
+It supports two strategies: ``pair-wise sampling`` and ``point-wise sampling``.
+On top of the parent class, two functions are added:
+:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleByMixin._neg_sample_by_pair_wise_sampling`
+and :meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleByMixin._neg_sample_by_point_wise_sampling`.
+
+
+Example
+--------------------------
+Here, we take :class:`~recbole.data.dataloader.user_dataloader.UserDataLoader` as an example.
+This dataloader returns user ids, which can be leveraged to train user representations.
+
+
+Implement __init__()
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+:meth:`__init__` can be used to initialize some of the necessary parameters.
+Here, we just need to record :attr:`uid_field`.
+
+.. code:: python
+
+    def __init__(self, config, dataset,
+                 batch_size=1, dl_format=InputType.POINTWISE, shuffle=False):
+        self.uid_field = dataset.uid_field
+
+        super().__init__(config=config, dataset=dataset,
+                         batch_size=batch_size, dl_format=dl_format, shuffle=shuffle)
+
+Implement setup()
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Because of the training requirement, :attr:`self.shuffle` should be ``True``.
+We can therefore check and revise :attr:`self.shuffle` in :meth:`~recbole.data.dataloader.user_dataloader.UserDataLoader.setup`.
+
+
+.. code:: python
+
+    def setup(self):
+        if self.shuffle is False:
+            self.shuffle = True
+            self.logger.warning('UserDataLoader must shuffle the data')
+
+Implement pr_end() and _shuffle()
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Since this dataloader only returns user ids, these functions can be implemented readily.
+
+.. code:: python
+
+    @property
+    def pr_end(self):
+        return len(self.dataset.user_feat)
+
+    def _shuffle(self):
+        self.dataset.user_feat = self.dataset.user_feat.sample(frac=1).reset_index(drop=True)
+
+Implement _next_batch_data
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+This function only needs to return user ids from :attr:`user_feat`,
+so we select the corresponding column and use :meth:`_dataframe_to_interaction` to convert
+the :class:`pandas.DataFrame` into an :class:`~recbole.data.interaction.Interaction`.
+
+
+.. code:: python
+
+    def _next_batch_data(self):
+        cur_data = self.dataset.user_feat[[self.uid_field]][self.pr: self.pr + self.step]
+        self.pr += self.step
+        return self._dataframe_to_interaction(cur_data)
+
+
+Complete Code
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code:: python
+
+    class UserDataLoader(AbstractDataLoader):
+        """:class:`UserDataLoader` will return a batch of data which only contains user-id when it is iterated.
+
+        Args:
+            config (Config): The config of dataloader.
+            dataset (Dataset): The dataset of dataloader.
+            batch_size (int, optional): The batch_size of dataloader. Defaults to ``1``.
+            dl_format (InputType, optional): The input type of dataloader. Defaults to
+                :obj:`~recbole.utils.enum_type.InputType.POINTWISE`.
+            shuffle (bool, optional): Whether the dataloader will be shuffle after a round. Defaults to ``False``.
+
+        Attributes:
+            shuffle (bool): Whether the dataloader will be shuffle after a round.
+                However, in :class:`UserDataLoader`, it's guaranteed to be ``True``.
+        """
+        dl_type = DataLoaderType.ORIGIN
+
+        def __init__(self, config, dataset,
+                     batch_size=1, dl_format=InputType.POINTWISE, shuffle=False):
+            self.uid_field = dataset.uid_field
+
+            super().__init__(config=config, dataset=dataset,
+                             batch_size=batch_size, dl_format=dl_format, shuffle=shuffle)
+
+        def setup(self):
+            """Make sure that the :attr:`shuffle` is True. If :attr:`shuffle` is False, it will be changed to True
+            and give a warning to user.
+            """
+            if self.shuffle is False:
+                self.shuffle = True
+                self.logger.warning('UserDataLoader must shuffle the data')
+
+        @property
+        def pr_end(self):
+            return len(self.dataset.user_feat)
+
+        def _shuffle(self):
+            self.dataset.user_feat = self.dataset.user_feat.sample(frac=1).reset_index(drop=True)
+
+        def _next_batch_data(self):
+            cur_data = self.dataset.user_feat[[self.uid_field]][self.pr: self.pr + self.step]
+            self.pr += self.step
+            return self._dataframe_to_interaction(cur_data)
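+
+As a final, hypothetical usage sketch, the dataloader defined above could be constructed and iterated directly as follows;
+in practice the dataloaders are usually created by RecBole's data preparation utilities rather than by hand,
+and the ``batch_size`` below is only an assumed value.
+
+.. code:: python
+
+    from recbole.config import Config
+    from recbole.data import create_dataset
+
+    config = Config(model='BPR', dataset='ml-100k')
+    dataset = create_dataset(config)
+
+    user_loader = UserDataLoader(config, dataset, batch_size=4096, shuffle=True)
+    for interaction in user_loader:
+        user_ids = interaction[dataset.uid_field]  # user ids of the current batch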
+
+
+The development of other, more complex DataLoaders can refer to the source code.
diff --git a/docs/source/developer_guide/customize_models.rst b/docs/source/developer_guide/customize_models.rst
new file mode 100644
index 000000000..e6acc6c90
--- /dev/null
+++ b/docs/source/developer_guide/customize_models.rst
@@ -0,0 +1,268 @@
+Customize Models
+======================
+Here, we present how to develop a new model and apply it to RecBole.
+
+RecBole supports General, Context-aware, Sequential and Knowledge-based
+recommendation.
+
+Create a New Model Class
+------------------------------
+To begin with, we should create a new model class inheriting from one of :class:`~recbole.model.abstract_recommender.GeneralRecommender`,
+:class:`~recbole.model.abstract_recommender.ContextRecommender`, :class:`~recbole.model.abstract_recommender.SequentialRecommender`
+or :class:`~recbole.model.abstract_recommender.KnowledgeRecommender`.
+For example, suppose we would like to develop a general model named NewModel and write its code to `newmodel.py`.
+
+.. code:: python
+
+    from recbole.model.abstract_recommender import GeneralRecommender
+
+    class NewModel(GeneralRecommender):
+        pass
+
+Then, we need to indicate :attr:`~recbole.model.abstract_recommender.AbstractRecommender.input_type`.
+RecBole supports two input types: :obj:`~recbole.utils.enum_type.InputType.POINTWISE` and :obj:`~recbole.utils.enum_type.InputType.PAIRWISE`.
+
+:obj:`~recbole.utils.enum_type.InputType.POINTWISE` will give the :attr:`item` and the corresponding :attr:`label`, which is suitable for pointwise losses, e.g., Cross Entropy Loss.
+
+:obj:`~recbole.utils.enum_type.InputType.PAIRWISE` will give the positive item :attr:`pos_item` and the negative item :attr:`neg_item`, which is suitable for pairwise losses, e.g., BPR Loss.
+
+Suppose we want to use a pairwise loss:
+
+.. code:: python
+
+    from recbole.utils import InputType
+    from recbole.model.abstract_recommender import GeneralRecommender
+
+    class NewModel(GeneralRecommender):
+
+        input_type = InputType.PAIRWISE
+        pass
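+
+If the new model is trained point-wise instead, the declaration is analogous. The following is only a brief sketch;
+``NewPointwiseModel`` is a hypothetical name used for illustration.
+
+.. code:: python
+
+    from recbole.utils import InputType
+    from recbole.model.abstract_recommender import GeneralRecommender
+
+    class NewPointwiseModel(GeneralRecommender):
+
+        # POINTWISE input provides each (user, item) pair together with its label,
+        # so a cross-entropy style loss can be applied in calculate_loss().
+        input_type = InputType.POINTWISE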
+
+
+Implement __init__()
+--------------------------------
+Then we redefine the :meth:`__init__` method, which is used to initialize the model, including loading the dataset
+information and model parameters, defining the model structure, and initializing the parameters.
+
+:meth:`__init__` takes :attr:`config` and :attr:`dataset` as input, where :attr:`config` is used to pass the parameters,
+and :attr:`dataset` provides the dataset information, including :attr:`n_users` and :attr:`n_items`.
+
+Here, we suppose that NewModel encodes the users and items, uses :func:`~recbole.model.init.xavier_normal_initialization`
+to initialize the parameters, and uses the inner product to compute the score.
+
+.. code:: python
+
+    import torch
+    import torch.nn as nn
+
+    from recbole.model.loss import BPRLoss
+    from recbole.model.init import xavier_normal_initialization
+
+    def __init__(self, config, dataset):
+        super(NewModel, self).__init__(config, dataset)
+
+        # load dataset info
+        self.n_users = dataset.user_num
+        self.n_items = dataset.item_num
+
+        # load parameters info
+        self.embedding_size = config['embedding_size']
+
+        # define layers and loss
+        self.user_embedding = nn.Embedding(self.n_users, self.embedding_size)
+        self.item_embedding = nn.Embedding(self.n_items, self.embedding_size)
+        self.loss = BPRLoss()
+
+        # parameters initialization
+        self.apply(xavier_normal_initialization)
+
+
+Implement calculate_loss()
+----------------------------------------
+Then we define the :meth:`calculate_loss` method, which is used to compute the training loss.
+Its input is an :class:`~recbole.data.interaction.Interaction`, and it returns a :class:`torch.Tensor`
+on which back-propagation is performed.
+
+.. code:: python
+
+    import torch
+
+    def calculate_loss(self, interaction):
+        user = interaction[self.USER_ID]
+        pos_item = interaction[self.ITEM_ID]
+        neg_item = interaction[self.NEG_ITEM_ID]
+
+        user_e = self.user_embedding(user)  # [batch_size, embedding_size]
+        pos_item_e = self.item_embedding(pos_item)  # [batch_size, embedding_size]
+        neg_item_e = self.item_embedding(neg_item)  # [batch_size, embedding_size]
+        pos_item_score = torch.mul(user_e, pos_item_e).sum(dim=1)  # [batch_size]
+        neg_item_score = torch.mul(user_e, neg_item_e).sum(dim=1)  # [batch_size]
+
+        loss = self.loss(pos_item_score, neg_item_score)  # []
+
+        return loss
+
+
+Implement predict()
+------------------------------
+At last, we define the :meth:`predict` method, which is used to compute the score for a given user-item pair.
+The input is an :class:`~recbole.data.interaction.Interaction`, and the output is a score.
+
+.. code:: python
+
+    import torch
+
+    def predict(self, interaction):
+        user = interaction[self.USER_ID]
+        item = interaction[self.ITEM_ID]
+
+        user_e = self.user_embedding(user)  # [batch_size, embedding_size]
+        item_e = self.item_embedding(item)  # [batch_size, embedding_size]
+
+        scores = torch.mul(user_e, item_e).sum(dim=1)  # [batch_size]
+
+        return scores
+
+If you would like to evaluate NewModel with full ranking over all items, RecBole also supports an accelerated prediction method.
+
+.. code:: python
+
+    import torch
+
+    def full_sort_predict(self, interaction):
+        user = interaction[self.USER_ID]
+
+        user_e = self.user_embedding(user)  # [batch_size, embedding_size]
+        all_item_e = self.item_embedding.weight  # [n_items, embedding_size]
+
+        scores = torch.matmul(user_e, all_item_e.transpose(0, 1))  # [batch_size, n_items]
+
+        return scores
+
+
+If this method is implemented, RecBole will call it to accelerate the full-ranking evaluation.
+
+
+Complete Code
+------------------------
+Thus, the complete implementation of NewModel is:
+
+.. 
code:: python + + import torch + import torch.nn as nn + + from recbole.utils import InputType + from recbole.model.abstract_recommender import GeneralRecommender + from recbole.model.loss import BPRLoss + from recbole.model.init import xavier_normal_initialization + + + class NewModel(GeneralRecommender): + + input_type = InputType.PAIRWISE + + def __init__(self, config, dataset): + super(NewModel, self).__init__(config, dataset) + + # load dataset info + self.n_users = dataset.user_num + self.n_items = dataset.item_num + + # load parameters info + self.embedding_size = config['embedding_size'] + + # define layers and loss + self.user_embedding = nn.Embedding(self.n_users, self.embedding_size) + self.item_embedding = nn.Embedding(self.n_items, self.embedding_size) + self.loss = BPRLoss() + + # parameters initialization + self.apply(xavier_normal_initialization) + + def calculate_loss(self, interaction): + user = interaction[self.USER_ID] + pos_item = interaction[self.ITEM_ID] + neg_item = interaction[self.NEG_ITEM_ID] + + user_e = self.user_embedding(user) # [batch_size, embedding_size] + pos_item_e = self.item_embedding(pos_item) # [batch_size, embedding_size] + neg_item_e = self.item_embedding(neg_item) # [batch_size, embedding_size] + pos_item_score = torch.mul(user_e, pos_item_e).sum(dim=1) # [batch_size] + neg_item_score = torch.mul(user_e, neg_item_e).sum(dim=1) # [batch_size] + + loss = self.loss(pos_item_score, neg_item_score) # [] + + return loss + + def predict(self, interaction): + user = interaction[self.USER_ID] + item = interaction[self.ITEM_ID] + + user_e = self.user_embedding(user) # [batch_size, embedding_size] + item_e = self.item_embedding(item) # [batch_size, embedding_size] + + scores = torch.mul(user_e, item_e).sum(dim=1) # [batch_size] + + return scores + + def full_sort_predict(self, interaction): + user = interaction[self.USER_ID] + + user_e = self.user_embedding(user) # [batch_size, embedding_size] + all_item_e = self.item_embedding.weight # [n_items, batch_size] + + scores = torch.matmul(user_e, all_item_e.transpose(0, 1)) # [batch_size, n_items] + + return scores + +Then, we can use NewModel in RecBole as follows (e.g., `run.py`): + +.. code:: python + + from logging import getLogger + from recbole.utils import init_logger, init_seed + from recbole.trainer import Trainer + from newmodel import NewModel + from recbole.config import Config + from recbole.data import create_dataset, data_preparation + + + if __name__ == '__main__': + + config = Config(model=NewModel, dataset='ml-100k') + init_seed(config['seed'], config['reproducibility']) + + # logger initialization + init_logger(config) + logger = getLogger() + + logger.info(config) + + # dataset filtering + dataset = create_dataset(config) + logger.info(dataset) + + # dataset splitting + train_data, valid_data, test_data = data_preparation(config, dataset) + + # model loading and initialization + model = NewModel(config, train_data).to(config['device']) + logger.info(model) + + # trainer loading and initialization + trainer = Trainer(config, model) + + # model training + best_valid_score, best_valid_result = trainer.fit(train_data, valid_data) + + # model evaluation + test_result = trainer.evaluate(test_data) + + logger.info('best valid result: {}'.format(best_valid_result)) + logger.info('test result: {}'.format(test_result)) + +Then, we can run NewModel: + +.. 
code:: bash
+
+    python run.py --embedding_size=64
+
+Note that you should remember to configure the model parameters
+(such as ``embedding_size``) through config files, parameter dicts or the command line.
diff --git a/docs/source/developer_guide/customize_samplers.rst b/docs/source/developer_guide/customize_samplers.rst
new file mode 100644
index 000000000..3c37797cc
--- /dev/null
+++ b/docs/source/developer_guide/customize_samplers.rst
@@ -0,0 +1,174 @@
+Customize Samplers
+======================
+Here we present how to develop a new sampler and apply it to RecBole.
+A new sampler is needed when a more complex sampling method is required.
+
+Here, we take :class:`~recbole.sampler.sampler.KGSampler` as an example.
+
+
+Create a New Sampler Class
+-----------------------------
+To begin with, we create a new sampler based on :class:`~recbole.sampler.sampler.AbstractSampler`:
+
+.. code:: python
+
+    from recbole.sampler import AbstractSampler
+    class KGSampler(AbstractSampler):
+        pass
+
+
+Implement __init__()
+-----------------------
+Then, we implement :meth:`~recbole.sampler.sampler.KGSampler.__init__()`. In this method, we can flexibly define and initialize the parameters;
+we only need to invoke ``super().__init__(distribution)``.
+
+.. code:: python
+
+    def __init__(self, dataset, distribution='uniform'):
+        self.dataset = dataset
+
+        self.hid_field = dataset.head_entity_field
+        self.tid_field = dataset.tail_entity_field
+        self.hid_list = dataset.head_entities
+        self.tid_list = dataset.tail_entities
+
+        self.head_entities = set(dataset.head_entities)
+        self.entity_num = dataset.entity_num
+
+        super().__init__(distribution=distribution)
+
+
+Implement get_random_list()
+------------------------------
+We do not use the random functions in Python or NumPy due to their lower efficiency.
+Instead, we implement our own :meth:`~recbole.sampler.sampler.AbstractSampler.random` function, whose key idea is to combine a random list with a pointer.
+The pointer points to some element in the random list. When one calls :meth:`self.random`, that element is returned and the pointer moves on to the next element.
+If the pointer points to the last element, it returns to the head of the list.
+
+In :class:`~recbole.sampler.sampler.AbstractSampler`, :meth:`~recbole.sampler.sampler.AbstractSampler.__init__` will call :meth:`~recbole.sampler.sampler.AbstractSampler.get_random_list` and shuffle the results.
+We only need to return a list including all the elements.
+
+It should be noted that ``0`` may be used as the padding token, thus one should reserve this value.
+
+Example code:
+
+.. code:: python
+
+    def get_random_list(self):
+        if self.distribution == 'uniform':
+            return list(range(1, self.entity_num))
+        elif self.distribution == 'popularity':
+            return list(self.hid_list) + list(self.tid_list)
+        else:
+            raise NotImplementedError('Distribution [{}] has not been implemented'.format(self.distribution))
+
+
+Implement get_used_ids()
+----------------------------
+For negative sampling, we do not want to sample positive instances, so this function is used to collect the positive samples.
+It returns a :class:`numpy.ndarray` indexed by ID, and the return value will be saved in :attr:`self.used_ids`.
+
+Example code:
+
+.. 
code:: python + + def get_used_ids(self): + used_tail_entity_id = np.array([set() for i in range(self.entity_num)]) + for hid, tid in zip(self.hid_list, self.tid_list): + used_tail_entity_id[hid].add(tid) + return used_tail_entity_id + + +Implementing the sampling function +----------------------------------- +In :class:`~recbole.sampler.sampler.AbstractSampler`, we have implemented :meth:`~recbole.sampler.sampler.AbstractSampler.sample_by_key_ids` function, +where we have three parameters: :attr:`key_ids`, :attr:`num` and :attr:`used_ids`. +:attr:`Key_ids` is the candidate objective ID list, :attr:`num` is the number of samples, :attr:`used_ids` are the positive sample list. + +In the function, we sample :attr:`num` instances for each element in :attr:`key_ids`. The function finally return :class:`numpy.ndarray`, +the index of 0, len(key_ids), len(key_ids) * 2, …, len(key_ids) * (num - 1) is the result of key_ids[0]. +The index of 1, len(key_ids) + 1, len(key_ids) * 2 + 1, …, len(key_ids) * (num - 1) + 1 is the result of key_ids[1]. + +One can also design her own sampler, if the above process is not appropriate. + +Example code: + +.. code:: python + + def sample_by_entity_ids(self, head_entity_ids, num=1): + try: + return self.sample_by_key_ids(head_entity_ids, num, self.used_ids[head_entity_ids]) + except IndexError: + for head_entity_id in head_entity_ids: + if head_entity_id not in self.head_entities: + raise ValueError('head_entity_id [{}] not exist'.format(head_entity_id)) + + +Complete Code +---------------------- +.. code:: python + + class KGSampler(AbstractSampler): + """:class:`KGSampler` is used to sample negative entities in a knowledge graph. + + Args: + dataset (Dataset): The knowledge graph dataset, which contains triplets in a knowledge graph. + distribution (str, optional): Distribution of the negative entities. Defaults to 'uniform'. + """ + def __init__(self, dataset, distribution='uniform'): + self.dataset = dataset + + self.hid_field = dataset.head_entity_field + self.tid_field = dataset.tail_entity_field + self.hid_list = dataset.head_entities + self.tid_list = dataset.tail_entities + + self.head_entities = set(dataset.head_entities) + self.entity_num = dataset.entity_num + + super().__init__(distribution=distribution) + + def get_random_list(self): + """ + Returns: + np.ndarray or list: Random list of entity_id. + """ + if self.distribution == 'uniform': + return list(range(1, self.entity_num)) + elif self.distribution == 'popularity': + return list(self.hid_list) + list(self.tid_list) + else: + raise NotImplementedError('Distribution [{}] has not been implemented'.format(self.distribution)) + + def get_used_ids(self): + """ + Returns: + np.ndarray: Used entity_ids is the same as tail_entity_ids in knowledge graph. + Index is head_entity_id, and element is a set of tail_entity_ids. + """ + used_tail_entity_id = np.array([set() for i in range(self.entity_num)]) + for hid, tid in zip(self.hid_list, self.tid_list): + used_tail_entity_id[hid].add(tid) + return used_tail_entity_id + + def sample_by_entity_ids(self, head_entity_ids, num=1): + """Sampling by head_entity_ids. + + Args: + head_entity_ids (np.ndarray or list): Input head_entity_ids. + num (int, optional): Number of sampled entity_ids for each head_entity_id. Defaults to ``1``. + + Returns: + np.ndarray: Sampled entity_ids. 
+                entity_ids[0], entity_ids[len(head_entity_ids)], entity_ids[len(head_entity_ids) * 2], ...,
+                entity_id[len(head_entity_ids) * (num - 1)] is sampled for head_entity_ids[0];
+                entity_ids[1], entity_ids[len(head_entity_ids) + 1], entity_ids[len(head_entity_ids) * 2 + 1], ...,
+                entity_id[len(head_entity_ids) * (num - 1) + 1] is sampled for head_entity_ids[1]; ...; and so on.
+        """
+        try:
+            return self.sample_by_key_ids(head_entity_ids, num, self.used_ids[head_entity_ids])
+        except IndexError:
+            for head_entity_id in head_entity_ids:
+                if head_entity_id not in self.head_entities:
+                    raise ValueError('head_entity_id [{}] not exist'.format(head_entity_id))
+
diff --git a/docs/source/developer_guide/customize_trainers.rst b/docs/source/developer_guide/customize_trainers.rst
new file mode 100644
index 000000000..fbd6df60e
--- /dev/null
+++ b/docs/source/developer_guide/customize_trainers.rst
@@ -0,0 +1,104 @@
+Customize Trainers
+======================
+Here, we present how to develop a new Trainer and apply it to RecBole.
+For a new model, if the training method is complex and the existing trainers cannot be used for training and evaluation,
+then we need to develop a new trainer.
+
+The function used to train the model is :meth:`fit`, which calls :meth:`_train_epoch` to train the model.
+
+The function used to evaluate the model is :meth:`evaluate`, which calls :meth:`_valid_epoch` to evaluate the model.
+
+If the developed model needs a more complex training method,
+then one can inherit from :class:`~recbole.trainer.trainer.Trainer`
+and override :meth:`~recbole.trainer.trainer.Trainer.fit` or :meth:`~recbole.trainer.trainer.Trainer._train_epoch`.
+
+If the developed model needs a more complex evaluation method,
+then one can inherit from :class:`~recbole.trainer.trainer.Trainer`
+and override :meth:`~recbole.trainer.trainer.Trainer.evaluate` or :meth:`~recbole.trainer.trainer.Trainer._valid_epoch`.
+
+
+Example
+----------------
+Here we present a simple Trainer example, which is used for alternating optimization.
+We override the :meth:`~recbole.trainer.trainer.Trainer._train_epoch` method.
+To begin with, we need to create a new class
+:class:`NewTrainer` based on :class:`~recbole.trainer.trainer.Trainer`.
+
+.. code:: python
+
+    from recbole.trainer import Trainer
+
+    class NewTrainer(Trainer):
+
+        def __init__(self, config, model):
+            super(NewTrainer, self).__init__(config, model)
+
+
+Then we override :meth:`~recbole.trainer.trainer.Trainer._train_epoch`.
+Here, the two losses are optimized alternately across epochs,
+and they are computed by :meth:`calculate_loss1` and :meth:`calculate_loss2`.
+
+
+.. code:: python
+
+    def _train_epoch(self, train_data, epoch_idx):
+        self.model.train()
+        total_loss = 0.
+
+        if epoch_idx % 2 == 0:
+            for batch_idx, interaction in enumerate(train_data):
+                interaction = interaction.to(self.device)
+                self.optimizer.zero_grad()
+                loss = self.model.calculate_loss1(interaction)
+                self._check_nan(loss)
+                loss.backward()
+                self.optimizer.step()
+                total_loss += loss.item()
+        else:
+            for batch_idx, interaction in enumerate(train_data):
+                interaction = interaction.to(self.device)
+                self.optimizer.zero_grad()
+                loss = self.model.calculate_loss2(interaction)
+                self._check_nan(loss)
+                loss.backward()
+                self.optimizer.step()
+                total_loss += loss.item()
+        return total_loss
+
+
+Complete Code
+^^^^^^^^^^^^^^^^
+
+.. 
code:: python + + from recbole.trainer import Trainer + + class NewTrainer(Trainer): + + def __init__(self, config, model): + super(NewTrainer, self).__init__(config, model) + + def _train_epoch(self, train_data, epoch_idx): + self.model.train() + total_loss = 0. + + if epoch_idx % 2 == 0: + for batch_idx, interaction in enumerate(train_data): + interaction = interaction.to(self.device) + self.optimizer.zero_grad() + loss = self.model.calculate_loss1(interaction) + self._check_nan(loss) + loss.backward() + self.optimizer.step() + total_loss += loss.item() + else: + for batch_idx, interaction in enumerate(train_data): + interaction = interaction.to(self.device) + self.optimizer.zero_grad() + loss = self.model.calculate_loss2(interaction) + self._check_nan(loss) + loss.backward() + self.optimizer.step() + total_loss += loss.item() + return total_loss + diff --git a/docs/source/get_started/install.rst b/docs/source/get_started/install.rst new file mode 100644 index 000000000..b798defcc --- /dev/null +++ b/docs/source/get_started/install.rst @@ -0,0 +1,56 @@ +Install RecBole +====================== +RecBole can be installed from ``conda``, ``pip`` and source files. + + +System requirements +------------------------ +RecBole is compatible with the following operating systems: + +* Linux +* Windows 10 +* macOS X + +Python 3.6 (or later), torch 1.6.0 (or later) are required to install our library. If you want to use RecBole with GPU, +please ensure that CUDA or CUDAToolkit version is 9.2 or later. +This requires NVIDIA driver version >= 396.26 (for Linux) or >= 397.44 (for Windows10). + + +Install from conda +-------------------------- +``Conda`` can be installed from `miniconda <https://conda.io/miniconda.html>`_ or +the full `anaconda <https://www.anaconda.com/download/>`_. +If you are in China, `Tsinghua Mirrors <https://mirror.tuna.tsinghua.edu.cn/help/anaconda/>`_ is recommended. + +After installing ``conda``, +run `conda create -n recbole python=3.6` to create the Python 3.6 conda environment. +Then the environment can be activated by `conda activate recbole`. +At last, run the following command to install RecBole: + +.. code:: bash + + conda install -c aibox recbole + + +Install from pip +------------------------- +To install RecBole from pip, only the following command is needed: + +.. code:: bash + + pip install recbole + + +Install from source +------------------------- +Download the source files from GitHub. + +.. code:: bash + + git clone https://github.com/RUCAIBox/RecBole.git && cd RecBole + +Run the following command to install: + +.. code:: bash + + pip install -e . --verbose diff --git a/docs/source/get_started/introduction.rst b/docs/source/get_started/introduction.rst new file mode 100644 index 000000000..406fa98b7 --- /dev/null +++ b/docs/source/get_started/introduction.rst @@ -0,0 +1,31 @@ +Introduction +============== + +RecBole is a unified, comprehensive and efficient framework developed based on PyTorch. +It aims to help the researchers to reproduce and develop recommendation models. + +In the first release, our library includes 65 recommendation algorithms `[Model List]`_, covering four major categories: + +- General Recommendation +- Sequential Recommendation +- Context-aware Recommendation +- Knowledge-based Recommendation + +We design a unified and flexible data file format, and provide the support for 28 benchmark recommendation datasets `[Collected Datasets]`_. 
A user can apply the provided script to process the original data copy, or simply download the processed datasets by our team. + +Features: + +- General and extensible data structure + We deign general and extensible data structures to unify the formatting and usage of various recommendation datasets. +- Comprehensive benchmark models and datasets + We implement 65 commonly used recommendation algorithms, and provide the formatted copies of 28 recommendation datasets. +- Efficient GPU-accelerated execution + We design many tailored strategies in the GPU environment to enhance the efficiency of our library. +- Extensive and standard evaluation protocols + We support a series of commonly used evaluation protocols or settings for testing and comparing recommendation algorithms. + +.. _[Collected Datasets]: + /dataset_list.html + +.. _[Model List]: + /model_list.html diff --git a/docs/source/get_started/quick_start.rst b/docs/source/get_started/quick_start.rst new file mode 100644 index 000000000..9a97a8a15 --- /dev/null +++ b/docs/source/get_started/quick_start.rst @@ -0,0 +1,120 @@ +Quick Start +=============== +Here is a quick-start example for using RecBole. + +Quick-start From Source +-------------------------- +With the source code of `RecBole <https://github.com/RUCAIBox/RecBole>`_, +the following script can be used to run a toy example of our library. + +.. code:: bash + + python run_recbole.py + +This script will run the BPR model on the ml-100k dataset. + +Typically, this example takes less than one minute. We will obtain some output like: + +.. code:: none + + INFO ml-100k + The number of users: 944 + Average actions of users: 106.04453870625663 + The number of items: 1683 + Average actions of items: 59.45303210463734 + The number of inters: 100000 + The sparsity of the dataset: 93.70575143257098% + + INFO Evaluation Settings: + Group by user_id + Ordering: {'strategy': 'shuffle'} + Splitting: {'strategy': 'by_ratio', 'ratios': [0.8, 0.1, 0.1]} + Negative Sampling: {'strategy': 'full', 'distribution': 'uniform'} + + INFO BPRMF( + (user_embedding): Embedding(944, 64) + (item_embedding): Embedding(1683, 64) + (loss): BPRLoss() + ) + Trainable parameters: 168128 + + INFO epoch 0 training [time: 0.27s, train loss: 27.7231] + INFO epoch 0 evaluating [time: 0.12s, valid_score: 0.021900] + INFO valid result: + recall@10: 0.0073 mrr@10: 0.0219 ndcg@10: 0.0093 hit@10: 0.0795 precision@10: 0.0088 + + ... + + INFO epoch 63 training [time: 0.19s, train loss: 4.7660] + INFO epoch 63 evaluating [time: 0.08s, valid_score: 0.394500] + INFO valid result: + recall@10: 0.2156 mrr@10: 0.3945 ndcg@10: 0.2332 hit@10: 0.7593 precision@10: 0.1591 + + INFO Finished training, best eval result in epoch 52 + INFO Loading model structure and parameters from saved/***.pth + INFO best valid result: + recall@10: 0.2169 mrr@10: 0.4005 ndcg@10: 0.235 hit@10: 0.7582 precision@10: 0.1598 + INFO test result: + recall@10: 0.2368 mrr@10: 0.4519 ndcg@10: 0.2768 hit@10: 0.7614 precision@10: 0.1901 + +Note that using the quick start pipeline we provide, the original dataset will be divided into training set, validation set and test set by default. +We optimize model parameters on the training set, do parameter selection according to the results on the validation set, +and finally report the results on the test set. + +If you want to change the parameters, such as ``learning_rate``, ``embedding_size``, +just set the additional command parameters as you need: + +.. 
code:: bash + + python run_recbole.py --learning_rate=0.0001 --embedding_size=128 + + +If you want to change the models, just run the script by setting additional command parameters: + +.. code:: bash + + python run_recbole.py --model=[model_name] + +``model_name`` indicates the model to be initialized. +RecBole has implemented four categories of recommendation algorithms +including general recommendation, context-aware recommendation, +sequential recommendation and knowledge-based recommendation. +More details can be found in :doc:`../user_guide/model_intro`. + + +The datasets can be changed according to :doc:`../user_guide/data_intro`. + + +Quick-start From API +------------------------- +If RecBole is installed from ``pip`` or ``conda``, you can create a new python file (e.g., `run.py`), +and write the following code: + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole() + + +Then run the following command: + +.. code:: bash + + python run.py --dataset=ml-100k --model=BPR + +This will perform the training and test of the BPR model on the ml-100k dataset. + +One can also use similar methods as mentioned above to run different models, parameters or datasets, +the operations are same with `Quick-start From Source`_. + + +In-depth Usage +------------------- +For a more in-depth usage about RecBole, take a look at + +- :doc:`../user_guide/config_settings` +- :doc:`../user_guide/data_intro` +- :doc:`../user_guide/model_intro` +- :doc:`../user_guide/evaluation_support` +- :doc:`../user_guide/usage` diff --git a/docs/source/index.rst b/docs/source/index.rst new file mode 100644 index 000000000..96c20bbe7 --- /dev/null +++ b/docs/source/index.rst @@ -0,0 +1,59 @@ +.. RecBole documentation master file. + +RecBole v0.2.0 +========================================================= + +`HomePage <https://recbole.io/>`_ | `Docs <https://recbole.io/docs/>`_ | `GitHub <https://github.com/RUCAIBox/RecBole>`_ | `Datasets <https://github.com/RUCAIBox/RecDatasets>`_ | `v0.1.2 </docs/v0.1.2/>`_ + +.. toctree:: + :maxdepth: 1 + :caption: Get Started + + get_started/introduction + get_started/install + get_started/quick_start + +.. toctree:: + :maxdepth: 1 + :caption: User Guide + + user_guide/config_settings + user_guide/data_intro + user_guide/model_intro + user_guide/evaluation_support + user_guide/usage + + +.. toctree:: + :maxdepth: 1 + :caption: Developer Guide + + developer_guide/customize_models + developer_guide/customize_trainers + developer_guide/customize_dataloaders + developer_guide/customize_samplers + + +.. toctree:: + :maxdepth: 1 + :caption: API REFERENCE: + + recbole/recbole.config + recbole/recbole.data + recbole/recbole.evaluator + recbole/recbole.model + recbole/recbole.quick_start.quick_start + recbole/recbole.sampler.sampler + recbole/recbole.trainer.hyper_tuning + recbole/recbole.trainer.trainer + recbole/recbole.utils.case_study + recbole/recbole.utils.enum_type + recbole/recbole.utils.logger + recbole/recbole.utils.utils + + +Indices and tables +================== + +* :ref:`genindex` +* :ref:`search` diff --git a/docs/source/recbole/recbole.config.configurator.rst b/docs/source/recbole/recbole.config.configurator.rst new file mode 100644 index 000000000..1818e7650 --- /dev/null +++ b/docs/source/recbole/recbole.config.configurator.rst @@ -0,0 +1,4 @@ +.. 
automodule:: recbole.config.configurator + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.config.eval_setting.rst b/docs/source/recbole/recbole.config.eval_setting.rst new file mode 100644 index 000000000..cd78d6dca --- /dev/null +++ b/docs/source/recbole/recbole.config.eval_setting.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.config.eval_setting + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.config.rst b/docs/source/recbole/recbole.config.rst new file mode 100644 index 000000000..1b58676fc --- /dev/null +++ b/docs/source/recbole/recbole.config.rst @@ -0,0 +1,8 @@ +recbole.config +====================== + +.. toctree:: + :maxdepth: 4 + + recbole.config.configurator + recbole.config.eval_setting diff --git a/docs/source/recbole/recbole.data.dataloader.abstract_dataloader.rst b/docs/source/recbole/recbole.data.dataloader.abstract_dataloader.rst new file mode 100644 index 000000000..a7d6772f3 --- /dev/null +++ b/docs/source/recbole/recbole.data.dataloader.abstract_dataloader.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.data.dataloader.abstract_dataloader + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.data.dataloader.context_dataloader.rst b/docs/source/recbole/recbole.data.dataloader.context_dataloader.rst new file mode 100644 index 000000000..f46d5ee0c --- /dev/null +++ b/docs/source/recbole/recbole.data.dataloader.context_dataloader.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.data.dataloader.context_dataloader + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.data.dataloader.general_dataloader.rst b/docs/source/recbole/recbole.data.dataloader.general_dataloader.rst new file mode 100644 index 000000000..a1ab677fa --- /dev/null +++ b/docs/source/recbole/recbole.data.dataloader.general_dataloader.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.data.dataloader.general_dataloader + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.data.dataloader.knowledge_dataloader.rst b/docs/source/recbole/recbole.data.dataloader.knowledge_dataloader.rst new file mode 100644 index 000000000..fd4eaa083 --- /dev/null +++ b/docs/source/recbole/recbole.data.dataloader.knowledge_dataloader.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.data.dataloader.knowledge_dataloader + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.data.dataloader.neg_sample_mixin.rst b/docs/source/recbole/recbole.data.dataloader.neg_sample_mixin.rst new file mode 100644 index 000000000..67fdd0e93 --- /dev/null +++ b/docs/source/recbole/recbole.data.dataloader.neg_sample_mixin.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.data.dataloader.neg_sample_mixin + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.data.dataloader.rst b/docs/source/recbole/recbole.data.dataloader.rst new file mode 100644 index 000000000..1dc37ef3d --- /dev/null +++ b/docs/source/recbole/recbole.data.dataloader.rst @@ -0,0 +1,13 @@ +recbole.data.dataloader +=============================== + +.. 
toctree:: + :maxdepth: 4 + + recbole.data.dataloader.abstract_dataloader + recbole.data.dataloader.context_dataloader + recbole.data.dataloader.general_dataloader + recbole.data.dataloader.knowledge_dataloader + recbole.data.dataloader.neg_sample_mixin + recbole.data.dataloader.sequential_dataloader + recbole.data.dataloader.user_dataloader diff --git a/docs/source/recbole/recbole.data.dataloader.sequential_dataloader.rst b/docs/source/recbole/recbole.data.dataloader.sequential_dataloader.rst new file mode 100644 index 000000000..94ab388e9 --- /dev/null +++ b/docs/source/recbole/recbole.data.dataloader.sequential_dataloader.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.data.dataloader.sequential_dataloader + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.data.dataloader.user_dataloader.rst b/docs/source/recbole/recbole.data.dataloader.user_dataloader.rst new file mode 100644 index 000000000..0fdba68de --- /dev/null +++ b/docs/source/recbole/recbole.data.dataloader.user_dataloader.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.data.dataloader.user_dataloader + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.data.dataset.customized_dataset.rst b/docs/source/recbole/recbole.data.dataset.customized_dataset.rst new file mode 100644 index 000000000..e70b27f01 --- /dev/null +++ b/docs/source/recbole/recbole.data.dataset.customized_dataset.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.data.dataset.customized_dataset + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.data.dataset.dataset.rst b/docs/source/recbole/recbole.data.dataset.dataset.rst new file mode 100644 index 000000000..b7174a373 --- /dev/null +++ b/docs/source/recbole/recbole.data.dataset.dataset.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.data.dataset.dataset + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.data.dataset.kg_dataset.rst b/docs/source/recbole/recbole.data.dataset.kg_dataset.rst new file mode 100644 index 000000000..c825654bc --- /dev/null +++ b/docs/source/recbole/recbole.data.dataset.kg_dataset.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.data.dataset.kg_dataset + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.data.dataset.kg_seq_dataset.rst b/docs/source/recbole/recbole.data.dataset.kg_seq_dataset.rst new file mode 100644 index 000000000..44bdec0ca --- /dev/null +++ b/docs/source/recbole/recbole.data.dataset.kg_seq_dataset.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.data.dataset.kg_seq_dataset + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.data.dataset.rst b/docs/source/recbole/recbole.data.dataset.rst new file mode 100644 index 000000000..64d432fae --- /dev/null +++ b/docs/source/recbole/recbole.data.dataset.rst @@ -0,0 +1,12 @@ +recbole.data.dataset +============================ + +.. toctree:: + :maxdepth: 4 + + recbole.data.dataset.customized_dataset + recbole.data.dataset.dataset + recbole.data.dataset.kg_dataset + recbole.data.dataset.kg_seq_dataset + recbole.data.dataset.sequential_dataset + recbole.data.dataset.social_dataset diff --git a/docs/source/recbole/recbole.data.dataset.sequential_dataset.rst b/docs/source/recbole/recbole.data.dataset.sequential_dataset.rst new file mode 100644 index 000000000..158a3d393 --- /dev/null +++ b/docs/source/recbole/recbole.data.dataset.sequential_dataset.rst @@ -0,0 +1,4 @@ +.. 
automodule:: recbole.data.dataset.sequential_dataset + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.data.dataset.social_dataset.rst b/docs/source/recbole/recbole.data.dataset.social_dataset.rst new file mode 100644 index 000000000..47db01c07 --- /dev/null +++ b/docs/source/recbole/recbole.data.dataset.social_dataset.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.data.dataset.social_dataset + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.data.interaction.rst b/docs/source/recbole/recbole.data.interaction.rst new file mode 100644 index 000000000..f669c89f8 --- /dev/null +++ b/docs/source/recbole/recbole.data.interaction.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.data.interaction + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.data.rst b/docs/source/recbole/recbole.data.rst new file mode 100644 index 000000000..3996bb35c --- /dev/null +++ b/docs/source/recbole/recbole.data.rst @@ -0,0 +1,10 @@ +recbole.data +==================== + +.. toctree:: + :maxdepth: 4 + + recbole.data.dataloader + recbole.data.dataset + recbole.data.interaction + recbole.data.utils diff --git a/docs/source/recbole/recbole.data.utils.rst b/docs/source/recbole/recbole.data.utils.rst new file mode 100644 index 000000000..8b20deadf --- /dev/null +++ b/docs/source/recbole/recbole.data.utils.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.data.utils + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.evaluator.abstract_evaluator.rst b/docs/source/recbole/recbole.evaluator.abstract_evaluator.rst new file mode 100644 index 000000000..66de59e49 --- /dev/null +++ b/docs/source/recbole/recbole.evaluator.abstract_evaluator.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.evaluator.abstract_evaluator + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.evaluator.evaluators.rst b/docs/source/recbole/recbole.evaluator.evaluators.rst new file mode 100644 index 000000000..8aeca9804 --- /dev/null +++ b/docs/source/recbole/recbole.evaluator.evaluators.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.evaluator.evaluators + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.evaluator.metrics.rst b/docs/source/recbole/recbole.evaluator.metrics.rst new file mode 100644 index 000000000..2ef968131 --- /dev/null +++ b/docs/source/recbole/recbole.evaluator.metrics.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.evaluator.metrics + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.evaluator.proxy_evaluator.rst b/docs/source/recbole/recbole.evaluator.proxy_evaluator.rst new file mode 100644 index 000000000..c4689d43a --- /dev/null +++ b/docs/source/recbole/recbole.evaluator.proxy_evaluator.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.evaluator.proxy_evaluator + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.evaluator.rst b/docs/source/recbole/recbole.evaluator.rst new file mode 100644 index 000000000..0f3c99793 --- /dev/null +++ b/docs/source/recbole/recbole.evaluator.rst @@ -0,0 +1,11 @@ +recbole.evaluator +========================= + +.. 
toctree:: + :maxdepth: 4 + + recbole.evaluator.abstract_evaluator + recbole.evaluator.evaluators + recbole.evaluator.metrics + recbole.evaluator.proxy_evaluator + recbole.evaluator.utils diff --git a/docs/source/recbole/recbole.evaluator.utils.rst b/docs/source/recbole/recbole.evaluator.utils.rst new file mode 100644 index 000000000..d57b0fa71 --- /dev/null +++ b/docs/source/recbole/recbole.evaluator.utils.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.evaluator.utils + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.abstract_recommender.rst b/docs/source/recbole/recbole.model.abstract_recommender.rst new file mode 100644 index 000000000..a346344cc --- /dev/null +++ b/docs/source/recbole/recbole.model.abstract_recommender.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.abstract_recommender + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.context_aware_recommender.afm.rst b/docs/source/recbole/recbole.model.context_aware_recommender.afm.rst new file mode 100644 index 000000000..fe80d513d --- /dev/null +++ b/docs/source/recbole/recbole.model.context_aware_recommender.afm.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.context_aware_recommender.afm + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.context_aware_recommender.autoint.rst b/docs/source/recbole/recbole.model.context_aware_recommender.autoint.rst new file mode 100644 index 000000000..9e914fede --- /dev/null +++ b/docs/source/recbole/recbole.model.context_aware_recommender.autoint.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.context_aware_recommender.autoint + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.context_aware_recommender.dcn.rst b/docs/source/recbole/recbole.model.context_aware_recommender.dcn.rst new file mode 100644 index 000000000..0aac91819 --- /dev/null +++ b/docs/source/recbole/recbole.model.context_aware_recommender.dcn.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.context_aware_recommender.dcn + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.context_aware_recommender.deepfm.rst b/docs/source/recbole/recbole.model.context_aware_recommender.deepfm.rst new file mode 100644 index 000000000..356462127 --- /dev/null +++ b/docs/source/recbole/recbole.model.context_aware_recommender.deepfm.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.context_aware_recommender.deepfm + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.context_aware_recommender.dssm.rst b/docs/source/recbole/recbole.model.context_aware_recommender.dssm.rst new file mode 100644 index 000000000..0c2bb69db --- /dev/null +++ b/docs/source/recbole/recbole.model.context_aware_recommender.dssm.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.context_aware_recommender.dssm + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.context_aware_recommender.ffm.rst b/docs/source/recbole/recbole.model.context_aware_recommender.ffm.rst new file mode 100644 index 000000000..cece0e842 --- /dev/null +++ b/docs/source/recbole/recbole.model.context_aware_recommender.ffm.rst @@ -0,0 +1,4 @@ +.. 
automodule:: recbole.model.context_aware_recommender.ffm + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.context_aware_recommender.fm.rst b/docs/source/recbole/recbole.model.context_aware_recommender.fm.rst new file mode 100644 index 000000000..93d6eba0a --- /dev/null +++ b/docs/source/recbole/recbole.model.context_aware_recommender.fm.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.context_aware_recommender.fm + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.context_aware_recommender.fnn.rst b/docs/source/recbole/recbole.model.context_aware_recommender.fnn.rst new file mode 100644 index 000000000..f884fe80b --- /dev/null +++ b/docs/source/recbole/recbole.model.context_aware_recommender.fnn.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.context_aware_recommender.fnn + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.context_aware_recommender.fwfm.rst b/docs/source/recbole/recbole.model.context_aware_recommender.fwfm.rst new file mode 100644 index 000000000..d776a3fb0 --- /dev/null +++ b/docs/source/recbole/recbole.model.context_aware_recommender.fwfm.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.context_aware_recommender.fwfm + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.context_aware_recommender.lr.rst b/docs/source/recbole/recbole.model.context_aware_recommender.lr.rst new file mode 100644 index 000000000..d64ba2088 --- /dev/null +++ b/docs/source/recbole/recbole.model.context_aware_recommender.lr.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.context_aware_recommender.lr + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.context_aware_recommender.nfm.rst b/docs/source/recbole/recbole.model.context_aware_recommender.nfm.rst new file mode 100644 index 000000000..15cf09969 --- /dev/null +++ b/docs/source/recbole/recbole.model.context_aware_recommender.nfm.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.context_aware_recommender.nfm + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.context_aware_recommender.pnn.rst b/docs/source/recbole/recbole.model.context_aware_recommender.pnn.rst new file mode 100644 index 000000000..d43f4f08d --- /dev/null +++ b/docs/source/recbole/recbole.model.context_aware_recommender.pnn.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.context_aware_recommender.pnn + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.context_aware_recommender.rst b/docs/source/recbole/recbole.model.context_aware_recommender.rst new file mode 100644 index 000000000..aeb5c3c5e --- /dev/null +++ b/docs/source/recbole/recbole.model.context_aware_recommender.rst @@ -0,0 +1,20 @@ +recbole.model.context\_aware\_recommender +================================================= + +.. 
toctree:: + :maxdepth: 4 + + recbole.model.context_aware_recommender.afm + recbole.model.context_aware_recommender.autoint + recbole.model.context_aware_recommender.dcn + recbole.model.context_aware_recommender.deepfm + recbole.model.context_aware_recommender.dssm + recbole.model.context_aware_recommender.ffm + recbole.model.context_aware_recommender.fm + recbole.model.context_aware_recommender.fnn + recbole.model.context_aware_recommender.fwfm + recbole.model.context_aware_recommender.lr + recbole.model.context_aware_recommender.nfm + recbole.model.context_aware_recommender.pnn + recbole.model.context_aware_recommender.widedeep + recbole.model.context_aware_recommender.xdeepfm diff --git a/docs/source/recbole/recbole.model.context_aware_recommender.widedeep.rst b/docs/source/recbole/recbole.model.context_aware_recommender.widedeep.rst new file mode 100644 index 000000000..8bcb6834d --- /dev/null +++ b/docs/source/recbole/recbole.model.context_aware_recommender.widedeep.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.context_aware_recommender.widedeep + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.context_aware_recommender.xdeepfm.rst b/docs/source/recbole/recbole.model.context_aware_recommender.xdeepfm.rst new file mode 100644 index 000000000..8e64f67dc --- /dev/null +++ b/docs/source/recbole/recbole.model.context_aware_recommender.xdeepfm.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.context_aware_recommender.xdeepfm + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.exlib_recommender.rst b/docs/source/recbole/recbole.model.exlib_recommender.rst new file mode 100644 index 000000000..d7d03911f --- /dev/null +++ b/docs/source/recbole/recbole.model.exlib_recommender.rst @@ -0,0 +1,7 @@ +recbole.model.exlib\_recommender +============================================= + +.. toctree:: + :maxdepth: 4 + + recbole.model.exlib_recommender.xgboost diff --git a/docs/source/recbole/recbole.model.exlib_recommender.xgboost.rst b/docs/source/recbole/recbole.model.exlib_recommender.xgboost.rst new file mode 100644 index 000000000..fbbfc2b6f --- /dev/null +++ b/docs/source/recbole/recbole.model.exlib_recommender.xgboost.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.exlib_recommender.xgboost + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.general_recommender.bpr.rst b/docs/source/recbole/recbole.model.general_recommender.bpr.rst new file mode 100644 index 000000000..0eb41563d --- /dev/null +++ b/docs/source/recbole/recbole.model.general_recommender.bpr.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.general_recommender.bpr + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.general_recommender.cdae.rst b/docs/source/recbole/recbole.model.general_recommender.cdae.rst new file mode 100644 index 000000000..5ec3b7dec --- /dev/null +++ b/docs/source/recbole/recbole.model.general_recommender.cdae.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.general_recommender.cdae + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.general_recommender.convncf.rst b/docs/source/recbole/recbole.model.general_recommender.convncf.rst new file mode 100644 index 000000000..ee388d326 --- /dev/null +++ b/docs/source/recbole/recbole.model.general_recommender.convncf.rst @@ -0,0 +1,4 @@ +.. 
automodule:: recbole.model.general_recommender.convncf + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.general_recommender.dgcf.rst b/docs/source/recbole/recbole.model.general_recommender.dgcf.rst new file mode 100644 index 000000000..6551d7966 --- /dev/null +++ b/docs/source/recbole/recbole.model.general_recommender.dgcf.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.general_recommender.dgcf + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.general_recommender.dmf.rst b/docs/source/recbole/recbole.model.general_recommender.dmf.rst new file mode 100644 index 000000000..499706b1a --- /dev/null +++ b/docs/source/recbole/recbole.model.general_recommender.dmf.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.general_recommender.dmf + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.general_recommender.fism.rst b/docs/source/recbole/recbole.model.general_recommender.fism.rst new file mode 100644 index 000000000..c706123e5 --- /dev/null +++ b/docs/source/recbole/recbole.model.general_recommender.fism.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.general_recommender.fism + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.general_recommender.gcmc.rst b/docs/source/recbole/recbole.model.general_recommender.gcmc.rst new file mode 100644 index 000000000..27a234987 --- /dev/null +++ b/docs/source/recbole/recbole.model.general_recommender.gcmc.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.general_recommender.gcmc + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.general_recommender.itemknn.rst b/docs/source/recbole/recbole.model.general_recommender.itemknn.rst new file mode 100644 index 000000000..aabf3fd97 --- /dev/null +++ b/docs/source/recbole/recbole.model.general_recommender.itemknn.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.general_recommender.itemknn + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.general_recommender.lightgcn.rst b/docs/source/recbole/recbole.model.general_recommender.lightgcn.rst new file mode 100644 index 000000000..001bcc62e --- /dev/null +++ b/docs/source/recbole/recbole.model.general_recommender.lightgcn.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.general_recommender.lightgcn + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.general_recommender.line.rst b/docs/source/recbole/recbole.model.general_recommender.line.rst new file mode 100644 index 000000000..1d1da4ea9 --- /dev/null +++ b/docs/source/recbole/recbole.model.general_recommender.line.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.general_recommender.line + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.general_recommender.macridvae.rst b/docs/source/recbole/recbole.model.general_recommender.macridvae.rst new file mode 100644 index 000000000..f1363d7a6 --- /dev/null +++ b/docs/source/recbole/recbole.model.general_recommender.macridvae.rst @@ -0,0 +1,4 @@ +.. 
automodule:: recbole.model.general_recommender.macridvae + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.general_recommender.multidae.rst b/docs/source/recbole/recbole.model.general_recommender.multidae.rst new file mode 100644 index 000000000..becbaba13 --- /dev/null +++ b/docs/source/recbole/recbole.model.general_recommender.multidae.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.general_recommender.multidae + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.general_recommender.multivae.rst b/docs/source/recbole/recbole.model.general_recommender.multivae.rst new file mode 100644 index 000000000..0888bad2c --- /dev/null +++ b/docs/source/recbole/recbole.model.general_recommender.multivae.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.general_recommender.multivae + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.general_recommender.nais.rst b/docs/source/recbole/recbole.model.general_recommender.nais.rst new file mode 100644 index 000000000..b3d30cab1 --- /dev/null +++ b/docs/source/recbole/recbole.model.general_recommender.nais.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.general_recommender.nais + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.general_recommender.neumf.rst b/docs/source/recbole/recbole.model.general_recommender.neumf.rst new file mode 100644 index 000000000..48173b93f --- /dev/null +++ b/docs/source/recbole/recbole.model.general_recommender.neumf.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.general_recommender.neumf + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.general_recommender.ngcf.rst b/docs/source/recbole/recbole.model.general_recommender.ngcf.rst new file mode 100644 index 000000000..12703290b --- /dev/null +++ b/docs/source/recbole/recbole.model.general_recommender.ngcf.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.general_recommender.ngcf + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.general_recommender.pop.rst b/docs/source/recbole/recbole.model.general_recommender.pop.rst new file mode 100644 index 000000000..8e32fc007 --- /dev/null +++ b/docs/source/recbole/recbole.model.general_recommender.pop.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.general_recommender.pop + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.general_recommender.rst b/docs/source/recbole/recbole.model.general_recommender.rst new file mode 100644 index 000000000..b4cc32610 --- /dev/null +++ b/docs/source/recbole/recbole.model.general_recommender.rst @@ -0,0 +1,24 @@ +recbole.model.general\_recommender +========================================== + +.. 
toctree:: + :maxdepth: 4 + + recbole.model.general_recommender.bpr + recbole.model.general_recommender.cdae + recbole.model.general_recommender.convncf + recbole.model.general_recommender.dgcf + recbole.model.general_recommender.dmf + recbole.model.general_recommender.fism + recbole.model.general_recommender.gcmc + recbole.model.general_recommender.itemknn + recbole.model.general_recommender.lightgcn + recbole.model.general_recommender.line + recbole.model.general_recommender.macridvae + recbole.model.general_recommender.multidae + recbole.model.general_recommender.multivae + recbole.model.general_recommender.nais + recbole.model.general_recommender.neumf + recbole.model.general_recommender.ngcf + recbole.model.general_recommender.pop + recbole.model.general_recommender.spectralcf diff --git a/docs/source/recbole/recbole.model.general_recommender.spectralcf.rst b/docs/source/recbole/recbole.model.general_recommender.spectralcf.rst new file mode 100644 index 000000000..209accaa2 --- /dev/null +++ b/docs/source/recbole/recbole.model.general_recommender.spectralcf.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.general_recommender.spectralcf + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.init.rst b/docs/source/recbole/recbole.model.init.rst new file mode 100644 index 000000000..e7afaeb72 --- /dev/null +++ b/docs/source/recbole/recbole.model.init.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.init + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.knowledge_aware_recommender.cfkg.rst b/docs/source/recbole/recbole.model.knowledge_aware_recommender.cfkg.rst new file mode 100644 index 000000000..46f6fe493 --- /dev/null +++ b/docs/source/recbole/recbole.model.knowledge_aware_recommender.cfkg.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.knowledge_aware_recommender.cfkg + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.knowledge_aware_recommender.cke.rst b/docs/source/recbole/recbole.model.knowledge_aware_recommender.cke.rst new file mode 100644 index 000000000..7ada3c3d9 --- /dev/null +++ b/docs/source/recbole/recbole.model.knowledge_aware_recommender.cke.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.knowledge_aware_recommender.cke + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.knowledge_aware_recommender.kgat.rst b/docs/source/recbole/recbole.model.knowledge_aware_recommender.kgat.rst new file mode 100644 index 000000000..5387132b5 --- /dev/null +++ b/docs/source/recbole/recbole.model.knowledge_aware_recommender.kgat.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.knowledge_aware_recommender.kgat + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.knowledge_aware_recommender.kgcn.rst b/docs/source/recbole/recbole.model.knowledge_aware_recommender.kgcn.rst new file mode 100644 index 000000000..d8e5dd177 --- /dev/null +++ b/docs/source/recbole/recbole.model.knowledge_aware_recommender.kgcn.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.knowledge_aware_recommender.kgcn + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.knowledge_aware_recommender.kgnnls.rst b/docs/source/recbole/recbole.model.knowledge_aware_recommender.kgnnls.rst new file mode 100644 index 000000000..450e497c0 --- /dev/null +++ b/docs/source/recbole/recbole.model.knowledge_aware_recommender.kgnnls.rst @@ -0,0 +1,4 @@ +.. 
automodule:: recbole.model.knowledge_aware_recommender.kgnnls + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.knowledge_aware_recommender.ktup.rst b/docs/source/recbole/recbole.model.knowledge_aware_recommender.ktup.rst new file mode 100644 index 000000000..83f316b52 --- /dev/null +++ b/docs/source/recbole/recbole.model.knowledge_aware_recommender.ktup.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.knowledge_aware_recommender.ktup + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.knowledge_aware_recommender.mkr.rst b/docs/source/recbole/recbole.model.knowledge_aware_recommender.mkr.rst new file mode 100644 index 000000000..13af4b806 --- /dev/null +++ b/docs/source/recbole/recbole.model.knowledge_aware_recommender.mkr.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.knowledge_aware_recommender.mkr + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.knowledge_aware_recommender.ripplenet.rst b/docs/source/recbole/recbole.model.knowledge_aware_recommender.ripplenet.rst new file mode 100644 index 000000000..da6d7f790 --- /dev/null +++ b/docs/source/recbole/recbole.model.knowledge_aware_recommender.ripplenet.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.knowledge_aware_recommender.ripplenet + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.knowledge_aware_recommender.rst b/docs/source/recbole/recbole.model.knowledge_aware_recommender.rst new file mode 100644 index 000000000..fd025e998 --- /dev/null +++ b/docs/source/recbole/recbole.model.knowledge_aware_recommender.rst @@ -0,0 +1,14 @@ +recbole.model.knowledge\_aware\_recommender +=================================================== + +.. toctree:: + :maxdepth: 4 + + recbole.model.knowledge_aware_recommender.cfkg + recbole.model.knowledge_aware_recommender.cke + recbole.model.knowledge_aware_recommender.kgat + recbole.model.knowledge_aware_recommender.kgcn + recbole.model.knowledge_aware_recommender.kgnnls + recbole.model.knowledge_aware_recommender.ktup + recbole.model.knowledge_aware_recommender.mkr + recbole.model.knowledge_aware_recommender.ripplenet diff --git a/docs/source/recbole/recbole.model.layers.rst b/docs/source/recbole/recbole.model.layers.rst new file mode 100644 index 000000000..d4ee82d6e --- /dev/null +++ b/docs/source/recbole/recbole.model.layers.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.layers + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.loss.rst b/docs/source/recbole/recbole.model.loss.rst new file mode 100644 index 000000000..9876f53e5 --- /dev/null +++ b/docs/source/recbole/recbole.model.loss.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.loss + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.rst b/docs/source/recbole/recbole.model.rst new file mode 100644 index 000000000..5b83fbbde --- /dev/null +++ b/docs/source/recbole/recbole.model.rst @@ -0,0 +1,15 @@ +recbole.model +===================== + +.. 
toctree:: + :maxdepth: 4 + + recbole.model.context_aware_recommender + recbole.model.exlib_recommender + recbole.model.general_recommender + recbole.model.knowledge_aware_recommender + recbole.model.sequential_recommender + recbole.model.abstract_recommender + recbole.model.init + recbole.model.layers + recbole.model.loss diff --git a/docs/source/recbole/recbole.model.sequential_recommender.bert4rec.rst b/docs/source/recbole/recbole.model.sequential_recommender.bert4rec.rst new file mode 100644 index 000000000..271b09a48 --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.bert4rec.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.bert4rec + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.caser.rst b/docs/source/recbole/recbole.model.sequential_recommender.caser.rst new file mode 100644 index 000000000..8428fa2a4 --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.caser.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.caser + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.din.rst b/docs/source/recbole/recbole.model.sequential_recommender.din.rst new file mode 100644 index 000000000..652222c6e --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.din.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.din + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.fdsa.rst b/docs/source/recbole/recbole.model.sequential_recommender.fdsa.rst new file mode 100644 index 000000000..b3638d5b2 --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.fdsa.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.fdsa + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.fossil.rst b/docs/source/recbole/recbole.model.sequential_recommender.fossil.rst new file mode 100644 index 000000000..0b0bafbfb --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.fossil.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.fossil + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.fpmc.rst b/docs/source/recbole/recbole.model.sequential_recommender.fpmc.rst new file mode 100644 index 000000000..93d3af95c --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.fpmc.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.fpmc + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.gcsan.rst b/docs/source/recbole/recbole.model.sequential_recommender.gcsan.rst new file mode 100644 index 000000000..7bde80be3 --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.gcsan.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.gcsan + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.gru4rec.rst b/docs/source/recbole/recbole.model.sequential_recommender.gru4rec.rst new file mode 100644 index 000000000..8fadecf8e --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.gru4rec.rst @@ -0,0 +1,4 @@ +.. 
automodule:: recbole.model.sequential_recommender.gru4rec + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.gru4recf.rst b/docs/source/recbole/recbole.model.sequential_recommender.gru4recf.rst new file mode 100644 index 000000000..0c2ee9d4c --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.gru4recf.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.gru4recf + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.gru4reckg.rst b/docs/source/recbole/recbole.model.sequential_recommender.gru4reckg.rst new file mode 100644 index 000000000..a5a8e4724 --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.gru4reckg.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.gru4reckg + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.hgn.rst b/docs/source/recbole/recbole.model.sequential_recommender.hgn.rst new file mode 100644 index 000000000..510849766 --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.hgn.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.hgn + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.hrm.rst b/docs/source/recbole/recbole.model.sequential_recommender.hrm.rst new file mode 100644 index 000000000..d4c7c9db0 --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.hrm.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.hrm + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.ksr.rst b/docs/source/recbole/recbole.model.sequential_recommender.ksr.rst new file mode 100644 index 000000000..99442a674 --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.ksr.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.ksr + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.narm.rst b/docs/source/recbole/recbole.model.sequential_recommender.narm.rst new file mode 100644 index 000000000..15e52ffdb --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.narm.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.narm + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.nextitnet.rst b/docs/source/recbole/recbole.model.sequential_recommender.nextitnet.rst new file mode 100644 index 000000000..a6e917b59 --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.nextitnet.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.nextitnet + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.npe.rst b/docs/source/recbole/recbole.model.sequential_recommender.npe.rst new file mode 100644 index 000000000..9a56fa28f --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.npe.rst @@ -0,0 +1,4 @@ +.. 
automodule:: recbole.model.sequential_recommender.npe + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.repeatnet.rst b/docs/source/recbole/recbole.model.sequential_recommender.repeatnet.rst new file mode 100644 index 000000000..f0055ec7a --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.repeatnet.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.repeatnet + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.rst b/docs/source/recbole/recbole.model.sequential_recommender.rst new file mode 100644 index 000000000..258674a8b --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.rst @@ -0,0 +1,30 @@ +recbole.model.sequential\_recommender +============================================= + +.. toctree:: + :maxdepth: 4 + + recbole.model.sequential_recommender.bert4rec + recbole.model.sequential_recommender.caser + recbole.model.sequential_recommender.din + recbole.model.sequential_recommender.fdsa + recbole.model.sequential_recommender.fossil + recbole.model.sequential_recommender.fpmc + recbole.model.sequential_recommender.gcsan + recbole.model.sequential_recommender.gru4rec + recbole.model.sequential_recommender.gru4recf + recbole.model.sequential_recommender.gru4reckg + recbole.model.sequential_recommender.hgn + recbole.model.sequential_recommender.hrm + recbole.model.sequential_recommender.ksr + recbole.model.sequential_recommender.narm + recbole.model.sequential_recommender.nextitnet + recbole.model.sequential_recommender.npe + recbole.model.sequential_recommender.repeatnet + recbole.model.sequential_recommender.s3rec + recbole.model.sequential_recommender.sasrec + recbole.model.sequential_recommender.sasrecf + recbole.model.sequential_recommender.shan + recbole.model.sequential_recommender.srgnn + recbole.model.sequential_recommender.stamp + recbole.model.sequential_recommender.transrec diff --git a/docs/source/recbole/recbole.model.sequential_recommender.s3rec.rst b/docs/source/recbole/recbole.model.sequential_recommender.s3rec.rst new file mode 100644 index 000000000..c9392886f --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.s3rec.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.s3rec + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.sasrec.rst b/docs/source/recbole/recbole.model.sequential_recommender.sasrec.rst new file mode 100644 index 000000000..2c6a563ed --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.sasrec.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.sasrec + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.sasrecf.rst b/docs/source/recbole/recbole.model.sequential_recommender.sasrecf.rst new file mode 100644 index 000000000..0ed23675d --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.sasrecf.rst @@ -0,0 +1,4 @@ +.. 
automodule:: recbole.model.sequential_recommender.sasrecf + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.shan.rst b/docs/source/recbole/recbole.model.sequential_recommender.shan.rst new file mode 100644 index 000000000..6ced74f83 --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.shan.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.shan + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.srgnn.rst b/docs/source/recbole/recbole.model.sequential_recommender.srgnn.rst new file mode 100644 index 000000000..76201dbff --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.srgnn.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.srgnn + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.stamp.rst b/docs/source/recbole/recbole.model.sequential_recommender.stamp.rst new file mode 100644 index 000000000..eea454975 --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.stamp.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.stamp + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.model.sequential_recommender.transrec.rst b/docs/source/recbole/recbole.model.sequential_recommender.transrec.rst new file mode 100644 index 000000000..d4b44d0f8 --- /dev/null +++ b/docs/source/recbole/recbole.model.sequential_recommender.transrec.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.model.sequential_recommender.transrec + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.quick_start.quick_start.rst b/docs/source/recbole/recbole.quick_start.quick_start.rst new file mode 100644 index 000000000..da62b7bdc --- /dev/null +++ b/docs/source/recbole/recbole.quick_start.quick_start.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.quick_start.quick_start + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.sampler.sampler.rst b/docs/source/recbole/recbole.sampler.sampler.rst new file mode 100644 index 000000000..30f94ef93 --- /dev/null +++ b/docs/source/recbole/recbole.sampler.sampler.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.sampler.sampler + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.trainer.hyper_tuning.rst b/docs/source/recbole/recbole.trainer.hyper_tuning.rst new file mode 100644 index 000000000..347f549e4 --- /dev/null +++ b/docs/source/recbole/recbole.trainer.hyper_tuning.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.trainer.hyper_tuning + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.trainer.trainer.rst b/docs/source/recbole/recbole.trainer.trainer.rst new file mode 100644 index 000000000..db0f69d84 --- /dev/null +++ b/docs/source/recbole/recbole.trainer.trainer.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.trainer.trainer + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.utils.case_study.rst b/docs/source/recbole/recbole.utils.case_study.rst new file mode 100644 index 000000000..3f6570eae --- /dev/null +++ b/docs/source/recbole/recbole.utils.case_study.rst @@ -0,0 +1,4 @@ +.. 
automodule:: recbole.utils.case_study + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.utils.enum_type.rst b/docs/source/recbole/recbole.utils.enum_type.rst new file mode 100644 index 000000000..9d8483655 --- /dev/null +++ b/docs/source/recbole/recbole.utils.enum_type.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.utils.enum_type + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.utils.logger.rst b/docs/source/recbole/recbole.utils.logger.rst new file mode 100644 index 000000000..d3bd2975d --- /dev/null +++ b/docs/source/recbole/recbole.utils.logger.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.utils.logger + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/recbole/recbole.utils.utils.rst b/docs/source/recbole/recbole.utils.utils.rst new file mode 100644 index 000000000..9e9fd62d3 --- /dev/null +++ b/docs/source/recbole/recbole.utils.utils.rst @@ -0,0 +1,4 @@ +.. automodule:: recbole.utils.utils + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/source/user_guide/config_settings.rst b/docs/source/user_guide/config_settings.rst new file mode 100644 index 000000000..062391967 --- /dev/null +++ b/docs/source/user_guide/config_settings.rst @@ -0,0 +1,252 @@ +Config Settings +=================== +RecBole is able to config different parameters for controlling the experiment +setup (e.g., data processing, data splitting, training and evaluation). +The users can select the settings according to their own requirements. + +The introduction of different parameter configurations are presented as follows: + +Parameters Introduction +----------------------------- +The parameters in RecBole can be divided into three categories: +Basic Parameters, Dataset Parameters and Model Parameters. + +Basic Parameters +^^^^^^^^^^^^^^^^^^^^^^ +Basic parameters are used to build the general environment including the settings for +model training and evaluation. + +**Environment Setting** + +- ``gpu_id (int or str)`` : The id of GPU device. Defaults to ``0``. +- ``use_gpu (bool)`` : Whether or not to use GPU. If True, using GPU, else using CPU. + Defaults to ``True``. +- ``seed (int)`` : Random seed. Defaults to ``2020``. +- ``state (str)`` : Logging level. Defaults to ``'INFO'``. + Range in ``['INFO', 'DEBUG', 'WARNING', 'ERROR', 'CRITICAL']``. +- ``reproducibility (bool)`` : If True, the tool will use deterministic + convolution algorithms, which makes the result reproducible. If False, + the tool will benchmark multiple convolution algorithms and select the fastest one, + which makes the result not reproducible but can speed up model training in + some case. Defaults to ``True``. +- ``data_path (str)`` : The path of input dataset. Defaults to ``'dataset/'``. +- ``checkpoint_dir (str)`` : The path to save checkpoint file. + Defaults to ``'saved/'``. +- ``show_progress (bool)`` : Show the progress of training epoch and evaluate epoch. + Defaults to ``True``. + +**Training Setting** + +- ``epochs (int)`` : The number of training epochs. Defaults to ``300``. +- ``train_batch_size (int)`` : The training batch size. Defaults to ``2048``. +- ``learner (str)`` : The name of used optimizer. Defaults to ``'adam'``. + Range in ``['adam', 'sgd', 'adagrad', 'rmsprop', 'sparse_adam']``. +- ``learning_rate (float)`` : Learning rate. Defaults to ``0.001``. +- ``training_neg_sample_num (int)`` : The number of negative samples during + training. 
If it is set to 0, the negative sampling operation will not be + performed. Defaults to ``1``.
+- ``training_neg_sample_distribution(str)`` : Distribution of the negative items + in the training phase. Defaults to ``uniform``. Range in ``['uniform', 'popularity']``.
+- ``eval_step (int)`` : The number of training epochs before an evaluation + on the valid dataset. If it is less than 1, the model will not be + evaluated on the valid dataset. Defaults to ``1``.
+- ``stopping_step (int)`` : The threshold for validation-based early stopping. + Defaults to ``10``.
+- ``clip_grad_norm (dict)`` : The args of `clip_grad_norm_ <https://pytorch.org/docs/stable/generated/torch.nn.utils.clip_grad_norm_.html>`_ + which will clip the gradient norm of the model. Defaults to ``None``.
+- ``loss_decimal_place(int)``: The decimal place of training loss. Defaults to ``4``.
+- ``weight_decay (float)`` : Weight decay (L2 penalty), used for `optimizer <https://pytorch.org/docs/stable/optim.html?highlight=weight_decay>`_. Defaults to ``0.0``.
+ + +**Evaluation Setting** + +- ``eval_setting (str)``: The evaluation settings. Defaults to ``'RO_RS,full'``. + The parameter has two parts. The first part controls the splitting method; + its range is ``['RO_RS','TO_LS','RO_LS','TO_RS']``. The second part (optional) + controls the ranking mechanism; its range is ``['full','uni100','uni1000','pop100','pop1000']``.
+- ``group_by_user (bool)``: Whether or not to group the users. + It must be ``True`` when ``eval_setting`` is in ``['RO_LS', 'TO_LS']``. + Defaults to ``True``.
+- ``split_ratio (list)``: The split ratio between train data, valid data and + test data. It only takes effect when the first part of ``eval_setting`` + is in ``['RO_RS', 'TO_RS']``. Defaults to ``[0.8, 0.1, 0.1]``.
+- ``leave_one_num (int)``: It only takes effect when the first part of + ``eval_setting`` is in ``['RO_LS', 'TO_LS']``. Defaults to ``2``.
+ +- ``metrics (list or str)``: Evaluation metrics. Defaults to + ``['Recall', 'MRR', 'NDCG', 'Hit', 'Precision']``. Range in + ``['Recall', 'MRR', 'NDCG', 'Hit', 'MAP', 'Precision', 'AUC', 'GAUC', + 'MAE', 'RMSE', 'LogLoss']``.
+- ``topk (list or int or None)``: The value of k for topk evaluation metrics. + Defaults to ``10``.
+- ``valid_metric (str)``: The evaluation metric for early stopping. + It must be one of the used ``metrics``. Defaults to ``'MRR@10'``.
+- ``eval_batch_size (int)``: The evaluation batch size. Defaults to ``4096``.
+- ``metric_decimal_place(int)``: The decimal place of metric scores. Defaults to ``4``.
+ +Please refer to :doc:`evaluation_support` for more details about the parameters +in Evaluation Setting.
+ +Dataset Parameters +^^^^^^^^^^^^^^^^^^^^^^^ +Dataset Parameters are used to describe the dataset information and control +the dataset loading and filtering. + +Please refer to :doc:`data/data_args` for more details.
+ +Model Parameters +^^^^^^^^^^^^^^^^^^^^^ +Model Parameters are used to describe the model structures. + +Please refer to :doc:`model_intro` for more details.
+ + +Parameters Configuration +------------------------------ +RecBole supports three types of parameter configurations: Config files, +Parameter Dicts and Command Line. The parameters are assigned via the +Configuration module.
+ +Config Files +^^^^^^^^^^^^^^^^ +Config Files should be organized in the format of yaml. +The users should write their parameters according to the rules aligned with +yaml, and the final config files are processed by the configuration module +to complete the parameter settings. 
+ +To begin with, we write the parameters into the yaml files (e.g. `example.yaml`). + +.. code:: yaml + + gpu_id: 1 + training_batch_size: 1024 + +Then, the yaml files are conveyed to the configuration module to finish the +parameter settings. + +.. code:: python + + from recbole.config import Config + + config = Config(model='BPR', dataset='ml-100k', config_file_list=['example.yaml']) + print('gpu_id: ', config['gpu_id']) + print('training_batch_size: ', config['training_batch_size']) + + +output: + +.. code:: bash + + gpu_id: 1 + training_batch_size: 1024 + +The parameter ``config_file_list`` supports multiple yaml files. + +For more details on yaml, please refer to YAML_. + +.. _YAML: https://yaml.org/ + +When using our toolkit, the parameters belonging to **Dataset parameters** and +Evaluation Settings of **Basic Parameters** are recommended to be written into +the config files, which may be convenient for reusing the configurations. + +Parameter Dicts +^^^^^^^^^^^^^^^^^^ +Parameter Dict is realized by the dict data structure in python, where the key +is the parameter name, and the value is the parameter value. The users can write their +parameters into a dict, and input it into the configuration module. + +An example is as follows: + +.. code:: python + + from recbole.config import Config + + parameter_dict = { + 'gpu_id': 2, + 'training_batch_size': 512 + } + config = Config(model='BPR', dataset='ml-100k', config_dict=parameter_dict) + print('gpu_id: ', config['gpu_id']) + print('training_batch_size: ', config['training_batch_size']) + +output: + +.. code:: bash + + gpu_id: 2 + training_batch_size: 512 + + +Command Line +^^^^^^^^^^^^^^^^^^^^^^^^ +We can also assign parameters based on the command line. +The parameters in the command line can be read from the configuration module. +The format is: `-–parameter_name=[parameter_value]`. + +Write the following code to the python file (e.g. `run.py`): + +.. code:: python + + from recbole.config import Config + + config = Config(model='BPR', dataset='ml-100k') + print('gpu_id: ', config['gpu_id']) + print('training_batch_size: ', config['training_batch_size']) + +Running: + +.. code:: bash + + python run.py --gpu_id=3 --training_batch_size=256 + +output: + +.. code:: bash + + gpu_id: 3 + training_batch_size: 256 + + +Priority +^^^^^^^^^^^^^^^^^ +RecBole supports the combination of three types of parameter configurations. + +The priority of the configuration methods is: Command Line > Parameter Dicts +> Config Files > Default Settings + +A example is as follows: + +`example.yaml`: + +.. code:: yaml + + gpu_id: 1 + training_batch_size: 1024 + +`run.py`: + +.. code:: python + + from recbole.config import Config + + parameter_dict = { + 'gpu_id': 2, + 'training_batch_size': 512 + } + config = Config(model='BPR', dataset='ml-100k', config_file_list=['example.yaml'], config_dict=parameter_dict) + print('gpu_id: ', config['gpu_id']) + print('training_batch_size: ', config['training_batch_size']) + +Running: + +.. code:: bash + + python run.py --gpu_id=3 --training_batch_size=256 + +output: + +.. code:: bash + + gpu_id: 3 + training_batch_size: 256 diff --git a/docs/source/user_guide/data/atomic_files.rst b/docs/source/user_guide/data/atomic_files.rst new file mode 100644 index 000000000..9e74dd4d7 --- /dev/null +++ b/docs/source/user_guide/data/atomic_files.rst @@ -0,0 +1,144 @@ +Atomic Files +=================== + +Atomic files are introduced to format the input of mainstream recommendation tasks in a flexible way. 
+ +So far, our library introduces six atomic file types, and we identify different files by their suffixes.
+ +========= ============================== ======================================================== +Suffix Content Example Format +========= ============================== ======================================================== +`.inter` User-item interaction `user_id`, `item_id`, `rating`, `timestamp`, `review` +`.user` User feature `user_id`, `age`, `gender` +`.item` Item feature `item_id`, `category` +`.kg` Triplets in a knowledge graph `head_entity`, `tail_entity`, `relation` +`.link` Item-entity linkage data `entity`, `item_id` +`.net` Social graph data `source`, `target` +========= ============================== ========================================================
+ +Atomic files are combined to support the input of different recommendation tasks.
+ +One can write the suffixes into the config arg ``load_col`` to load the corresponding atomic files.
+ +For each recommendation task, we have to provide several mandatory files:
+ +================ ================================ +Tasks Mandatory atomic files +================ ================================ +General `.inter` +Context-aware `.inter`, `.user`, `.item` +Knowledge-aware `.inter`, `.kg`, `.link` +Sequential `.inter` +Social `.inter`, `.net` +================ ================================
+ +Format +--------
+ +Each atomic file can be viewed as an m x n table, where n is the number of features and m-1 is the number of data records (one line is reserved for the header).
+ +The first row corresponds to feature names, in which each entry has the form of ``feat_name:feat_type``, indicating the feature name and feature type.
+ +We support four feature types, which can be processed into tensors in batch.
+ +============ =========================== ===================== +feat_type Explanations Examples +============ =========================== ===================== +`token` single discrete feature `user_id`, `age` +`token_seq` discrete features sequence `review` +`float` single continuous feature `rating`, `timestamp` +`float_seq` continuous feature sequence `vector` +============ =========================== =====================
+ +Examples +----------
+ +We present three example data rows in the formatted ML-1M dataset. 
+ +**ml-1m.inter** + +============= ============= ============ =============== +user_id:token item_id:token rating:float timestamp:float +============= ============= ============ =============== +1 1193 5 978300760 +1 661 3 978302109 +============= ============= ============ =============== + +**ml-1m.user** + +============= ========= ============ ================ ============== +user_id:token age:token gender:token occupation:token zip_code:token +============= ========= ============ ================ ============== +1 1 F 10 48067 +2 56 M 16 70072 +============= ========= ============ ================ ============== + +**ml-1m.item** + +============= ===================== ================== ============================ +item_id:token movie_title:token_seq release_year:token genre:token_seq +============= ===================== ================== ============================ +1 Toy Story 1995 Animation Children's Comedy +2 Jumanji 1995 Adventure Children's Fantasy +============= ===================== ================== ============================ + +**ml-1m.kg** + +============= =================================== ============= +head_id:token relation_id:token tail_id:token +============= =================================== ============= +m.0gs6m film.film_genre.films_in_this_genre m.01b195 +m.052_dz film.film.actor m.02nrdp +============= =================================== ============= + +**ml-1m.link** + +============= =============== +item_id:token entity_id:token +============= =============== +2694 m.02hxhz +2079 m.0kvcr9 +============= =============== + +Additional Atomic Files +---------------------------- + +For users who want to load features from additional atomic files (e.g. pretrained entity embeddings), we provide a simple way as following. + +Firstly, prepare your additional atomic file (e.g. ``ml-1m.ent``). + +============= =============================== +ent_id:token ent_emb:float_seq +============= =============================== +m.0gs6m -115.08 13.60 113.69 +m.01b195 -130.97 263.05 -129.88 +============= =============================== + +Secondly, update the args as: + +.. code:: yaml + + additional_feat_suffix: [ent] + load_col: + # inter/user/item/...: As usual + ent: [ent_id, ent_emb] + +Then, this additional atomic file will be loaded into the :class:`Dataset` object. These new features can be used as following. + +.. code:: python + + dataset = create_dataset(config) + print(dataset.ent_feat) + +Note that these features can be preprocessed by the same way as the other features. + +For example, if you want to map the tokens of ``ent_id`` into the same space of ``entity_id``, then update the args as: + +.. code:: yaml + + additional_feat_suffix: [ent] + load_col: + # inter/user/item/...: As usual + ent: [ent_id, ent_emb] + + fields_in_same_space: [[ent_id, entity_id]] diff --git a/docs/source/user_guide/data/data_args.rst b/docs/source/user_guide/data/data_args.rst new file mode 100644 index 000000000..497eb65e6 --- /dev/null +++ b/docs/source/user_guide/data/data_args.rst @@ -0,0 +1,104 @@ +Args for Data +========================= + +RecBole provides several arguments for describing: + +- Basic information of the dataset +- Operations of dataset preprocessing + +See below for the details: + +Atomic File Format +---------------------- + +- ``field_separator (str)`` : Separator of different columns in atomic files. Defaults to ``"\t"``. +- ``seq_separator (str)`` : Separator inside the sequence features. Defaults to ``" "``. 
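+
+For illustration, the sketch below passes these two separators through a parameter dict, just like any other config argument; the values shown are simply the documented defaults, and ``BPR`` / ``ml-100k`` are only placeholders.
+
+.. code:: python
+
+    from recbole.config import Config
+
+    # A minimal sketch: setting the separator arguments explicitly.
+    parameter_dict = {
+        'field_separator': '\t',  # columns in atomic files are tab-separated
+        'seq_separator': ' ',     # tokens inside *_seq features are space-separated
+    }
+    config = Config(model='BPR', dataset='ml-100k', config_dict=parameter_dict)
+    print(config['field_separator'], config['seq_separator'])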
+ +Basic Information +---------------------- + +Common Features +'''''''''''''''''' + +- ``USER_ID_FIELD (str)`` : Field name of user ID feature. Defaults to ``user_id``. +- ``ITEM_ID_FIELD (str)`` : Field name of item ID feature. Defaults to ``item_id``. +- ``RATING_FIELD (str)`` : Field name of rating feature. Defaults to ``rating``. +- ``TIME_FIELD (str)`` : Field name of timestamp feature. Defaults to ``timestamp``. +- ``seq_len (dict)`` : Keys are field names of sequence features, values are maximum length of each sequence (which means sequences too long will be cut off). If not set, the sequences will not be cut off. Defaults to ``None``. + +Label for Point-wise DataLoader +''''''''''''''''''''''''''''''''''' + +- ``LABEL_FIELD (str)`` : Expected field name of the generated labels. Defaults to ``label``. +- ``threshold (dict)`` : The format is ``{k (str): v (float)}``. 0/1 labels will be generated according to the value of ``inter_feat[k]`` and ``v``. The rows with ``inter_feat[k] >= v`` will be labeled as positive, otherwise the label is negative. Note that at most one pair of ``k`` and ``v`` can exist in ``threshold``. Defaults to ``None``. + +NegSample Prefix for Pair-wise DataLoader +'''''''''''''''''''''''''''''''''''''''''''''''''' + +- ``NEG_PREFIX (str)`` : Prefix of field names which are generated as negative cases. E.g. if we have positive item ID named ``item_id``, then those item ID in negative samples will be called ``NEG_PREFIX + item_id``. Defaults to ``neg_``. + +Sequential Model Needed +''''''''''''''''''''''''''''''''''' + +- ``ITEM_LIST_LENGTH_FIELD (str)`` : Field name of the feature representing item sequences' length. Defaults to ``item_length``. +- ``LIST_SUFFIX (str)`` : Suffix of field names which are generated as sequences. E.g. if we have item ID named ``item_id``, then those item ID sequences will be called ``item_id + LIST_SUFFIX``. Defaults to ``_list``. +- ``MAX_ITEM_LIST_LENGTH (int)``: Maximum length of each generated sequence. Defaults to ``50``. +- ``POSITION_FIELD (str)`` : Field name of the generated position sequence. For sequence of length ``k``, its position sequence is ``range(k)``. Note that this field will only be generated if this arg is not ``None``. Defaults to ``position_id``. + +Knowledge-based Model Needed +''''''''''''''''''''''''''''''''''' + +- ``HEAD_ENTITY_ID_FIELD (str)`` : Field name of the head entity ID feature. Defaults to ``head_id``. +- ``TAIL_ENTITY_ID_FIELD (str)`` : Field name of the tail entity ID feature. Defaults to ``tail_id``. +- ``RELATION_ID_FIELD (str)`` : Field name of the relation ID feature. Defaults to ``relation_id``. +- ``ENTITY_ID_FIELD (str)`` : Field name of the entity ID. Note that it's only a symbol of entities, not real feature of one of the ``xxx_feat``. Defaults to ``entity_id``. + +Selectively Loading +------------------------------ + +- ``load_col (dict)`` : Keys are the suffix of loaded atomic files, values are the list of field names to be loaded. If a suffix doesn't exist in ``load_col``, the corresponding atomic file will not be loaded. Note that if ``load_col`` is ``None``, then all the existed atomic files will be loaded. Defaults to ``{inter: [user_id, item_id]}``. +- ``unload_col (dict)`` : Keys are suffix of loaded atomic files, values are list of field names NOT to be loaded. Note that ``load_col`` and ``unload_col`` can not be set at the same time. Defaults to ``None``. 
+- ``unused_col (dict)`` : Keys are the suffixes of loaded atomic files, values are lists of field names which are loaded for data processing but will not be used in the model. E.g. ``time_field`` may be used for time ordering, but the model does not use this field. Defaults to ``None``.
+- ``additional_feat_suffix (list)``: Controls the loading of additional atomic files. E.g. if you want to load features from ``ml-100k.hello``, just set this arg as ``additional_feat_suffix: [hello]``. Features from additional atomic files will be stored in ``Dataset.feat_list``. Defaults to ``None``.
+ +Filtering +-----------
+ +Remove duplicated user-item interactions +''''''''''''''''''''''''''''''''''''''''
+ +- ``rm_dup_inter (str)`` : Whether to remove duplicated user-item interactions. If ``time_field`` exists, ``inter_feat`` will be sorted by ``time_field`` in ascending order. Otherwise it will remain unchanged. After that, if ``rm_dup_inter == first``, we will keep the first user-item interaction in duplicates; if ``rm_dup_inter == last``, we will keep the last user-item interaction in duplicates. Defaults to ``None``.
+ +Filter by value +''''''''''''''''''
+ +- ``lowest_val (dict)`` : Has the format ``{k (str): v (float)}, ...``. The rows whose ``feat[k] < v`` will be filtered. Defaults to ``None``.
+- ``highest_val (dict)`` : Has the format ``{k (str): v (float)}, ...``. The rows whose ``feat[k] > v`` will be filtered. Defaults to ``None``.
+- ``equal_val (dict)`` : Has the format ``{k (str): v (float)}, ...``. The rows whose ``feat[k] != v`` will be filtered. Defaults to ``None``.
+- ``not_equal_val (dict)`` : Has the format ``{k (str): v (float)}, ...``. The rows whose ``feat[k] == v`` will be filtered. Defaults to ``None``.
+ +Remove interaction by user or item +'''''''''''''''''''''''''''''''''''
+ +- ``filter_inter_by_user_or_item (bool)`` : If ``True``, we will remove the interactions in ``inter_feat`` whose user or item is not in ``user_feat`` or ``item_feat``. Defaults to ``True``.
+ +Filter by number of interactions +''''''''''''''''''''''''''''''''''''
+ +- ``max_user_inter_num (int)`` : Users whose number of interactions is more than ``max_user_inter_num`` will be filtered. Defaults to ``None``.
+- ``min_user_inter_num (int)`` : Users whose number of interactions is less than ``min_user_inter_num`` will be filtered. Defaults to ``0``.
+- ``max_item_inter_num (int)`` : Items whose number of interactions is more than ``max_item_inter_num`` will be filtered. Defaults to ``None``.
+- ``min_item_inter_num (int)`` : Items whose number of interactions is less than ``min_item_inter_num`` will be filtered. Defaults to ``0``.
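+
+These filtering arguments can be freely combined. The following is a minimal sketch of a parameter dict mixing value-based and count-based filtering; the thresholds, model and dataset are placeholders rather than recommended values.
+
+.. code:: python
+
+    from recbole.config import Config
+
+    # A minimal sketch: combine value-based and count-based filtering.
+    parameter_dict = {
+        'lowest_val': {'rating': 3},  # drop interactions whose rating < 3
+        'min_user_inter_num': 5,      # drop users with fewer than 5 interactions
+        'min_item_inter_num': 5,      # drop items with fewer than 5 interactions
+    }
+    config = Config(model='BPR', dataset='ml-100k', config_dict=parameter_dict)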
+ +Preprocessing +-----------------
+ +- ``fields_in_same_space (list)`` : List of spaces. A space is a list of field names; the fields in the same space will be remapped into the same index system. Note that if you want to make some fields remapped in the same space with entities, then just set ``fields_in_same_space = [entity_id, xxx, ...]``. (if ``ENTITY_ID_FIELD != 'entity_id'``, then change the ``'entity_id'`` in the above example.) Defaults to ``None``.
+- ``preload_weight (dict)`` : Has the format ``{k (str): v (float)}, ...``. ``k`` is a token field, representing the IDs of each row of the preloaded weight matrix. ``v`` is a float-like field. Each pair of ``k`` and ``v`` should be from the same atomic file. This arg can be used to load pretrained vectors. Defaults to ``None``.
+- ``normalize_field (list)`` : List of field names to be normalized. Note that only float-like fields can be normalized. Defaults to ``None``.
+- ``normalize_all (bool)`` : Normalize all the float-like fields if ``True``. Defaults to ``True``.
+ +Benchmark file +-------------------
+ +- ``benchmark_filename (list)`` : List of pre-split user-item interaction suffixes. We will only apply normalization and ID remapping, which will not delete any interaction in ``inter_feat``, and then split ``inter_feat`` by ``benchmark_filename``. E.g. let's assume that the dataset is called ``click``, and ``benchmark_filename`` equals ``['part1', 'part2', 'part3']``. Then we will load ``click.part1.inter``, ``click.part2.inter``, ``click.part3.inter``, and treat them as the train, valid and test datasets. Defaults to ``None``. diff --git a/docs/source/user_guide/data/data_flow.rst b/docs/source/user_guide/data/data_flow.rst new file mode 100644 index 000000000..75022e575 --- /dev/null +++ b/docs/source/user_guide/data/data_flow.rst @@ -0,0 +1,28 @@ +Data Flow +===========
+ +For extensibility and reusability, our data module designs an elegant data flow that transforms raw data into the model input.
+ +The overall data flow can be described as follows:
+ +.. image:: ../../asset/data_flow_en.png + :align: center
+ +The details are as follows:
+ +- Raw Input + Unprocessed raw input dataset. Detailed as `Dataset List </dataset_list.html>`_.
+- Atomic Files + Basic components for characterizing the input of various recommendation tasks, proposed by RecBole. Detailed as :doc:`atomic_files`.
+- Dataset: + Mainly based on the primary data structure of :class:`pandas.DataFrame` in the library of `pandas <https://pandas.pydata.org/>`_. + During the transformation step from atomic files to class :class:`Dataset`, + we provide many useful functions that support a series of preprocessing operations in recommender systems, + such as k-core data filtering and missing value imputation.
+- DataLoader: + Mainly based on a general internal data structure implemented by our library, called :class:`~recbole.data.interaction.Interaction`. + :class:`~recbole.data.interaction.Interaction` is the internal data structure that is fed into the recommendation algorithms. + It is implemented as a new abstract data type based on :class:`python.Dict`, which is a key-value indexed data structure. + The keys correspond to features from the input, which can be conveniently referenced with feature names when writing the recommendation algorithms; + and the values correspond to tensors (implemented by :class:`torch.Tensor`), which will be used for the update and computation in learning algorithms. + Specifically, the value entry for a specific key stores all the corresponding tensor data in a batch or mini-batch.
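+
+As a brief sketch of this flow in code (``BPR`` and ``ml-100k`` are only placeholders):
+
+.. code:: python
+
+    from recbole.config import Config
+    from recbole.data import create_dataset, data_preparation
+
+    config = Config(model='BPR', dataset='ml-100k')
+    # atomic files -> Dataset
+    dataset = create_dataset(config)
+    # Dataset -> DataLoaders that yield Interaction objects
+    train_data, valid_data, test_data = data_preparation(config, dataset)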
diff --git a/docs/source/user_guide/data/interaction.rst b/docs/source/user_guide/data/interaction.rst new file mode 100644 index 000000000..43e5c0356 --- /dev/null +++ b/docs/source/user_guide/data/interaction.rst @@ -0,0 +1,29 @@ +Interaction +================
+ +:class:`~recbole.data.interaction.Interaction` is the internal data structure that is loaded from :class:`DataLoader` and fed into the recommendation algorithms.
+ +It is implemented as a new abstract data type based on :class:`python.dict`. The keys correspond to features from the input, which can be conveniently referenced with feature names when writing the recommendation algorithms; and the values correspond to tensors (implemented by :class:`torch.Tensor`), which will be used for the update and computation in learning algorithms. Specifically, the value entry for a specific key stores all the corresponding tensor data in a batch or mini-batch.
+ +With such a data structure, our library provides a friendly interface to write the recommendation algorithms in a batch-based mode. For example, we can read all the user IDs and item IDs from an instantiated :class:`~recbole.data.interaction.Interaction` object ``inter`` simply based on the feature names:
+ +.. code:: python + + user_vec = inter['UserID'] + item_vec = inter['ItemID']
+ +The contents of an :class:`~recbole.data.interaction.Interaction` are decided by the loaded fields. +However, it should be noted that there can be some features generated by :class:`DataLoader`, e.g. if one model has ``input_type = InputType.PAIRWISE``, then each item feature has a corresponding negative item feature, whose keys begin with the arg ``NEG_PREFIX``.
+ +Besides, the value components are implemented based on :class:`torch.Tensor`. We wrap many functions of PyTorch to develop a GPU-oriented data structure, which can support batch-based mechanisms (e.g., copying a batch of data to GPU). In particular, we summarize the important functions as follows:
+ +============================ ================================================================== +Function Description +============================ ================================================================== +to(device) transfer all tensors to :class:`torch.device` +cpu transfer all tensors to CPU +numpy transfer all tensors to :class:`numpy.ndarray` +repeat repeats each tensor along the batch size dimension +repeat interleave repeats elements of a tensor, similar to repeat interleave +update update this object with another Interaction, similar to update +============================ ================================================================== diff --git a/docs/source/user_guide/data_intro.rst b/docs/source/user_guide/data_intro.rst new file mode 100644 index 000000000..f84d2c04e --- /dev/null +++ b/docs/source/user_guide/data_intro.rst @@ -0,0 +1,12 @@ +Data Introduction +===================
+ +Here we introduce the whole data flow and highlight its key features.
+ +.. toctree:: + :maxdepth: 1 + + data/data_flow + data/atomic_files + data/interaction + data/data_args diff --git a/docs/source/user_guide/evaluation_support.rst b/docs/source/user_guide/evaluation_support.rst new file mode 100644 index 000000000..39cc2167c --- /dev/null +++ b/docs/source/user_guide/evaluation_support.rst @@ -0,0 +1,65 @@ +Evaluation Support +===========================
+ +The function of the evaluation module is to implement commonly used evaluation +protocols for recommender systems. Since different models can be compared under +the same evaluation modules, RecBole standardizes the evaluation of recommender +systems.
+ + +Evaluation Settings +----------------------- +The evaluation settings supported by RecBole are as follows. Among them, the +first four rows correspond to the dataset splitting methods, while the last three +rows correspond to the ranking mechanism, namely a full ranking over all the +items or a sample-based ranking. 
+ +================== ======================================================== + Notation Explanation +================== ======================================================== + RO_RS Random Ordering + Ratio-based Splitting + TO_LS Temporal Ordering + Leave-one-out Splitting + RO_LS Random Ordering + Leave-one-out Splitting + TO_RS Temporal Ordering + Ratio-based Splitting + full full ranking with all item candidates + uniN sample-based ranking: each positive item is paired with N negative items sampled from a uniform distribution + popN sample-based ranking: each positive item is paired with N negative items sampled from a popularity distribution +================== ========================================================
+ +The parameters used to control the evaluation settings are as follows:
+ +- ``eval_setting (str)``: The evaluation settings. Defaults to ``'RO_RS,full'``. + The parameter has two parts. The first part controls the splitting method; + its range is ``['RO_RS','TO_LS','RO_LS','TO_RS']``. The second part (optional) + controls the ranking mechanism; its range is ``['full','uni100','uni1000','pop100','pop1000']``.
+- ``group_by_user (bool)``: Whether the users are grouped. + It must be ``True`` when ``eval_setting`` is in ``['RO_LS', 'TO_LS']``. + Defaults to ``True``.
+- ``split_ratio (list)``: The split ratio between train data, valid data and + test data. It only takes effect when the first part of ``eval_setting`` + is in ``['RO_RS', 'TO_RS']``. Defaults to ``[0.8, 0.1, 0.1]``.
+- ``leave_one_num (int)``: It only takes effect when the first part of + ``eval_setting`` is in ``['RO_LS', 'TO_LS']``. Defaults to ``2``.
+ +Evaluation Metrics +-----------------------
+ +RecBole supports both value-based and ranking-based evaluation metrics.
+ +The value-based metrics (i.e., for rating prediction) include ``RMSE``, ``MAE``, +``AUC`` and ``LogLoss``, measuring the prediction difference between the true +and predicted values.
+ +The ranking-based metrics (i.e., for top-k item recommendation) include the most +common ranking-aware metrics, such as ``Recall``, ``Precision``, ``Hit``, +``NDCG``, ``MAP`` and ``MRR``, measuring the ranking performance of the +recommendation lists generated by an algorithm.
+ +The parameters used to control the evaluation metrics are as follows:
+ +- ``metrics (list or str)``: Evaluation metrics. Defaults to + ``['Recall', 'MRR', 'NDCG', 'Hit', 'Precision']``. Range in + ``['Recall', 'MRR', 'NDCG', 'Hit', 'MAP', 'Precision', 'AUC', + 'MAE', 'RMSE', 'LogLoss']``.
+- ``topk (list or int or None)``: The value of k for topk evaluation metrics. + Defaults to ``10``. diff --git a/docs/source/user_guide/model/context/afm.rst b/docs/source/user_guide/model/context/afm.rst new file mode 100644 index 000000000..b67e6fa43 --- /dev/null +++ b/docs/source/user_guide/model/context/afm.rst @@ -0,0 +1,73 @@ +AFM +===========
+ +Introduction +---------------------
+ +`[paper] <https://dl.acm.org/doi/abs/10.5555/3172077.3172324>`_
+ +**Title:** Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks
+ +**Authors:** Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, Tat-Seng Chua
+ +**Abstract:** *Factorization Machines* (FMs) are a supervised learning approach that enhances the linear regression model by incorporating the second-order feature interactions. Despite effectiveness, FM can be hindered by its modelling of all feature interactions with the same weight, as not all feature interactions are equally useful and predictive. 
For example, the interactions with useless features may even introduce noises and adversely degrade the performance. In this work, we improve FM by discriminating the importance of different feature interactions. We propose a novel model named *Attentional Factorization Machine* (AFM), which learns the importance of each feature interaction from data via a neural attention network. Extensive experiments on two real-world datasets demonstrate the effectiveness of AFM. Empirically, it is shown on regression task AFM betters FM with a 8.6% relative improvement, and consistently outperforms the state-of-the-art deep learning methods Wide&Deep [Cheng *et al.* , 2016] and Deep-Cross [Shan *et al.* , 2016] with a much simpler structure and fewer model parameters. + +.. image:: ../../../asset/afm.jpg + :width: 700 + :align: center + +Quick Start with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of features. Defaults to ``10``. +- ``attention_size (int)`` : The vector size in attention mechanism. Defaults to ``25``. +- ``dropout_prob (float)`` : The dropout rate. Defaults to ``0.3``. +- ``weight_decay (float)`` : The L2 regularization weight. Defaults to ``2``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='AFM', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + dropout_prob choice [0.0,0.1,0.2,0.3,0.4,0.5] + attention_size choice [10,15,20,25,30,40] + reg_weight choice [0,0.1,0.2,1,2,5,10] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` diff --git a/docs/source/user_guide/model/context/autoint.rst b/docs/source/user_guide/model/context/autoint.rst new file mode 100644 index 000000000..664bdb281 --- /dev/null +++ b/docs/source/user_guide/model/context/autoint.rst @@ -0,0 +1,75 @@ +AutoInt +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/3357384.3357925>`_ + +**Title:** AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks + +**Authors:** Weiping Song, Chence Shi, Zhiping Xiao, Zhijian Duan, Yewen Xu, Ming Zhang, Jian Tang + +**Abstract:** Click-through rate (CTR) prediction, which aims to predict the probability of a user clicking on an ad or an item, is critical to many online applications such as online advertising and recommender systems. 
The problem is very challenging since (1) the input features (e.g., the user id, user age, item id, item category) are usually sparse and high-dimensional, and (2) an effective prediction relies on high-order combinatorial features (a.k.a. cross features), which are very time-consuming to hand-craft by domain experts and are impossible to be enumerated. Therefore, there have been efforts in finding low-dimensional representations of the sparse and high-dimensional raw features and their meaningful combinations. In this paper, we propose an effective and efficient method called the AutoInt to automatically learn the high-order feature interactions of input features. Our proposed algorithm is very general, which can be applied to both numerical and categorical input features. Specifically, we map both the numerical and categorical features into the same low-dimensional space. Afterwards, a multi-head self-attentive neural network with residual connections is proposed to explicitly model the feature interactions in the low-dimensional space. With different layers of the multi-head self-attentive neural networks, different orders of feature combinations of input features can be modeled. The whole model can be efficiently fit on large-scale raw data in an end-to-end fashion. Experimental results on four real-world datasets show that our proposed approach not only outperforms existing state-of-the-art approaches for prediction but also offers good explainability. + +.. image:: ../../../asset/autoint.png + :width: 500 + :align: center + +Quick Start with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of features. Defaults to ``10``. +- ``attention_size (int)`` : The vector size in attention mechanism. Defaults to ``16``. +- ``n_layers (int)`` : The number of attention layers. Defaults to ``3``. +- ``num_heads (int)`` : The number of attention heads. Defaults to ``2``. +- ``dropout_probs (list of float)`` : The dropout rate of dropout layer. Defaults to ``[0.2,0.2,0.2]``. +- ``mlp_hidden_size (list of int)`` : The hidden size of MLP layers. Defaults to ``[128,128]``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='AutoInt', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + dropout_prob choice [0.0,0.1,0.2,0.3,0.4,0.5] + attention_size choice [8,16,32] + mlp_hidden_size choice ['[64,64,64]','[128,128,128]','[256,256,256]','[64,64]','[128,128]','[256,256]','[512,512]'] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. 
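+
+These hyper parameters can also be set programmatically instead of through a
+config file or the command line. The snippet below is only a sketch: the keys
+simply mirror the hyper parameter list above (with their documented defaults),
+and ``config_dict`` is the generic way of handing parameters to ``run_recbole``
+(see the config settings guide for details).
+
+.. code:: python
+
+   from recbole.quick_start import run_recbole
+
+   # Sketch: override a few AutoInt hyper parameters programmatically.
+   # The keys follow the "Model Hyper-Parameters" list above; the values
+   # shown here are just the documented defaults.
+   config_dict = {
+       'embedding_size': 10,
+       'attention_size': 16,
+       'n_layers': 3,
+       'num_heads': 2,
+       'dropout_probs': [0.2, 0.2, 0.2],
+       'mlp_hidden_size': [128, 128],
+   }
+
+   run_recbole(model='AutoInt', dataset='ml-100k', config_dict=config_dict)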
+ + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` diff --git a/docs/source/user_guide/model/context/dcn.rst b/docs/source/user_guide/model/context/dcn.rst new file mode 100644 index 000000000..49206b533 --- /dev/null +++ b/docs/source/user_guide/model/context/dcn.rst @@ -0,0 +1,91 @@ +DCN +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/3124749.3124754>`_ + +**Title:** Deep & Cross Network for Ad Click Predictions + +**Authors:** Ruoxi Wang, Bin Fu, Gang Fu, Mingliang Wang + +**Abstract:** Feature engineering has been the key to the success of many prediction +models. However, the process is nontrivial and oen requires +manual feature engineering or exhaustive searching. DNNs +are able to automatically learn feature interactions; however, they +generate all the interactions implicitly, and are not necessarily efficient +in learning all types of cross features. In this paper, we propose +the Deep & Cross Network (DCN) which keeps the benefits of +a DNN model, and beyond that, it introduces a novel cross network +that is more efficient in learning certain bounded-degree feature +interactions. In particular, DCN explicitly applies feature crossing +at each layer, requires no manual feature engineering, and adds +negligible extra complexity to the DNN model. Our experimental +results have demonstrated its superiority over the state-of-art algorithms +on the CTR prediction dataset and dense classification +dataset, in terms of both model accuracy and memory usage. + +.. image:: ../../../asset/dcn.png + :width: 500 + :align: center + +Quick Start with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of features. Defaults to ``10``. +- ``mlp_hidden_size (list of int)`` : The hidden size of MLP layers. Defaults to ``[256,256,256]``. +- ``cross_layer_num (int)`` : The number of cross layers. Defaults to ``6``. +- ``reg_weight (float)`` : The L2 regularization weight. Defaults to ``2``. +- ``dropout_prob (float)`` : The dropout rate. Defaults to ``0.2``. + + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='DCN', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + dropout_prob choice [0.0,0.1,0.2,0.3,0.4,0.5] + mlp_hidden_size choice ['[64,64,64]','[128,128,128]','[256,256,256]','[512,512,512]','[1024, 1024]'] + reg_weight choice [0.1,1,2,5,10] + cross_layer_num choice [3,4,5,6] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. 
code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/context/deepfm.rst b/docs/source/user_guide/model/context/deepfm.rst new file mode 100644 index 000000000..bf6817b24 --- /dev/null +++ b/docs/source/user_guide/model/context/deepfm.rst @@ -0,0 +1,72 @@ +DeepFM +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/abs/10.5555/3172077.3172127>`_ + +**Title:** DeepFM: A Factorization-Machine based Neural Network for CTR Prediction + +**Authors:** Huifeng Guo , Ruiming Tang, Yunming Yey, Zhenguo Li, Xiuqiang He + +**Abstract:** Learning sophisticated feature interactions behind user behaviors is critical in maximizing CTR for recommender systems. Despite great progress, existing methods seem to have a strong bias towards low- or high-order interactions, or require expertise feature engineering. In this paper, we show that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions. The proposed model, DeepFM, combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture. Compared to the latest Wide \& Deep model from Google, DeepFM has a shared input to its "wide" and "deep" parts, with no need of feature engineering besides raw features. Comprehensive experiments are conducted to demonstrate the effectiveness and efficiency of DeepFM over the existing models for CTR prediction, on both benchmark data and commercial data. + +.. image:: ../../../asset/deepfm.png + :width: 700 + :align: center + +Quick Start with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of features. Defaults to ``10``. +- ``mlp_hidden_size (list of int)`` : The hidden size of MLP layers. Defaults to ``[128,128,128]``. +- ``dropout_prob (float)`` : The dropout rate. Defaults to ``0.2``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='DeepFM', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + dropout_prob choice [0.0,0.1,0.2,0.3,0.4,0.5] + mlp_hidden_size choice ['[64,64,64]','[128,128,128]','[256,256,256]','[512,512,512]'] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. 
code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/context/din.rst b/docs/source/user_guide/model/context/din.rst new file mode 100644 index 000000000..90018241d --- /dev/null +++ b/docs/source/user_guide/model/context/din.rst @@ -0,0 +1,100 @@ +DIN +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/3219819.3219823>`_ + +**Title:** Deep Interest Network for Click-Through Rate Prediction + +**Authors:** Guorui Zhou, Chengru Song, Xiaoqiang Zhu, Ying Fan, Han Zhu, Xiao Ma, +Yanghui Yan, Junqi Jin, Han Li, Kun Gai + +**Abstract:** Click-through rate prediction is an essential task in industrial +applications, such as online advertising. Recently deep learning +based models have been proposed, which follow a similar Embedding& +MLP paradigm. In these methods large scale sparse input +features are first mapped into low dimensional embedding vectors, +and then transformed into fixed-length vectors in a group-wise +manner, finally concatenated together to fed into a multilayer perceptron +(MLP) to learn the nonlinear relations among features. In +this way, user features are compressed into a fixed-length representation +vector, in regardless of what candidate ads are. The use +of fixed-length vector will be a bottleneck, which brings difficulty +for Embedding&MLP methods to capture user’s diverse interests +effectively from rich historical behaviors. In this paper, we propose +a novel model: Deep Interest Network (DIN) which tackles this challenge +by designing a local activation unit to adaptively learn the +representation of user interests from historical behaviors with respect +to a certain ad. This representation vector varies over different +ads, improving the expressive ability of model greatly. Besides, we +develop two techniques: mini-batch aware regularization and data +adaptive activation function which can help training industrial deep +networks with hundreds of millions of parameters. Experiments on +two public datasets as well as an Alibaba real production dataset +with over 2 billion samples demonstrate the effectiveness of proposed +approaches, which achieve superior performance compared +with state-of-the-art methods. DIN now has been successfully deployed +in the online display advertising system in Alibaba, serving +the main traffic. + +.. image:: ../../../asset/din.png + :width: 1000 + :align: center + +Quick Start with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of features. Defaults to ``10``. +- ``mlp_hidden_size (list of int)`` : The hidden size of MLP layers. Defaults to ``[256,256,256]``. +- ``dropout_prob (float)`` : The dropout rate. Defaults to ``0.0``. +- ``pooling_mode (str)`` : Pooling mode of sequence data. Defaults to ``'mean'``. Range in ``['max', 'mean', 'sum']``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. 
code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='DIN', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + dropout_prob choice [0.0,0.1,0.2,0.3,0.4,0.5] + mlp_hidden_size choice ['[64,64,64]','[128,128,128]','[256,256,256]','[512,512,512]'] + pooling_mode choice ['mean','max','sum'] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/context/dssm.rst b/docs/source/user_guide/model/context/dssm.rst new file mode 100644 index 000000000..55da708e4 --- /dev/null +++ b/docs/source/user_guide/model/context/dssm.rst @@ -0,0 +1,78 @@ +DSSM +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/2505515.2505665>`_ + +**Title:** Learning deep structured semantic models for web search using clickthrough data + +**Authors:** Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, Larry Heck + +**Abstract:** Latent semantic models, such as LSA, intend to map a query to its relevant documents at the semantic level where keyword-based matching often fails. In this study we strive to develop a series of new latent semantic models with a deep structure that project queries and documents into a common low-dimensional space where the relevance of a document given a query is readily computed as the distance between them. The proposed deep structured semantic models are discriminatively trained by maximizing the conditional likelihood of the clicked documents given a query using the clickthrough data. + +To make our models applicable to large-scale Web search applications, we also use a technique called word hashing, which is shown to effectively scale up our semantic models to handle large vocabularies which are common in such tasks. The new models are evaluated on a Web document ranking task using a real-world data set. Results show that our best model significantly outperforms other latent semantic models, which were considered state-of-the-art in the performance prior to the work presented in this paper. + +.. image:: ../../../asset/dssm.png + :width: 600 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of features. Defaults to ``10``. +- ``mlp_hidden_size (list of int)`` : The hidden size of MLP layers. Defaults to ``[256, 256, 256]``. +- ``dropout_prob (float)`` : The dropout rate of edge in the linear predict layer. Defaults to ``0.3``. 
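+
+Besides these hyper parameters, DSSM relies on user-side and item-side features
+(see also the note after the running example below), so the corresponding
+columns of the ``.user`` and ``.item`` atomic files have to be loaded. The
+snippet below is only a sketch: the exact field names depend on your dataset's
+atomic files, and the ones shown for ml-100k are illustrative.
+
+.. code:: python
+
+   from recbole.quick_start import run_recbole
+
+   # Sketch: make sure user-side and item-side feature columns are loaded,
+   # since DSSM consumes both. Field names are illustrative for ml-100k;
+   # adjust them to your own atomic files.
+   config_dict = {
+       'load_col': {
+           'inter': ['user_id', 'item_id', 'rating', 'timestamp'],
+           'user': ['user_id', 'age', 'gender', 'occupation'],
+           'item': ['item_id', 'release_year', 'class'],
+       },
+   }
+
+   run_recbole(model='DSSM', dataset='ml-100k', config_dict=config_dict)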
+ + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='DSSM', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Notes:** + + - DSSM requires user-side and item-side features. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + dropout_prob choice [0.0,0.1,0.2,0.3,0.4,0.5] + mlp_hidden_size choice ['[64,64,64]','[128,128,128]','[256,256,256]','[512,512,512]'] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/context/ffm.rst b/docs/source/user_guide/model/context/ffm.rst new file mode 100644 index 000000000..95119ae48 --- /dev/null +++ b/docs/source/user_guide/model/context/ffm.rst @@ -0,0 +1,73 @@ +FFM +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/2959100.2959134>`_ + +**Title:** Field-aware Factorization Machines for CTR Prediction + +**Authors:** Yuchin Juan, Yong Zhuang, Wei-Sheng Chin, Chih-Jen Lin + +**Abstract:** Click-through rate (CTR) prediction plays an important role in computational advertising. Models based on degree-2 polynomial mappings and factorization machines (FMs) are widely used for this task. Recently, a variant of FMs, field-aware factorization machines (FFMs), outperforms existing models in some world-wide CTR-prediction competitions. Based on our experiences in winning two of them, in this paper we establish FFMs as an effective method for classifying large sparse data including those from CTR prediction. First, we propose efficient implementations for training FFMs. Then we comprehensively analyze FFMs and compare this approach with competing models. Experiments show that FFMs are very useful for certain classification problems. Finally, we have released a package of FFMs for public use. + +.. image:: ../../../asset/ffm.png + :width: 500 + :align: center + +Quick Start with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of features. Defaults to ``10``. +- ``fields (dict)`` : This parameter defines the mapping from fields to features, key is field's id, value is a list of features in this field. For example, in ml-100k dataset, it can be set as ``{0: ['user_id','age'], 1: ['item_id', 'class']}``. If it is set to ``None``, the features and the fields are corresponding one-to-one. Defaults to ``None``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. 
code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='FFM', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Notes:** + +- The features defined in ``fields`` must be in the dataset and be loaded by data module in RecBole. It means the value in ``fields`` must appear in ``load_col``. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` + diff --git a/docs/source/user_guide/model/context/fm.rst b/docs/source/user_guide/model/context/fm.rst new file mode 100644 index 000000000..6565fac29 --- /dev/null +++ b/docs/source/user_guide/model/context/fm.rst @@ -0,0 +1,67 @@ +FM +=========== + +Introduction +--------------------- + +`[paper] <https://ieeexplore.ieee.org/abstract/document/5694074/>`_ + +**Title:** Factorization Machines + +**Authors:** Steffen Rendle + +**Abstract:** In this paper, we introduce Factorization Machines (FM) which are a new model class that combines the advantages of Support Vector Machines (SVM) with factorization models. Like SVMs, FMs are a general predictor working with any real valued feature vector. In contrast to SVMs, FMs model all interactions between variables using factorized parameters. Thus they are able to estimate interactions even in problems with huge sparsity (like recommender systems) where SVMs fail. We show that the model equation of FMs can be calculated in linear time and thus FMs can be optimized directly. So unlike nonlinear SVMs, a transformation in the dual form is not necessary and the model parameters can be estimated directly without the need of any support vector in the solution. We show the relationship to SVMs and the advantages of FMs for parameter estimation in sparse settings. On the other hand there are many different factorization models like matrix factorization, parallel factor analysis or specialized models like SVD++, PITF or FPMC. The drawback of these models is that they are not applicable for general prediction tasks but work only with special input data. Furthermore their model equations and optimization algorithms are derived individually for each task. We show that FMs can mimic these models just by specifying the input data (i.e. the feature vectors). This makes FMs easily applicable even for users without expert knowledge in factorization models. + +.. image:: ../../../asset/fm.png + :width: 700 + :align: center + +Quick Start with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of features. Defaults to ``10``. 
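+
+Here ``embedding_size`` corresponds to the dimensionality :math:`k` of the
+factor vectors :math:`\mathbf{v}_i` in the second-order FM model equation,
+restated below from the paper for reference:
+
+.. math::
+
+    \hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
+    + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
+
+The pairwise term can be rewritten as
+:math:`\tfrac{1}{2} \sum_{f=1}^{k} \big[ (\sum_{i=1}^{n} v_{i,f} x_i)^2 - \sum_{i=1}^{n} v_{i,f}^{2} x_i^{2} \big]`,
+which is what enables the linear-time computation mentioned in the abstract.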
+ +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='FM', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` diff --git a/docs/source/user_guide/model/context/fnn.rst b/docs/source/user_guide/model/context/fnn.rst new file mode 100644 index 000000000..358287123 --- /dev/null +++ b/docs/source/user_guide/model/context/fnn.rst @@ -0,0 +1,71 @@ +FNN +=========== + +Introduction +--------------------- + +`[paper] <https://link.springer.com/chapter/10.1007/978-3-319-30671-1_4>`_ + +**Title:** Deep Learning over Multi-field Categorical Data + +**Authors:** Weinan Zhang, Tianming Du, and Jun Wang + +**Abstract:** Predicting user responses, such as click-through rate and conversion rate, are critical in many web applications including web search, personalised recommendation, and online advertising. Different from continuous raw features that we usually found in the image and audio domains, the input features in web space are always of multi-field and are mostly discrete and categorical while their dependencies are little known. Major user response prediction models have to either limit themselves to linear models or require manually building up high-order combination features. The former loses the ability of exploring feature interactions, while the latter results in a heavy computation in the large feature space. To tackle the issue, we propose two novel models using deep neural networks (DNNs) to automatically learn effective patterns from categorical feature interactions and make predictions of users’ ad clicks. To get our DNNs efficiently work, we propose to leverage three feature transformation methods, i.e., factorisation machines (FMs), restricted Boltzmann machines (RBMs) and denoising auto-encoders (DAEs). This paper presents the structure of our models and their efficient training algorithms. The large-scale experiments with real-world data demonstrate that our methods work better than major state-of-the-art models. + +.. image:: ../../../asset/fnn.png + :width: 700 + :align: center + +Quick Start with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of features. Defaults to ``10``. +- ``mlp_hidden_size (list of int)`` : The hidden size of MLP layers. Defaults to ``[256,256,256]``. +- ``dropout_prob (float)`` : The dropout rate. 
Defaults to ``0.2``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='FNN', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + dropout_prob choice [0.0,0.1,0.2,0.3,0.4,0.5] + mlp_hidden_size in ['[128,256,128]','[128,128,128]','[64,128,64]','[256,256,256]'] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/context/fwfm.rst b/docs/source/user_guide/model/context/fwfm.rst new file mode 100644 index 000000000..15c41be3e --- /dev/null +++ b/docs/source/user_guide/model/context/fwfm.rst @@ -0,0 +1,74 @@ +FwFM +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/3178876.3186040>`_ + +**Title:** Field-weighted Factorization Machines for Click-Through Rate Prediction in Display Advertising + +**Authors:** Junwei Pan, Jian Xu, Alfonso Lobos Ruiz, Wenliang Zhao, Shengjun Pan, Yu Sun, Quan Lu + +**Abstract:** Click-through rate (CTR) prediction is a critical task in online display advertising. The data involved in CTR prediction are typically multi-field categorical data, i.e., every feature is categorical and belongs to one and only one field. One of the interesting characteristics of such data is that features from one field often interact differently with features from different other fields. Recently, Field-aware Factorization Machines (FFMs) have been among the best performing models for CTR prediction by explicitly modeling such difference. However, the number of parameters in FFMs is in the order of feature number times field number, which is unacceptable in the real-world production systems. In this paper, we propose Field-weighted Factorization Machines (FwFMs) to model the different feature interactions between different fields in a much more memory-efficient way. Our experimental evaluations show that FwFMs can achieve competitive prediction performance with only as few as 4% parameters of FFMs. When using the same number of parameters, FwFMs can bring 0.92% and 0.47% AUC lift over FFMs on two real CTR prediction data sets. + +.. image:: ../../../asset/fwfm.png + :width: 500 + :align: center + +Quick Start with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of features. Defaults to ``10``. +- ``dropout_prob (float)`` : The dropout rate. Defaults to ``0.0``. 
+- ``fields (dict)`` : This parameter defines the mapping from fields to features, key is field's id, value is a list of features in this field. For example, in ml-100k dataset, it can be set as ``{0: ['user_id','age'], 1: ['item_id', 'class']}``. If it is set to ``None``, the features and the fields are corresponding one-to-one. Defaults to ``None``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='FwFM', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Notes:** + +- The features defined in ``fields`` must be in the dataset and be loaded by data module in RecBole. It means the value in ``fields`` must appear in ``load_col``. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + dropout_prob choice [0.0,0.1,0.2,0.3,0.4,0.5] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` + diff --git a/docs/source/user_guide/model/context/lr.rst b/docs/source/user_guide/model/context/lr.rst new file mode 100644 index 000000000..77b8e615a --- /dev/null +++ b/docs/source/user_guide/model/context/lr.rst @@ -0,0 +1,67 @@ +LR +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/1242572.1242643>`_ + +**Title:** Predicting Clicks Estimating the Click-Through Rate for New Ads + +**Authors:** Matthew Richardson, Ewa Dominowska, Robert Ragno + +**Abstract:** Search engine advertising has become a significant element of the Web browsing experience. Choosing the right ads for the query and the order in which they are displayed greatly affects the probability that a user will see and click on each ad. This ranking has a strong impact on the revenue the search engine receives from the ads. Further, showing the user an ad that they prefer to click on improves user satisfaction. For these reasons, it is important to be able to accurately estimate the click-through rate of ads in the system. For ads that have been displayed repeatedly, this is empirically measurable, but for new ads, other means must be used. We show that we can use features of ads, terms, and advertisers to learn a model that accurately predicts the click-though rate for new ads. We also show that using our model improves the convergence and performance of an advertising system. As a result, our model increases both revenue and user satisfaction. + +.. 
image:: ../../../asset/lr.png + :width: 500 + :align: center + +Quick Start with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of features. Defaults to ``10``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='LR', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` diff --git a/docs/source/user_guide/model/context/nfm.rst b/docs/source/user_guide/model/context/nfm.rst new file mode 100644 index 000000000..0c4e8472c --- /dev/null +++ b/docs/source/user_guide/model/context/nfm.rst @@ -0,0 +1,75 @@ +NFM +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/abs/10.1145/3077136.3080777>`_ + +**Title:** Neural Factorization Machines for Sparse Predictive Analytics + +**Authors:** Xiangnan He, Tat-Seng Chua + +**Abstract:** Many predictive tasks of web applications need to model categorical variables, such as user IDs and demographics like genders and occupations. To apply standard machine learning techniques, these categorical predictors are always converted to a set of binary features via one-hot encoding, making the resultant feature vector highly sparse. To learn from such sparse data effectively, it is crucial to account for the interactions between features. + +*Factorization Machines* (FMs) are a popular solution for efficiently using the second-order feature interactions. However, FM models feature interactions in a linear way, which can be insufficient for capturing the non-linear and complex inherent structure of real-world data. While deep neural networks have recently been applied to learn non-linear feature interactions in industry, such as the *Wide&Deep* by Google and *DeepCross* by Microsoft, the deep structure meanwhile makes them difficult to train. + +In this paper, we propose a novel model *Neural Factorization Machine* (NFM) for prediction under sparse settings. NFM seamlessly combines the linearity of FM in modelling second-order feature interactions and the non-linearity of neural network in modelling higher-order feature interactions. Conceptually, NFM is more expressive than FM since FM can be seen as a special case of NFM without hidden layers. 
Empirical results on two regression tasks show that with one hidden layer only, NFM significantly outperforms FM with a 7.3% relative improvement. Compared to the recent deep learning methods Wide&Deep and DeepCross, our NFM uses a shallower structure but offers better performance, being much easier to train and tune in practice.
+
+.. image:: ../../../asset/nfm.jpg
+    :width: 700
+    :align: center
+
+Quick Start with RecBole
+-------------------------
+
+**Model Hyper-Parameters:**
+
+- ``embedding_size (int)`` : The embedding size of features. Defaults to ``10``.
+- ``mlp_hidden_size (list of int)`` : The hidden size of MLP layers. Defaults to ``[64, 64, 64]``.
+- ``dropout_prob (float)`` : The dropout rate. Defaults to ``0.0``.
+
+**A Running Example:**
+
+Write the following code to a python file, such as `run.py`
+
+.. code:: python
+
+   from recbole.quick_start import run_recbole
+
+   run_recbole(model='NFM', dataset='ml-100k')
+
+And then:
+
+.. code:: bash
+
+   python run.py
+
+Tuning Hyper Parameters
+-------------------------
+
+If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``.
+
+.. code:: bash
+
+   learning_rate choice [0.01,0.005,0.001,0.0005,0.0001]
+   dropout_prob choice [0.0,0.1,0.2,0.3,0.4,0.5]
+   mlp_hidden_size choice ['[10,10]','[20,20]','[30,30]','[40,40]','[50,50]','[20,20,20]','[30,30,30]','[40,40,40]','[50,50,50]']
+
+Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model.
+
+Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning:
+
+.. code:: bash
+
+   python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test
+
+For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`.
+
+
+If you want to change parameters, dataset or evaluation settings, take a look at
+
+- :doc:`../../../user_guide/config_settings`
+- :doc:`../../../user_guide/data_intro`
+- :doc:`../../../user_guide/evaluation_support`
+- :doc:`../../../user_guide/usage`
diff --git a/docs/source/user_guide/model/context/pnn.rst b/docs/source/user_guide/model/context/pnn.rst
new file mode 100644
index 000000000..75fe48d13
--- /dev/null
+++ b/docs/source/user_guide/model/context/pnn.rst
@@ -0,0 +1,75 @@
+PNN
+===========
+
+Introduction
+---------------------
+
+`[paper] <https://ieeexplore.ieee.org/abstract/document/7837964/>`_
+
+**Title:** Product-based neural networks for user response prediction
+
+**Authors:** Yanru Qu, Han Cai, Kan Ren, Weinan Zhang, Yong Yu, Ying Wen, Jun Wang
+
+**Abstract:** Predicting user responses, such as clicks and conversions, is of great importance and has found its usage in many Web applications including recommender systems, web search and online advertising. The data in those applications is mostly categorical and contains multiple fields, a typical representation is to transform it into a high-dimensional sparse binary feature representation via one-hot encoding. Facing with the extreme sparsity, traditional models may limit their capacity of mining shallow patterns from the data, i.e. low-order feature combinations. Deep models like deep neural networks, on the other hand, cannot be directly applied for the high-dimensional input because of the huge feature space.
In this paper, we propose a Product-based Neural Networks (PNN) with an embedding layer to learn a distributed representation of the categorical data, a product layer to capture interactive patterns between interfieldcategories, and further fully connected layers to explore high-order feature interactions. Our experimental results on two-large-scale real-world ad click datasets demonstrate that PNNs consistently outperform the state-of-the-art models on various metrics. + +.. image:: ../../../asset/pnn.jpg + :width: 700 + :align: center + +Quick Start with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of features. Defaults to ``10``. +- ``mlp_hidden_size (list of int)`` : The hidden size of MLP layers. Defaults to ``[128, 256, 128]``. +- ``dropout_prob (float)`` : The dropout rate. Defaults to ``0.0``. +- ``use_inner (bool)`` : Whether to use the inner product in the model. Defaults to ``True``. +- ``use_outer (bool)`` : Whether to use the outer product in the model. Defaults to ``False``. +- ``reg_weight (float)`` : The L2 regularization weight. Defaults to ``0.0``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='PNN', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + dropout_prob choice [0.0,0.1,0.2,0.3,0.4,0.5] + mlp_hidden_size choice ['[64,64,64]','[128,128,128]','[256,256,256]'] + reg_weight choice [0.0] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` diff --git a/docs/source/user_guide/model/context/widedeep.rst b/docs/source/user_guide/model/context/widedeep.rst new file mode 100644 index 000000000..8428a281e --- /dev/null +++ b/docs/source/user_guide/model/context/widedeep.rst @@ -0,0 +1,73 @@ +WideDeep +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/2988450.2988454>`_ + +**Title:** Wide & Deep Learning for Recommender Systems + +**Authors:** Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, Hemal Shah + +**Abstract:** Generalized linear models with nonlinear feature transformations are widely used for large-scale regression and classification problems with sparse inputs. 
Memorization of feature interactions through a wide set of cross-product feature transformations are effective and interpretable, while generalization requires more feature engineering effort. With less feature engineering, deep neural networks can generalize better to unseen feature combinations through low-dimensional dense embeddings learned for the sparse features. However, deep neural networks with embeddings can over-generalize and recommend less relevant items when the user-item interactions are sparse and high-rank. In this paper, we present Wide & Deep learning---jointly trained wide linear models and deep neural networks---to combine the benefits of memorization and generalization for recommender systems. We productionized and evaluated the system on Google Play, a commercial mobile app store with over one billion active users and over one million apps. Online experiment results show that Wide & Deep significantly increased app acquisitions compared with wide-only and deep-only models. We have also open-sourced our implementation in TensorFlow. + +.. image:: ../../../asset/widedeep.png + :width: 700 + :align: center + +Quick Start with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of features. Defaults to ``10``. +- ``mlp_hidden_size (list of int)`` : The hidden size of MLP layers. Defaults to ``[32, 16, 8]``. +- ``dropout_prob (float)`` : The dropout rate. Defaults to ``0.1``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='WideDeep', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + dropout_prob choice [0.0,0.1,0.2,0.3,0.4,0.5] + mlp_hidden_size choice ['[64,64,64]','[128,128,128]','[256,256,256]','[512,512,512]'] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. 
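+
+The same search space can also be explored without ``HyperTuning``, e.g. with a
+plain Python loop around ``run_recbole``. The sketch below is for illustration
+only: it retrains the model from scratch for every combination, so it is far
+slower than the dedicated tuner.
+
+.. code:: python
+
+   from itertools import product
+
+   from recbole.quick_start import run_recbole
+
+   # Naive grid search over the ranges listed above (illustration only).
+   learning_rates = [0.01, 0.005, 0.001, 0.0005, 0.0001]
+   dropout_probs = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
+   hidden_sizes = [[64, 64, 64], [128, 128, 128], [256, 256, 256], [512, 512, 512]]
+
+   for lr, dp, hs in product(learning_rates, dropout_probs, hidden_sizes):
+       # Each call trains and evaluates WideDeep once with this combination.
+       run_recbole(
+           model='WideDeep',
+           dataset='ml-100k',
+           config_dict={'learning_rate': lr, 'dropout_prob': dp, 'mlp_hidden_size': hs},
+       )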
+ + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` + diff --git a/docs/source/user_guide/model/context/xdeepfm.rst b/docs/source/user_guide/model/context/xdeepfm.rst new file mode 100644 index 000000000..8efbaef0a --- /dev/null +++ b/docs/source/user_guide/model/context/xdeepfm.rst @@ -0,0 +1,102 @@ +xDeepFM +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/3219819.3220023>`_ + +**Title:** xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems + +**Authors:** Jianxun Lian, Xiaohuan Zhou, Fuzheng Zhang, +Zhongxia Chen, Xing Xie, Guangzhong Sun + +**Abstract:** Combinatorial features are essential for the success of many commercial +models. Manually crafting these features usually comes +with high cost due to the variety, volume and velocity of raw data +in web-scale systems. Factorization based models, which measure +interactions in terms of vector product, can learn patterns of combinatorial +features automatically and generalize to unseen features +as well. With the great success of deep neural networks (DNNs) +in various fields, recently researchers have proposed several DNNbased +factorization model to learn both low- and high-order feature +interactions. Despite the powerful ability of learning an arbitrary +function from data, plain DNNs generate feature interactions implicitly +and at the bit-wise level. In this paper, we propose a novel +Compressed Interaction Network (CIN), which aims to generate +feature interactions in an explicit fashion and at the vector-wise +level. We show that the CIN share some functionalities with convolutional +neural networks (CNNs) and recurrent neural networks +(RNNs). We further combine a CIN and a classical DNN into one +unified model, and named this new model eXtreme Deep Factorization +Machine (xDeepFM). On one hand, the xDeepFM is able +to learn certain bounded-degree feature interactions explicitly; on +the other hand, it can learn arbitrary low- and high-order feature +interactions implicitly. We conduct comprehensive experiments on +three real-world datasets. Our results demonstrate that xDeepFM +outperforms state-of-the-art models. + +.. image:: ../../../asset/xdeepfm.png + :width: 500 + :align: center + +Quick Start with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of features. Defaults to ``10``. +- ``mlp_hidden_size (list of int)`` : The hidden size of MLP layers. Defaults to ``[128,128,128]``. +- ``reg_weight (float)`` : The L2 regularization weight. Defaults to ``5e-4``. +- ``dropout_prob (float)`` : The dropout rate. Defaults to ``0.2``. +- ``direct (bool)`` : Whether the output of the current layer will be output directly or not. When it is set to ``False``, the output of the current layer will be equally devided into two parts, one part will be the input of the next hidden layer, and the other part will be output directly. Defaults to ``False``. +- ``cin_layer_size (list of int)`` : The size of CIN layers. Defaults to ``[100,100,100]`` + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='xDeepFM', dataset='ml-100k') + +And then: + +.. 
code:: bash

+
+   python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test
+
+For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`.
+
+
+If you want to change parameters, dataset or evaluation settings, take a look at
+
+- :doc:`../../../user_guide/config_settings`
+- :doc:`../../../user_guide/data_intro`
+- :doc:`../../../user_guide/evaluation_support`
+- :doc:`../../../user_guide/usage`
+
diff --git a/docs/source/user_guide/model/context/xgboost.rst b/docs/source/user_guide/model/context/xgboost.rst
new file mode 100644
index 000000000..c290aafbb
--- /dev/null
+++ b/docs/source/user_guide/model/context/xgboost.rst
@@ -0,0 +1,51 @@
+XGBoost (External algorithm library)
+=====================================
+
+Introduction
+---------------------
+
+`[XGBoost] <https://xgboost.readthedocs.io/en/latest/>`_
+
+**XGBoost** is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides a parallel tree boosting (also known as GBDT or GBM) that solves many data science problems in a fast and accurate way. The same code runs on major distributed environments (Hadoop, SGE, MPI) and can solve problems beyond billions of examples.
+
+Running with RecBole
+-------------------------
+
+**Model Hyper-Parameters:**
+
+- ``convert_token_to_onehot (bool)`` : If ``True``, the token type features will be converted into one-hot form. Defaults to ``False``.
+- ``token_num_threhold (int)`` : The threshold for doing the one-hot conversion.
+
+- ``xgb_silent (bool, optional)`` : Whether to print messages during construction.
+- ``xgb_nthread (int, optional)`` : Number of threads to use for loading data when parallelization is applicable. If ``-1``, uses the maximum number of threads available on the system.
+- ``xgb_model (file name of stored xgb model or 'Booster' instance)`` : XGBoost model to be loaded before training.
+- ``xgb_params (dict)`` : Booster params.
+- ``xgb_num_boost_round (int)`` : Number of boosting iterations.
+- ``xgb_early_stopping_rounds (int)`` : Activates early stopping.
+- ``xgb_verbose_eval (bool or int)`` : If ``True``, the evaluation metric on the validation set is printed at each boosting stage. If an integer, the evaluation metric on the validation set is printed at every given number of boosting stages.
+
+Please refer to the `XGBoost Python package <https://xgboost.readthedocs.io/en/latest/python/python_api.html>`_ for more details.
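+
+As an illustration of what ``xgb_params`` may look like, the sketch below
+forwards a few standard XGBoost booster parameters through the RecBole config.
+The values are only an example, not tuned recommendations.
+
+.. code:: python
+
+   from recbole.quick_start import run_recbole
+
+   # Sketch: standard XGBoost booster parameters forwarded via the config.
+   # The values below are illustrative only.
+   config_dict = {
+       'xgb_params': {
+           'booster': 'gbtree',
+           'objective': 'binary:logistic',
+           'eval_metric': 'auc',
+           'eta': 0.1,
+           'max_depth': 6,
+       },
+       'xgb_num_boost_round': 100,
+       'xgb_early_stopping_rounds': 10,
+   }
+
+   run_recbole(model='xgboost', dataset='ml-100k', config_dict=config_dict)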
+ +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='xgboost', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/general/bpr.rst b/docs/source/user_guide/model/general/bpr.rst new file mode 100644 index 000000000..ee9c6492d --- /dev/null +++ b/docs/source/user_guide/model/general/bpr.rst @@ -0,0 +1,80 @@ +BPR +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.5555/1795114.1795167>`_ + +**Title:** BPR Bayesian Personalized Ranking from Implicit Feedback + +**Authors:** Steffen Rendle, Christoph Freudenthaler, Zeno Gantner and Lars Schmidt-Thieme + +**Abstract:** Item recommendation is the task of predicting a personalized ranking on a set of items (e.g. websites, movies, products). +In this paper, we investigate the most common scenario with implicit feedback (e.g. clicks, purchases). +There are many methods for item recommendation from implicit feedback like matrix factorization (MF) or +adaptive knearest-neighbor (kNN). Even though these methods are designed for the item prediction task of personalized +ranking, none of them is directly optimized for ranking. In this paper we present a generic optimization criterion +BPR-Opt for personalized ranking that is the maximum posterior estimator derived from a Bayesian analysis of the problem. +We also provide a generic learning algorithm for optimizing models with respect to BPR-Opt. The learning method is based +on stochastic gradient descent with bootstrap sampling. We show how to apply our method to two state-of-the-art +recommender models: matrix factorization and adaptive kNN. Our experiments indicate that for the task of personalized +ranking our optimization method outperforms the standard learning techniques for MF and kNN. The results show the +importance of optimizing models for the right criterion. + +.. image:: ../../../asset/bpr.png + :width: 500 + :align: center + + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users and items. Defaults to ``64``. + + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='BPR', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. 
code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/general/cdae.rst b/docs/source/user_guide/model/general/cdae.rst new file mode 100644 index 000000000..ca0e8716f --- /dev/null +++ b/docs/source/user_guide/model/general/cdae.rst @@ -0,0 +1,74 @@ +CDAE +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/2835776.2835837>`_ + +**Title:** Collaborative Denoising Auto-Encoders for Top-N Recommender Systems + +**Authors:** Yao Wu, Christopher DuBois, Alice X. Zheng, Martin Ester + +**Abstract:** Most real-world recommender services measure their performance based on the top-N results shown to the end users. Thus, advances in top-N recommendation have far-ranging consequences in practical applications. In this paper, we present a novel method, called Collaborative Denoising Auto-Encoder (CDAE), for top-N recommendation that utilizes the idea of Denoising Auto-Encoders. We demonstrate that the proposed model is a generalization of several well-known collaborative filtering models but with more flexible components. Thorough experiments are conducted to understand the performance of CDAE under various component settings. Furthermore, experimental results on several public datasets demonstrate that CDAE consistently outperforms state-of-the-art top-N recommendation methods on a variety of common evaluation metrics. + +.. image:: ../../../asset/cdae.png + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``loss_type (str)`` : The loss function of model, now we support ``[BCE, MSE]``. Defaults to ``BCE``. +- ``hid_activation (str)`` : The hidden layer activation function, now we support ``[sigmoid, relu, tanh]`` Defaults to ``relu``. +- ``out_activation (str)`` : The output layer activation function, now we support ``[sigmoid, relu]``. Defaults to ``sigmoid``. +- ``corruption_ratio (float)`` : The corruption ratio of the input. Defaults to ``0.5``. +- ``embedding_size (int)`` : The embedding size of user. Defaults to ``64``. +- ``reg_weight_1 (float)`` : L1-regularization weight. Defaults to ``0.``. +- ``reg_weight_2 (float)`` : L2-regularization weight. Defaults to ``0.01``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='CDAE', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Note**: Because this model is a non-sampling model, so you must set ``training_neg_sample=0`` when you run this model. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. 
+ +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/general/convncf.rst b/docs/source/user_guide/model/general/convncf.rst new file mode 100644 index 000000000..36cd1303b --- /dev/null +++ b/docs/source/user_guide/model/general/convncf.rst @@ -0,0 +1,79 @@ +ConvNCF +=========== + +Introduction +--------------------- + +`[paper] <https://www.ijcai.org/Proceedings/2018/308>`_ + +**Title:** Outer Product-based Neural Collaborative Filtering + +**Authors:** Xiangnan He, Xiaoyu Du, Xiang Wang, Feng Tian, Jinhui Tang and Tat-Seng Chua + +**Abstract:** In this work, we contribute a new multi-layer neural network architecture named ONCF to perform collaborative filtering. The idea is to use an outer product to explicitly model the pairwise correlations between the dimensions of the embedding space. In contrast to existing neural recommender models that combine user embedding and item embedding via a simple concatenation or element-wise product, our proposal of using outer product above the embedding layer results in a two-dimensional interaction map that is more expressive and semantically plausible. +Above the interaction map obtained by outer product, we propose to employ a convolutional neural network to learn high-order correlations among embedding dimensions. Extensive experiments on two public implicit feedback data demonstrate the effectiveness of our proposed ONCF framework, in particular, the positive effect of using outer product to model the correlations between embedding dimensions in the low level of multi-layer neural recommender model. + +.. image:: ../../../asset/convncf.png + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users and items. Defaults to ``64``. +- ``cnn_channels (list)`` : The number of channels in each convolutional neural network layer. Defaults to ``[1, 32, 32, 32, 32]``. +- ``cnn_kernels (list)`` : The size of convolutional kernel in each convolutional neural network layer. Defaults to ``[4, 4, 2, 2]``. +- ``cnn_strides (list)`` : The strides of convolution in each convolutional neural network layer. Defaults to ``[4, 4, 2, 2]``. +- ``dropout_prob (float)`` : The dropout rate in the linear predict layer. Defaults to ``0.2``. +- ``reg_weights (list)`` : The L2 regularization weights. Defaults to ``[0.1, 0.1]``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='ConvNCF', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. 
code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + dropout_prob choice [0.0,0.1,0.2,0.3,0.4,0.5] + cnn_channels choice ['[1,128,128,64,32]','[1,32,32,32,32,32,32]','[1,64,32,32,32,32]','[1,64,32,32,32]'] + cnn_kernels choice ['[4,4,2,2]','[2,2,2,2,2,2]','[4,2,2,2,2]','[8,4,2]'] + cnn_strides choice ['[4,4,2,2]','[2,2,2,2,2,2]','[4,2,2,2,2]','[8,4,2]'] + reg_weights choice ['[0.1,0.1]','[0.2,0.2]'] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/general/dgcf.rst b/docs/source/user_guide/model/general/dgcf.rst new file mode 100644 index 000000000..01f30c0c1 --- /dev/null +++ b/docs/source/user_guide/model/general/dgcf.rst @@ -0,0 +1,110 @@ +DGCF +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/3397271.3401137>`_ + +**Title:** Disentangled Graph Collaborative Filtering + +**Authors:** Xiang Wang, Hongye Jin, An Zhang, Xiangnan He, Tong Xu, Tat-Seng Chua + +**Abstract:** Learning informative representations of users and items from the +interaction data is of crucial importance to collaborative filtering +(CF). Present embedding functions exploit user-item relationships +to enrich the representations, evolving from a single user-item +instance to the holistic interaction graph. Nevertheless, they largely +model the relationships in a uniform manner, while neglecting +the diversity of user intents on adopting the items, which could +be to pass time, for interest, or shopping for others like families. +Such uniform approach to model user interests easily results in +suboptimal representations, failing to model diverse relationships +and disentangle user intents in representations. + +In this work, we pay special attention to user-item relationships +at the finer granularity of user intents. We hence devise a new +model, Disentangled Graph Collaborative Filtering (DGCF), to +disentangle these factors and yield disentangled representations. +Specifically, by modeling a distribution over intents for each +user-item interaction, we iteratively refine the intent-aware +interaction graphs and representations. Meanwhile, we encourage +independence of different intents. This leads to disentangled +representations, effectively distilling information pertinent to each +intent. We conduct extensive experiments on three benchmark +datasets, and DGCF achieves significant improvements over several +state-of-the-art models like NGCF, DisenGCN, and +MacridVAE. Further analyses offer insights into the advantages +of DGCF on the disentanglement of user intents and interpretability +of representations. + +.. 
image:: ../../../asset/dgcf.jpg + :width: 700 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users and items. Defaults to ``64``. +- ``n_factors (int)`` : The number of factors for disentanglement. Defaults to ``4``. +- ``n_iterations (int)`` : The number of iterations for each layer. Defaults to ``2``. +- ``n_layers (int)`` : The number of reasoning layers. Defaults to ``1``. +- ``reg_weight (float)`` : The L2 regularization weight. Defaults to ``1e-03``. +- ``cor_weight (float)`` : The correlation loss weight. Defaults to ``0.01``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='DGCF', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Notes:** + +- ``embedding_size`` needs to be exactly divisible by ``n_factors`` + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + n_factors choice [2,4,8] + reg_weight choice [1e-03] + cor_weight choice [0.005,0.01,0.02,0.05] + n_layers choice [1] + n_iterations choice [2] + delay choice [1e-03] + cor_delay choice [1e-02] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/general/dmf.rst b/docs/source/user_guide/model/general/dmf.rst new file mode 100644 index 000000000..ce6490d11 --- /dev/null +++ b/docs/source/user_guide/model/general/dmf.rst @@ -0,0 +1,80 @@ +DMF +=========== + +Introduction +--------------------- + +`[paper] <https://www.ijcai.org/Proceedings/2017/447>`_ + +**Title:** Deep Matrix Factorization Models for Recommender Systems + +**Authors:** Hong-Jian Xue, Xin-Yu Dai, Jianbing Zhang, Shujian Huang, Jiajun Chen + +**Abstract:** Recommender systems usually make personalized recommendation with user-item interaction ratings, implicit feedback and auxiliary information. Matrix factorization is the basic idea to predict a personalized ranking over a set of items for an individual user with the similarities among users and items. In this paper, we propose a novel matrix factorization model with neural network architecture. Firstly, we construct a user-item matrix with explicit ratings and non-preference implicit feedback. With this matrix as the input, we present a deep structure learning architecture to learn a common low dimensional space for the representations of users and items. 
Secondly, we design a new loss function based on binary cross entropy, in which we consider both explicit ratings and implicit feedback for a better optimization. The experimental results show the effectiveness of both our proposed model and the loss function. On several benchmark datasets, our model outperformed other state-of-the-art methods. We also conduct extensive experiments to evaluate the performance within different experimental settings. + +.. image:: ../../../asset/dmf.jpg + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``user_embedding_size (int)`` : The initial embedding size of users. Defaults to ``64``. +- ``item_embedding_size (int)`` : The initial embedding size of items. Defaults to ``64``. +- ``user_hidden_size_list (list)`` : The hidden size of each layer in MLP for users, the length of list is equal to the number of layers. Defaults to ``[64,64]``. +- ``item_hidden_size_list (list)`` : The hidden size of each layer in MLP for items, the length of list is equal to the number of layers. Defaults to ``[64,64]``. +- ``inter_matrix_type (str)`` : Use the implicit interaction matrix or the rating matrix. Defaults to ``'01'``. Range in ``['01', 'rating']``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='DMF', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Notes:** + +- The last value in ``user_hidden_size_list`` and ``item_hidden_size_list`` must be the same. + +- If you set ``inter_matrix_type='rating'``, the 'rating' field from \*.inter atomic files must be retained when loading the dataset, which means that 'rating' must appear in ``load_col``. Besides, if you use the 'rating' field to filter the dataset, please set ``drop_filter_field=False``. A configuration sketch is given further down this page. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + user_layers_dim choice ['[64, 64]','[64, 32]','[128,64]'] + item_layers_dim choice ['[64, 64]','[64, 32]','[128,64]'] + +Note that these hyper-parameter ranges are provided for reference only, and we cannot guarantee that they are optimal for this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run ``run_hyper.py`` to start tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`.
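+
+To make the note about ``inter_matrix_type='rating'`` above more concrete, here is a hedged configuration sketch. The parameter names are taken from the notes on this page, but the exact values are only an example, not an official recipe:
+
+.. code:: python
+
+    from recbole.quick_start import run_recbole
+
+    # Train DMF on the explicit rating matrix instead of the implicit 0/1 matrix.
+    config_dict = {
+        'inter_matrix_type': 'rating',
+        # keep the 'rating' field when loading the .inter atomic file
+        'load_col': {'inter': ['user_id', 'item_id', 'rating']},
+        # keep fields that are also used for filtering
+        'drop_filter_field': False,
+    }
+
+    run_recbole(model='DMF', dataset='ml-100k', config_dict=config_dict)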
+ + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/general/enmf.rst b/docs/source/user_guide/model/general/enmf.rst new file mode 100644 index 000000000..6b30a2d05 --- /dev/null +++ b/docs/source/user_guide/model/general/enmf.rst @@ -0,0 +1,76 @@ +ENMF +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/abs/10.1145/3373807>`_ + +**Title:** Efficient Neural Matrix Factorization without Sampling for Recommendation + +**Authors:** Chen, Chong and Zhang, Min and Wang, Chenyang and Ma, Weizhi and Li, Minming and Liu, Yiqun and Ma, Shaoping + +**Abstract:** Recommendation systems play a vital role to keep users engaged with personalized contents in modern online platforms. Recently, deep learning has revolutionized many research fields and there is a surge of interest in applying it for recommendation. However, existing studies have largely focused on exploring complex deep-learning architectures for recommendation task, while typically applying the negative sampling strategy for model learning. Despite effectiveness, we argue that these methods suffer from two important limitations: (1) the methods with complex network structures have a substantial number of parameters, and require expensive computations even with a sampling-based learning strategy; (2) the negative sampling strategy is not robust, making sampling-based methods difficult to achieve the optimal performance in practical applications. + +In this work, we propose to learn neural recommendation models from the whole training data without sampling. However, such a non-sampling strategy poses strong challenges to learning efficiency. To address this, we derive three new optimization methods through rigorous mathematical reasoning, which can efficiently learn model parameters from the whole data (including all missing data) with a rather low time complexity. Moreover, based on a simple Neural Matrix Factorization architecture, we present a general framework named ENMF, short for *Efficient Neural Matrix Factorization*. Extensive experiments on three real-world public datasets indicate that the proposed ENMF framework consistently and significantly outperforms the state-of-the-art methods on the Top-K recommendation task. Remarkably, ENMF also shows significant advantages in training efficiency, which makes it more applicable to real-world large-scale systems. + +.. image:: ../../../asset/enmf.jpg + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``dropout_prob (float)`` : The dropout ratio of the embedding. Defaults to ``0.7``. +- ``embedding_size (int)`` : The embedding size of user. Defaults to ``64``. +- ``reg_weight (float)`` : L2-regularization weight. Defaults to ``0.``. +- ``negative_weight (float)`` : The weight of non-observed data. Defaults to ``0.5``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='ENMF', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Note**: Because this model is a non-sampling model, so you must set ``training_neg_sample=0`` when you run this model. 
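+
+For example, one way to pass this setting (a minimal sketch, using the parameter name as written above and assuming the quick-start entry point accepts a ``config_dict``) is:
+
+.. code:: python
+
+    from recbole.quick_start import run_recbole
+
+    # ENMF is trained without negative sampling, so disable it explicitly.
+    run_recbole(model='ENMF', dataset='ml-100k', config_dict={'training_neg_sample': 0})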
+ +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + + dropout_prob choice [0.0,0.1,0.2,0.3,0.4,0.5] + + negative_weight choice [0.001,0.005,0.01,0.02,0.05,0.1,0.2,0.5] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/general/fism.rst b/docs/source/user_guide/model/general/fism.rst new file mode 100644 index 000000000..c93eaf00d --- /dev/null +++ b/docs/source/user_guide/model/general/fism.rst @@ -0,0 +1,82 @@ +FISM +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/2487575.2487589>`_ + +**Title:** FISM: Factored Item Similarity Models for Top-N Recommender Systems + +**Authors:** Santosh Kabbur, Xia Ning, George Karypis + +**Abstract:** The effectiveness of existing top-N recommendation methods decreases as +the sparsity of the datasets increases. To alleviate this problem, we present an +item-based method for generating top-N recommendations that learns the itemitem +similarity matrix as the product of two low dimensional latent factor matrices. +These matrices are learned using a structural equation modeling approach, wherein the +value being estimated is not used for its own estimation. A comprehensive set of +experiments on multiple datasets at three different sparsity levels indicate that +the proposed methods can handle sparse datasets effectively and outperforms other +state-of-the-art top-N recommendation methods. The experimental results also show +that the relative performance gains compared to competing methods increase as the +data gets sparser. + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users and items. Defaults to ``64``. +- ``split_to (int)`` : This is a parameter used to reduce the GPU memory usage during the evaluation. The larger the value, the less the memory usage and the slower the evaluation speed. Defaults to ``0``. +- ``alpha (float)`` : It is a hyper-parameter controlling the normalization effect of the number of user history interactions when calculating the similarity. Defaults to ``0``. +- ``reg_weights (list)`` : The L2 regularization weights. Defaults to ``[1e-2, 1e-2]``. + + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='FISM', dataset='ml-100k') + +And then: + +.. 
code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + reg_weights choice ['[1e-7, 1e-7]','[0, 0]'] + alpha choice [0] + weight_size choice [64] + beta choice [0.5] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/general/gcmc.rst b/docs/source/user_guide/model/general/gcmc.rst new file mode 100644 index 000000000..efb985579 --- /dev/null +++ b/docs/source/user_guide/model/general/gcmc.rst @@ -0,0 +1,89 @@ +GCMC +=========== + +Introduction +--------------------- + +`[paper] <https://arxiv.org/abs/1706.02263>`_ + +**Title:** Graph Convolutional Matrix Completion + +**Authors:** Rianne van den Berg, Thomas N. Kipf, Max Welling + +**Abstract:** We consider matrix completion for recommender systems from the point of view of +link prediction on graphs. Interaction data +such as movie ratings can be represented by a +bipartite user-item graph with labeled edges +denoting observed ratings. Building on recent +progress in deep learning on graph-structured +data, we propose a graph auto-encoder framework based on differentiable message passing +on the bipartite interaction graph. Our model +shows competitive performance on standard +collaborative filtering benchmarks. In settings +where complimentary feature information or +structured data such as a social network is +available, our framework outperforms recent +state-of-the-art methods. + +.. image:: ../../../asset/gcmc.png + :width: 700 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``accum (str)`` : The accumulation function in the GCN layers. Defaults to ``'stack'``. Range in ``['sum', 'stack']``. +- ``dropout_prob (float)`` : The dropout rate. Defaults to ``0.3``. +- ``gcn_output_dim (int)`` : The output dimension of GCN layer in GCN encoder. Defaults to ``500``. +- ``embedding_size (int)`` : The embedding size of user and item. Defaults to ``64``. +- ``sparse_feature (bool)`` : Whether to use sparse tensor to represent the features. Defaults to ``True``. +- ``class_num (int)`` : Number of rating types. Defaults to ``2``. +- ``num_basis_functions (int)`` : Number of basis functions for BiDecoder. Defaults to ``2``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='GCMC', dataset='ml-100k') + +And then: + +.. 
code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + dropout_prob choice [0.0,0.1,0.2,0.3,0.4,0.5] + accum choice ['stack','sum'] + gcn_output_dim choice [500,256,1024] + num_basis_functions choice ['2'] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` diff --git a/docs/source/user_guide/model/general/itemknn.rst b/docs/source/user_guide/model/general/itemknn.rst new file mode 100644 index 000000000..1f63e6228 --- /dev/null +++ b/docs/source/user_guide/model/general/itemknn.rst @@ -0,0 +1,81 @@ +ItemKNN +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/963770.963776>`_ + +**Title:** Item-based top-N recommendation algorithms + +**Authors:** Mukund Deshpande and George Karypis + +**Abstract:** The explosive growth of the world-wide-web and the emergence of e-commerce has led to the development of recommender systems—a personalized information filtering technology used to identify +a set of items that will be of interest to a certain user. User-based collaborative filtering is the most +successful technology for building recommender systems to date and is extensively used in many +commercial recommender systems. Unfortunately, the computational complexity of these methods +grows linearly with the number of customers, which in typical commercial applications can be several millions. To address these scalability concerns model-based recommendation techniques have +been developed. These techniques analyze the user–item matrix to discover relations between the +different items and use these relations to compute the list of recommendations. + +In this article, we present one such class of model-based recommendation algorithms that first +determines the similarities between the various items and then uses them to identify the set of +items to be recommended. The key steps in this class of algorithms are (i) the method used to +compute the similarity between the items, and (ii) the method used to combine these similarities +in order to compute the similarity between a basket of items and a candidate recommender item. +Our experimental evaluation on eight real datasets shows that these item-based algorithms are +up to two orders of magnitude faster than the traditional user-neighborhood based recommender +systems and provide recommendations with comparable or better quality. + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``k (int)`` : The neighborhood size. Defaults to ``100``. 
+ +- ``shrink (float)`` : A normalization hyper parameter in calculate cosine distance. Defaults to ``0.0``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='ItemKNN', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + k choice [10,50,100,200,250,300,400,500,1000,1500,2000,2500] + shrink choice [0.0,1.0] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/general/lightgcn.rst b/docs/source/user_guide/model/general/lightgcn.rst new file mode 100644 index 000000000..01127d929 --- /dev/null +++ b/docs/source/user_guide/model/general/lightgcn.rst @@ -0,0 +1,102 @@ +LightGCN +============ + +Introduction +------------------ + +`[paper] <https://dl.acm.org/doi/abs/10.1145/3397271.3401063>`_ + +**Title:** LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation + +**Authors:** Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, Meng Wang + +**Abstract:** +Graph Convolution Network (GCN) has become new state-ofthe-art for collaborative filtering. Nevertheless, the reasons of +its effectiveness for recommendation are not well understood. +Existing work that adapts GCN to recommendation lacks thorough +ablation analyses on GCN, which is originally designed for graph +classification tasks and equipped with many neural network +operations. However, we empirically find that the two most +common designs in GCNs — feature transformation and nonlinear +activation — contribute little to the performance of collaborative +filtering. Even worse, including them adds to the difficulty of +training and degrades recommendation performance. + +In this work, we aim to simplify the design of GCN to +make it more concise and appropriate for recommendation. We +propose a new model named LightGCN, including only the most +essential component in GCN — neighborhood aggregation — for +collaborative filtering. Specifically, LightGCN learns user and +item embeddings by linearly propagating them on the user-item +interaction graph, and uses the weighted sum of the embeddings +learned at all layers as the final embedding. Such simple, linear, +and neat model is much easier to implement and train, exhibiting +substantial improvements (about 16.0% relative improvement on +average) over Neural Graph Collaborative Filtering (NGCF) — a +state-of-the-art GCN-based recommender model — under exactly +the same experimental setting. 
Further analyses are provided +towards the rationality of the simple LightGCN from both analytical +and empirical perspectives. + + +.. image:: ../../../asset/lightgcn.png + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : the embedding size of users and items. Defaults to ``64``. +- ``n_layers (int)`` : The number of layers in lightGCN. Defaults to ``2``. +- ``reg_weight (float)`` : The L2 regularization weight. Defaults to ``1e-05``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='LightGCN', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + n_layers choice [1,2,3,4] + reg_weight choice [1e-05,1e-04,1e-03,1e-02] + + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` + diff --git a/docs/source/user_guide/model/general/line.rst b/docs/source/user_guide/model/general/line.rst new file mode 100644 index 000000000..c154154b2 --- /dev/null +++ b/docs/source/user_guide/model/general/line.rst @@ -0,0 +1,72 @@ +LINE +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/2736277.2741093>`_ + +**Title:** LINE: Large-scale Information Network Embedding + +**Authors:** Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, Qiaozhu Mei + +**Abstract:** This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. Most existing graph embedding methods do not scale for real world information networks which usually contain millions of nodes. In this paper, we propose a novel network embedding method called the ``LINE``, which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted. The method optimizes a carefully designed objective function that preserves both the local and global network structures. An edge-sampling algorithm is proposed that addresses the limitation of the classical stochastic gradient descent and improves both the effectiveness and the efficiency of the inference. Empirical experiments prove the effectiveness of the LINE on a variety of real-world information networks, including language networks, social networks, and citation networks. 
The algorithm is very efficient, which is able to learn the embedding of a network with millions of vertices and billions of edges in a few hours on a typical single machine. The source code of the LINE is available. + +.. image:: ../../../asset/line.png + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users and items. Defaults to ``64``. +- ``order (int)`` : The order of proximity of the model. Defaults to ``2``. +- ``second_order_loss_weight (float)`` : The super parameter of the loss of second proximity loss. Defaults to ``1``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='LINE', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + training_neg_sample_num choice [1,3,5] + second_order_loss_weight choice [0.3,0.6,1] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/general/macridvae.rst b/docs/source/user_guide/model/general/macridvae.rst new file mode 100644 index 000000000..cf2cbd72f --- /dev/null +++ b/docs/source/user_guide/model/general/macridvae.rst @@ -0,0 +1,99 @@ +MacridVAE +=========== + +Introduction +--------------------- + +`[paper] <https://jianxinma.github.io/assets/disentangle-recsys.pdf>`_ + +**Title:** Learning Disentangled Representations for Recommendation + +**Authors:** Jianxin Ma, Chang Zhou, Peng Cui, Hongxia Yang, Wenwu Zhu + +**Abstract:** User behavior data in recommender systems are driven by the complex interactions +of many latent factors behind the users’ decision making processes. The factors are +highly entangled, and may range from high-level ones that govern user intentions, +to low-level ones that characterize a user’s preference when executing an intention. +Learning representations that uncover and disentangle these latent factors can bring +enhanced robustness, interpretability, and controllability. However, learning such +disentangled representations from user behavior is challenging, and remains largely +neglected by the existing literature. In this paper, we present the MACRo-mIcro +Disentangled Variational Auto-Encoder (MacridVAE) for learning disentangled +representations from user behavior. 
Our approach achieves macro disentanglement +by inferring the high-level concepts associated with user intentions (e.g., to buy +a shirt or a cellphone), while capturing the preference of a user regarding the +different concepts separately. A micro-disentanglement regularizer, stemming +from an information-theoretic interpretation of VAEs, then forces each dimension +of the representations to independently reflect an isolated low-level factor (e.g., +the size or the color of a shirt). Empirical results show that our approach can +achieve substantial improvement over the state-of-the-art baselines. We further +demonstrate that the learned representations are interpretable and controllable, +which can potentially lead to a new paradigm for recommendation where users are given fine-grained control over targeted aspects of the recommendation lists. + +.. image:: ../../../asset/macridvae.png + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The latent dimension of auto-encoder. Defaults to ``128``. +- ``dropout_prob (float)`` : The dropout probability of the input. Defaults to ``0.5``. +- ``kfac (int)`` : Number of facets (macro concepts). Defaults to ``10``. +- ``nogb (boolean)`` : Whether to disable Gumbel-Softmax sampling. Defaults to ``False``. +- ``std (float)`` : Standard deviation of the Gaussian prior. +- ``encoder_hidden_size (list)`` : The hidden layer sizes of the MLP encoder. Defaults to ``[600]``. +- ``tau (float)`` : Temperature of the sigmoid/softmax, in (0, oo). +- ``anneal_cap (float)`` : The hyper parameter controlling the weight of the KL loss. Defaults to ``0.2``. +- ``total_anneal_steps (int)`` : The maximum steps of anneal update. Defaults to ``200000``. +- ``reg_weights (list)`` : The L2 regularization weights. Defaults to ``[0.0,0.0]``. +- ``training_neg_sample (int)`` : The number of negative samples for training. Defaults to ``0``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='MacridVAE', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Note**: Because this model is a non-sampling model, you must set ``training_neg_sample=0`` when you run this model. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + kfac choice [3,5,10,20] + +Note that these hyper-parameter ranges are provided for reference only, and we cannot guarantee that they are optimal for this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run ``run_hyper.py`` to start tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`.
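+
+The roles of ``kfac``, ``tau`` and ``nogb`` described above can be pictured with a minimal, standalone sketch of Gumbel-Softmax facet assignment. This only illustrates the macro-disentanglement idea; it is not RecBole's implementation, and the embeddings below are random toy data:
+
+.. code:: python
+
+    import torch
+    import torch.nn.functional as F
+
+    n_items, kfac, tau = 6, 3, 0.5
+    item_emb = torch.randn(n_items, 16)    # toy item embeddings
+    facet_emb = torch.randn(kfac, 16)      # one prototype per facet (macro concept)
+
+    logits = item_emb @ facet_emb.t()      # item-facet affinities
+    # Soft one-hot assignment of each item to a facet; with nogb=True a plain
+    # softmax would be used instead of Gumbel-Softmax sampling.
+    assign = F.gumbel_softmax(logits, tau=tau, hard=False)
+    print(assign.sum(dim=-1))              # each row sums to 1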
+ + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/general/multidae.rst b/docs/source/user_guide/model/general/multidae.rst new file mode 100644 index 000000000..7ba8cbbf9 --- /dev/null +++ b/docs/source/user_guide/model/general/multidae.rst @@ -0,0 +1,73 @@ +MultiDAE +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/3178876.3186150>`_ + +**Title:** Variational Autoencoders for Collaborative Filtering + +**Authors:** Dawen Liang, Rahul G, Matthew D Hoffman, Tony Jebara + +**Abstract:** We extend variational autoencoders (VAEs) to collaborative filtering for implicit feedback. This non-linear probabilistic model enables us to go beyond the limited modeling capacity of linear factor models which still largely dominate collaborative filtering research.We introduce a generative model with multinomial likelihood and use Bayesian inference for parameter estimation. Despite widespread use in language modeling and economics, the multinomial likelihood receives less attention in the recommender systems literature. We introduce a different regularization parameter for the learning objective, which proves to be crucial for achieving competitive performance. Remarkably, there is an efficient way to tune the parameter using annealing. The resulting model and learning algorithm has information-theoretic connections to maximum entropy discrimination and the information bottleneck principle. Empirically, we show that the proposed approach significantly outperforms several state-of-the-art baselines, including two recently-proposed neural network approaches, on several real-world datasets. We also provide extended experiments comparing the multinomial likelihood with other commonly used likelihood functions in the latent factor collaborative filtering literature and show favorable results. Finally, we identify the pros and cons of employing a principled Bayesian inference approach and characterize settings where it provides the most significant improvements. + +.. image:: ../../../asset/multidae.png + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``latent_dimendion (int)`` : The latent dimension of auto-encoder. Defaults to ``64``. +- ``mlp_hidden_size (list)`` : The MLP hidden layer. Defaults to ``[600]``. +- ``dropout_prob (float)`` : The drop out probability of input. Defaults to ``0.5``. +- ``training_neg_sample (int)`` : The negative sample num for training. Defaults to ``0``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='MultiDAE', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Note**: Because this model is a non-sampling model, so you must set ``training_neg_sample=0`` when you run this model. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. 
code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/general/multivae.rst b/docs/source/user_guide/model/general/multivae.rst new file mode 100644 index 000000000..620e02bd4 --- /dev/null +++ b/docs/source/user_guide/model/general/multivae.rst @@ -0,0 +1,75 @@ +MultiVAE +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/3178876.3186150>`_ + +**Title:** Variational Autoencoders for Collaborative Filtering + +**Authors:** Dawen Liang, Rahul G, Matthew D Hoffman, Tony Jebara + +**Abstract:** We extend variational autoencoders (VAEs) to collaborative filtering for implicit feedback. This non-linear probabilistic model enables us to go beyond the limited modeling capacity of linear factor models which still largely dominate collaborative filtering research.We introduce a generative model with multinomial likelihood and use Bayesian inference for parameter estimation. Despite widespread use in language modeling and economics, the multinomial likelihood receives less attention in the recommender systems literature. We introduce a different regularization parameter for the learning objective, which proves to be crucial for achieving competitive performance. Remarkably, there is an efficient way to tune the parameter using annealing. The resulting model and learning algorithm has information-theoretic connections to maximum entropy discrimination and the information bottleneck principle. Empirically, we show that the proposed approach significantly outperforms several state-of-the-art baselines, including two recently-proposed neural network approaches, on several real-world datasets. We also provide extended experiments comparing the multinomial likelihood with other commonly used likelihood functions in the latent factor collaborative filtering literature and show favorable results. Finally, we identify the pros and cons of employing a principled Bayesian inference approach and characterize settings where it provides the most significant improvements. + +.. image:: ../../../asset/multivae.png + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``latent_dimendion (int)`` : The latent dimension of auto-encoder. Defaults to ``128``. +- ``mlp_hidden_size (list)`` : The MLP hidden layer. Defaults to ``[600]``. +- ``dropout_prob (float)`` : The drop out probability of input. Defaults to ``0.5``. +- ``anneal_cap (float)`` : The super parameter of the weight of KL loss. Defaults to ``0.2``. +- ``total_anneal_steps (int)`` : The maximum steps of anneal update. Defaults to ``200000``. 
+- ``training_neg_sample (int)`` : The negative sample num for training. Defaults to ``0``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='MultiVAE', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Note**: Because this model is a non-sampling model, so you must set ``training_neg_sample=0`` when you run this model. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/general/nais.rst b/docs/source/user_guide/model/general/nais.rst new file mode 100644 index 000000000..2e0f60ea7 --- /dev/null +++ b/docs/source/user_guide/model/general/nais.rst @@ -0,0 +1,95 @@ +NAIS +=========== + +Introduction +--------------------- + +`[paper] <https://doi.ieeecomputersociety.org/10.1109/TKDE.2018.2831682>`_ + +**Title:** NAIS: Neural Attentive Item Similarity Model for Recommendation + +**Authors:** Xiangnan He, Zhankui He, Jingkuan Song, Zhenguang Liu, Yu-Gang Jiang, and Tat-Seng Chua + +**Abstract:** Item-to-item collaborative filtering (aka.item-based CF) has been long used for building +recommender systems in industrial settings, owing to its interpretability and efficiency in real-time +personalization. It builds a user’s profile as her historically interacted items, recommending new items +that are similar to the user’s profile. As such, the key to an item-based CF method is in the estimation +of item similarities. Early approaches use statistical measures such as cosine similarity and Pearson +coefficient to estimate item similarities, which are less accurate since they lack tailored optimization +for the recommendation task. In recent years, several works attempt to learn item similarities from data, +by expressing the similarity as an underlying model and estimating model parameters by optimizing a +recommendation-aware objective function. While extensive efforts have been made to use shallow linear +models for learning item similarities, there has been relatively less work exploring nonlinear neural +network models for item-based CF. In this work, we propose a neural network model named Neural Attentive +Item Similaritymodel(NAIS) for item-based CF. The key to our design of NAIS is an attention network, +which is capable of distinguishing which historical items in a user profile are more important for a prediction. 
+ +Compared to the state-of-the-art item-based CF method Factored Item Similarity Model (FISM), our NAIS has +stronger representation power with only a few additional parameters brought by the attention network. +Extensive experiments on two public benchmarks demonstrate the effectiveness of NAIS. This work is the first +attempt that designs neural network models for item-based CF, opening up new research possibilities for future +developments of neural recommender systems. + +.. image:: ../../../asset/nais.png + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users and items. Defaults to ``64``. +- ``weight_size (int)`` : The size of the vector that projects the hidden layer into the output attention weight. Defaults to ``64``. +- ``algorithm (str)`` : The attention method. Defaults to ``'prod'``. Range in ``['prod', 'concat']``. +- ``split_to (int)`` : This is a parameter used to reduce the GPU memory usage during the evaluation. The larger the value, the less the memory usage and the slower the evaluation speed. Defaults to ``0``. +- ``alpha (float)`` : It is a hyper-parameter controlling the normalization effect of the number of user history interactions when calculating the similarity. Defaults to ``0``. +- ``beta (float)`` : The smoothing exponent controlling the denominator of the softmax, set in the range of ``[0, 1]``. When ``beta`` is set to ``1``, it recovers the softmax function; when ``beta`` is smaller than ``1``, the value of the denominator is suppressed, so the attention weights are not overly punished for active users (see the short numerical sketch at the end of this page). Defaults to ``0.5``. +- ``reg_weights (list)`` : The L2 regularization weights. Defaults to ``[1e-7, 1e-7, 1e-5]``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='NAIS', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + weight_size choice [64] + reg_weights choice ['[1e-7, 1e-7, 1e-5]','[0,0,0]'] + alpha choice [0] + beta choice [0.5] + +Note that these hyper-parameter ranges are provided for reference only, and we cannot guarantee that they are optimal for this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run ``run_hyper.py`` to start tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`.
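+
+To make the role of ``beta`` above more concrete, here is a minimal numerical sketch (toy numbers, not RecBole code) of the smoothed softmax it controls: the softmax denominator is raised to the power ``beta``, so values smaller than ``1`` keep the attention weights of users with long histories from being overly suppressed:
+
+.. code:: python
+
+    import numpy as np
+
+    def smoothed_softmax(scores, beta):
+        exp_scores = np.exp(scores)
+        return exp_scores / np.power(exp_scores.sum(), beta)
+
+    scores = np.array([1.0, 0.5, 0.2, 0.2, 0.1])   # toy attention logits for one user
+    print(smoothed_softmax(scores, beta=1.0))      # standard softmax (sums to 1)
+    print(smoothed_softmax(scores, beta=0.5))      # larger weights for an active user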
+ + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/general/neumf.rst b/docs/source/user_guide/model/general/neumf.rst new file mode 100644 index 000000000..3ff4548a2 --- /dev/null +++ b/docs/source/user_guide/model/general/neumf.rst @@ -0,0 +1,81 @@ +NeuMF +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/abs/10.1145/3038912.3052569>`_ + +**Title:** Neural Collaborative Filtering + +**Authors:** Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu and Tat-Seng Chua + +**Abstract:** In recent years, deep neural networks have yielded immense success on speech recognition, computer vision and natural language processing. However, the exploration of deep neural networks on recommender systems has received relatively less scrutiny. In this work, we strive to develop techniques based on neural networks to tackle the key problem in recommendation --- collaborative filtering --- on the basis of implicit feedback. + +Although some recent work has employed deep learning for recommendation, they primarily used it to model auxiliary information, such as textual descriptions of items and acoustic features of musics. When it comes to model the key factor in collaborative filtering --- the interaction between user and item features, they still resorted to matrix factorization and applied an inner product on the latent features of users and items. + +By replacing the inner product with a neural architecture that can learn an arbitrary function from data, we present a general framework named NCF, short for Neural network-based Collaborative Filtering. NCF is generic and can express and generalize matrix factorization under its framework. To supercharge NCF modelling with non-linearities, we propose to leverage a multi-layer perceptron to learn the user-item interaction function. Extensive experiments on two real-world datasets show significant improvements of our proposed NCF framework over the state-of-the-art methods. Empirical evidence shows that using deeper layers of neural networks offers better recommendation performance. + +.. image:: ../../../asset/neumf.png + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``mf_embedding_size (int)`` : The MF embedding size of user and item. Defaults to ``64``. +- ``mlp_embedding_size (int)`` : The MLP embedding size of user and item. Defaults to ``64``. +- ``mlp_hidden_size (list)`` : The hidden size of each layer in MLP, the length of list is equal to the number of layers. Defaults to ``[128,64]``. +- ``dropout_prob (float)`` : The dropout rate in MLP layers. Defaults to ``0.1``. +- ``mf_train (bool)`` : Whether to train the MF part of the model. Defaults to ``True``. +- ``mlp_train (bool)`` : Whether to train the MLP part of the model. Defaults to ``True``. +- ``use_pretrain (bool)`` : Whether to use the pre-trained parameters for MF and MLP part. Defaults to ``False``. +- ``mf_pretrain_path`` : The path of pre-trained MF part model. If ``use_pretrain`` is set to False, it will be ignored. Defaults to ``None``. +- ``mlp_pretrain_path`` : The path of pre-trained MLP part model. If ``use_pretrain`` is set to False, it will be ignored. Defaults to ``None``. 
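+
+For instance, the pre-training options above could be combined as in the following sketch. It assumes that ``run_recbole`` accepts a ``config_dict`` argument (as in recent RecBole versions); the checkpoint paths are purely hypothetical placeholders, not files shipped with RecBole.
+
+.. code:: python
+
+    from recbole.quick_start import run_recbole
+
+    # illustrative settings: initialize NeuMF from hypothetical pre-trained MF and MLP parts
+    config_dict = {
+        'mf_train': True,
+        'mlp_train': True,
+        'use_pretrain': True,
+        'mf_pretrain_path': './saved/MF-pretrained.pth',    # hypothetical path
+        'mlp_pretrain_path': './saved/MLP-pretrained.pth',  # hypothetical path
+    }
+    run_recbole(model='NeuMF', dataset='ml-100k', config_dict=config_dict)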
+ +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='NeuMF', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + dropout_prob choice [0.0,0.1,0.2,0.3,0.4,0.5] + mlp_hidden_size choice ['[64,32,16]','[32,16,8]'] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/general/ngcf.rst b/docs/source/user_guide/model/general/ngcf.rst new file mode 100644 index 000000000..b1e12ae01 --- /dev/null +++ b/docs/source/user_guide/model/general/ngcf.rst @@ -0,0 +1,79 @@ +NGCF +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/abs/10.1145/3331184.3331267>`_ + +**Title:** Neural Graph Collaborative Filtering + +**Authors:** Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng and Tat-Seng Chua + +**Abstract:** Learning vector representations (aka. embeddings) of users and items lies at the core of modern recommender systems. Ranging from early matrix factorization to recently emerged deep learning based methods, existing efforts typically obtain a user's (or an item's) embedding by mapping from pre-existing features that describe the user (or the item), such as ID and attributes. We argue that an inherent drawback of such methods is that, the collaborative signal, which is latent in user-item interactions, is not encoded in the embedding process. As such, the resultant embeddings may not be sufficient to capture the collaborative filtering effect. + +In this work, we propose to integrate the user-item interactions - more specifically the bipartite graph structure - into the embedding process. We develop a new recommendation framework Neural Graph Collaborative Filtering (NGCF), which exploits the user-item graph structure by propagating embeddings on it. This leads to the expressive modeling of high-order connectivity in user-item graph, effectively injecting the collaborative signal into the embedding process in an explicit manner. We conduct extensive experiments on three public benchmarks, demonstrating significant improvements over several state-of-the-art models like HOP-Rec and Collaborative Memory Network . Further analysis verifies the importance of embedding propagation for learning better user and item representations, justifying the rationality and effectiveness of NGCF. + +.. 
image:: ../../../asset/ngcf.jpg + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users and items. Defaults to ``64``. +- ``hidden_size_list (list)`` : The hidden size of each layer in GCN layers, the length of list is equal to the number of layers. Defaults to ``[64,64,64]``. +- ``node_dropout (float)`` : The dropout rate of node in each GNN layer. Defaults to ``0.0``. +- ``message_dropout (float)`` : The dropout rate of edge in each GNN layer. Defaults to ``0.1``. +- ``reg_weight (list)`` : The L2 regularization weight. Defaults to ``1e-5``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='NGCF', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + hidden_size_list choice ['[64,64,64]','[128,128,128]','[256,256,256]','[512,512,512]'] + node_dropout choice [0.0,0.1,0.2] + message_dropout choice [0.0,0.1,0.2,0.3] + reg_weight choice [1e-5,1e-4] + delay choice [1e-5,1e-4,1e-4,1e-2,1e-1] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/general/nncf.rst b/docs/source/user_guide/model/general/nncf.rst new file mode 100644 index 000000000..824483b45 --- /dev/null +++ b/docs/source/user_guide/model/general/nncf.rst @@ -0,0 +1,84 @@ +NNCF +========== + +Introduction +------------- + +`[paper] <https://dl.acm.org/doi/10.1145/3132847.3133083>`_ + +**Title:** A Neural Collaborative Filtering Model with Interaction-based Neighborhood + +**Authors:** Ting Bai, Ji-Rong Wen, Jun Zhang, Wayne Xin Zhao + +**Abstract:** Recently, deep neural networks have been widely applied to recommender systems. A representative work is to utilize deep learning for modeling complex user-item interactions. However, similar to traditional latent factor models by factorizing user-item interactions, they tend to be ineffective to capture localized information. Localized information, such as neighborhood, is important to recommender systems in complementing the user-item interaction data. Based on this consideration, we propose a novel Neighborhood-based Neural Collaborative Filtering model (NNCF). To the best of our knowledge, it is the first time that the neighborhood information is integrated into the neural collaborative filtering methods. 
Extensive experiments on three real-world datasets demonstrate the effectiveness of our model for the implicit recommendation task.
+
+.. image:: ../../../asset/nncf.png
+    :width: 500
+    :align: center
+
+Running with RecBole
+-------------------------
+
+**Model Hyper-Parameters:**
+
+- ``ui_embedding_size (int)``: The embedding size of users and items. Defaults to ``64``.
+- ``neigh_embedding_size (int)``: The embedding size of neighborhood information. Defaults to ``64``.
+- ``num_conv_kernel (int)``: The number of kernels in the convolution layer. Defaults to ``128``.
+- ``conv_kernel_size (int)``: The size of the kernel in the convolution layer. Defaults to ``5``.
+- ``pool_kernel_size (int)``: The size of the kernel in the pooling layer. Defaults to ``5``.
+- ``mlp_hidden_size (list)``: The hidden size of each layer in MLP, the length of list is equal to the number of layers. Defaults to ``[128,64,32,16]``.
+- ``neigh_num (int)``: The number of neighbors we choose. Defaults to ``20``.
+- ``dropout (float)``: The dropout rate in MLP layers. Defaults to ``0.5``.
+- ``resolution (float)``: The resolution parameter of the Louvain algorithm, which decides the size of the communities. Defaults to ``1.0``.
+- ``use_random (bool)``: Whether to use the random method to train the neighborhood embedding. Defaults to ``True``.
+- ``use_knn (bool)``: Whether to use the kNN method to train the neighborhood embedding. Defaults to ``False``.
+- ``use_louvain (bool)``: Whether to use the Louvain method to train the neighborhood embedding. Defaults to ``False``.
+
+
+**A Running Example:**
+
+Write the following code to a python file, such as `run.py`
+
+.. code:: python
+
+    from recbole.quick_start import run_recbole
+
+    run_recbole(model='NNCF', dataset='ml-100k')
+
+And then:
+
+.. code:: bash
+
+    python run.py
+
+
+Tuning Hyper Parameters
+-------------------------
+
+If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``.
+
+.. code:: bash
+
+    learning_rate choice [0.0005,0.0001,0.00005]
+    neigh_embedding_size choice [64,32]
+    mlp_hidden_size choice ['[128,64,32,16]','[64,32,16,8]']
+    num_conv choice [128,64]
+
+
+Note that these hyper-parameter ranges are provided for reference only, and we cannot guarantee that they are optimal for this model.
+
+Then, with the source code of RecBole (you can download it from GitHub), you can run ``run_hyper.py`` to start tuning:
+
+.. code:: bash
+
+    python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test
+
+For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`.
+
+
+If you want to change parameters, dataset or evaluation settings, take a look at
+
+- :doc:`../../../user_guide/config_settings`
+- :doc:`../../../user_guide/data_intro`
+- :doc:`../../../user_guide/evaluation_support`
+- :doc:`../../../user_guide/usage`
diff --git a/docs/source/user_guide/model/general/pop.rst b/docs/source/user_guide/model/general/pop.rst
new file mode 100644
index 000000000..7171fe4a1
--- /dev/null
+++ b/docs/source/user_guide/model/general/pop.rst
@@ -0,0 +1,39 @@
+Pop
+===========
+
+Introduction
+---------------------
+
+This is a model that records the popularity of items in the dataset and recommends the most popular items to users.
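+
+The idea itself fits in a few lines of plain Python. The toy sketch below is independent of RecBole's actual implementation and only illustrates the counting logic:
+
+.. code:: python
+
+    from collections import Counter
+
+    # toy interaction log: (user, item) pairs
+    interactions = [(1, 'a'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'c')]
+
+    # count how many interactions each item received
+    popularity = Counter(item for _, item in interactions)
+
+    # every user gets the same most popular items, e.g. the top 2
+    print([item for item, _ in popularity.most_common(2)])  # e.g. ['a', 'b']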
+ +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- No hyper-parameters + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='Pop', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/general/spectralcf.rst b/docs/source/user_guide/model/general/spectralcf.rst new file mode 100644 index 000000000..b222c6f14 --- /dev/null +++ b/docs/source/user_guide/model/general/spectralcf.rst @@ -0,0 +1,81 @@ +SpectralCF +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/3240323.3240343>`_ + +**Title:** Spectral collaborative filtering + +**Authors:** Lei Zheng, Chun-Ta Lu, Fei Jiang, Jiawei Zhang, Philip S. Yu + +**Abstract:** Despite the popularity of Collaborative Filtering (CF), CF-based methods are haunted by the cold-start problem, +which has a significantly negative impact on users' experiences with Recommender Systems (RS). In this paper, to overcome the +aforementioned drawback, we first formulate the relationships between users and items as a bipartite graph. Then, we propose +a new spectral convolution operation directly performing in the spectral domain, where not only the proximity information of +a graph but also the connectivity information hidden in the graph are revealed. With the proposed spectral convolution operation, +we build a deep recommendation model called Spectral Collaborative Filtering (SpectralCF). Benefiting from the rich information +of connectivity existing in the spectral domain, SpectralCF is capable of discovering deep connections between users and items +and therefore, alleviates the cold-start problem for CF. To the best of our knowledge, SpectralCF is the first CF-based method +directly learning from the spectral domains of user-item bipartite graphs. We apply our method on several standard datasets. +It is shown that SpectralCF significantly out-performs state-of-the-art models. + +.. image:: ../../../asset/spectralcf.png + :width: 700 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users and items. Defaults to ``64``. +- ``n_layers (int)`` : The number of layers in SpectralCF. Defaults to ``4``. +- ``reg_weight (float)`` : The L2 regularization weight. Defaults to ``1e-3``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='SpectralCF', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. 
code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + reg_weight choice [0.01,0.002,0.001,0.0005] + n_layers choice [1,2,3,4] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` + diff --git a/docs/source/user_guide/model/knowledge/cfkg.rst b/docs/source/user_guide/model/knowledge/cfkg.rst new file mode 100644 index 000000000..220897e92 --- /dev/null +++ b/docs/source/user_guide/model/knowledge/cfkg.rst @@ -0,0 +1,89 @@ +CFKG +=========== + +Introduction +--------------------- + +`[paper] <https://www.mdpi.com/1999-4893/11/9/137>`_ + +**Title:** Learning Heterogeneous Knowledge Base Embeddings for Explainable Recommendation + +**Authors:** Qingyao Ai, Vahid Azizi, Xu Chen and Yongfeng Zhang + +**Abstract:** Providing model-generated explanations in recommender systems is important to user +experience. State-of-the-art recommendation algorithms—especially the collaborative filtering +(CF)-based approaches with shallow or deep models—usually work with various unstructured +information sources for recommendation, such as textual reviews, visual images, and various implicit or +explicit feedbacks. Though structured knowledge bases were considered in content-based approaches, +they have been largely ignored recently due to the availability of vast amounts of data and the learning +power of many complex models. However, structured knowledge bases exhibit unique advantages +in personalized recommendation systems. When the explicit knowledge about users and items is +considered for recommendation, the system could provide highly customized recommendations based +on users’ historical behaviors and the knowledge is helpful for providing informed explanations +regarding the recommended items. A great challenge for using knowledge bases for recommendation is +how to integrate large-scale structured and unstructured data, while taking advantage of collaborative +filtering for highly accurate performance. Recent achievements in knowledge-base embedding (KBE) +sheds light on this problem, which makes it possible to learn user and item representations while +preserving the structure of their relationship with external knowledge for explanation. In this work, +we propose to explain knowledge-base embeddings for explainable recommendation. Specifically, +we propose a knowledge-base representation learning framework to embed heterogeneous entities for +recommendation, and based on the embedded knowledge base, a soft matching algorithm is proposed +to generate personalized explanations for the recommended items. Experimental results on real-world +e-commerce datasets verified the superior recommendation performance and the explainability power +of our approach compared with state-of-the-art baselines. 
+ + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users, items, entities and relations. Defaults to ``64``. +- ``loss_function (str)`` : The optimization loss function. Defaults to ``'inner_product'``. Range in ``['inner_product', 'transe']``. +- ``margin (float)`` : The margin in margin loss, only be used when ``loss_function`` is set to ``'transe'``. Defaults to ``1.0``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='CFKG', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + loss_function choice ['inner_product', 'transe'] + margin choice [0.5,1.0,2.0] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/knowledge/cke.rst b/docs/source/user_guide/model/knowledge/cke.rst new file mode 100644 index 000000000..336542f4c --- /dev/null +++ b/docs/source/user_guide/model/knowledge/cke.rst @@ -0,0 +1,88 @@ +CKE +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/2939672.2939673>`_ + +**Title:** Collaborative Knowledge Base Embedding for Recommender Systems + +**Authors:** Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, Wei-Ying Ma + +**Abstract:** Among different recommendation techniques, collaborative filtering usually suffer from limited performance due to the sparsity +of user-item interactions. To address the issues, auxiliary information is usually used to boost the performance. Due to the rapid +collection of information on the web, the knowledge base provides +heterogeneous information including both structured and unstructured data with different semantics, which can be consumed by various applications. In this paper, we investigate how to leverage +the heterogeneous information in a knowledge base to improve the +quality of recommender systems. First, by exploiting the knowledge base, we design three components to extract items’ semantic +representations from structural content, textual content and visual content, respectively. To be specific, we adopt a heterogeneous +network embedding method, termed as TransR, to extract items’ +structural representations by considering the heterogeneity of both +nodes and relationships. 
We apply stacked denoising auto-encoders +and stacked convolutional auto-encoders, which are two types of +deep learning based embedding techniques, to extract items’ textual representations and visual representations, respectively. Finally, we propose our final integrated framework, which is termed as +Collaborative Knowledge Base Embedding (CKE), to jointly learn +the latent representations in collaborative filtering as well as items’ semantic representations from the knowledge base. To evaluate the performance of each embedding component as well as the +whole system, we conduct extensive experiments with two realworld datasets from different scenarios. The results reveal that our +approaches outperform several widely adopted state-of-the-art recommendation methods. + +.. image:: ../../../asset/cke.png + :width: 600 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users, items and entities. Defaults to ``64``. +- ``kg_embedding_size (int)`` : The embedding size of relations in knowledge graph. Defaults to ``64``. +- ``reg_weights (list of float)`` : The L2 regularization weights, there are two values, + the former is for user and item embedding regularization and the latter is for entity and relation embedding regularization. Defaults to ``[1e-02,1e-02]`` + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='CKE', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + kg_embedding_size choice [16,32,64,128] + reg_weights choice ['[0.1,0.1]','[0.01,0.01]','[0.001,0.001]'] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/knowledge/kgat.rst b/docs/source/user_guide/model/knowledge/kgat.rst new file mode 100644 index 000000000..84c6aac65 --- /dev/null +++ b/docs/source/user_guide/model/knowledge/kgat.rst @@ -0,0 +1,109 @@ +KGAT +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/3292500.3330989>`_ + +**Title:** KGAT: Knowledge Graph Attention Network for Recommendation + +**Authors:** Xiang Wang, Xiangnan He, Yixin Cao, Meng Liu, Tat-Seng Chua + +**Abstract:** To provide more accurate, diverse, and explainable recommendation, +it is compulsory to go beyond modeling user-item interactions +and take side information into account. 
Traditional methods like +factorization machine (FM) cast it as a supervised learning problem, +which assumes each interaction as an independent instance with +side information encoded. Due to the overlook of the relations +among instances or items (e.g., the director of a movie is also an +actor of another movie), these methods are insufficient to distill the +collaborative signal from the collective behaviors of users. + +In this work, we investigate the utility of knowledge graph +(KG), which breaks down the independent interaction assumption +by linking items with their attributes. We argue that in such a +hybrid structure of KG and user-item graph, high-order relations +— which connect two items with one or multiple linked attributes +— are an essential factor for successful recommendation. We +propose a new method named Knowledge Graph Attention Network +(KGAT) which explicitly models the high-order connectivities +in KG in an end-to-end fashion. It recursively propagates the +embeddings from a node’s neighbors (which can be users, items, +or attributes) to refine the node’s embedding, and employs +an attention mechanism to discriminate the importance of the +neighbors. Our KGAT is conceptually advantageous to existing +KG-based recommendation methods, which either exploit highorder relations by extracting paths or implicitly modeling them +with regularization. Empirical results on three public benchmarks +show that KGAT significantly outperforms state-of-the-art methods +like Neural FM and RippleNet. Further studies verify +the efficacy of embedding propagation for high-order relation +modeling and the interpretability benefits brought by the attention +mechanism. + +.. image:: ../../../asset/kgat.png + :width: 600 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users, items and entities. Defaults to ``64``. +- ``kg_embedding_size (int)`` : The embedding size of relations in knowledge graph. Defaults to ``64``. +- ``layers (list of int)`` : The hidden size in GNN layers, the length of this list is equal to the number of layers in GNN structure. Defaults to ``[64]``. +- ``mess_dropout (float)`` : The message dropout rate in GNN layer. Defaults to ``0.1``. +- ``reg_weight (float)`` : The L2 regularization weight. Defaults to ``1e-05``. +- ``aggregator_type (str)`` : The aggregator type used in GNN layer. Defaults to ``'bi'``. Range in ``['gcn', 'graphsage', 'bi']``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='KGAT', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Notes:** + +- If you want to run KGAT in RecBole, please ensure the torch version is 1.6.0 or later. Because we use torch.sparse.softmax in KGAT, which is only available in torch 1.6.0 or later. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + layers choice ['[64,32,16]','[64,64,64]','[128,64,32]'] + reg_weight choice [1e-4,5e-5,1e-5,5e-6,1e-6] + mess_dropout choice [0.1,0.2,0.3,0.4,0.5] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. 
+ +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/knowledge/kgcn.rst b/docs/source/user_guide/model/knowledge/kgcn.rst new file mode 100644 index 000000000..79a9fdc15 --- /dev/null +++ b/docs/source/user_guide/model/knowledge/kgcn.rst @@ -0,0 +1,86 @@ +KGCN +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/3308558.3313417>`_ + +**Title:** Knowledge Graph Convolutional Networks for Recommender + +**Authors:** Hongwei Wang, Miao Zhao, Xing Xie, Wenjie Li, Minyi Guo + +**Abstract:** To alleviate sparsity and cold start problem of collaborative filtering +based recommender systems, researchers and engineers usually +collect attributes of users and items, and design delicate algorithms +to exploit these additional information. In general, the attributes are +not isolated but connected with each other, which forms a knowledge graph (KG). In this paper, we propose Knowledge Graph +Convolutional Networks (KGCN), an end-to-end framework that +captures inter-item relatedness effectively by mining their associated attributes on the KG. To automatically discover both high-order +structure information and semantic information of the KG, we sample from the neighbors for each entity in the KG as their receptive +field, then combine neighborhood information with bias when calculating the representation of a given entity. The receptive field can +be extended to multiple hops away to model high-order proximity +information and capture users’ potential long-distance interests. +Moreover, we implement the proposed KGCN in a minibatch fashion, which enables our model to operate on large datasets and KGs. +We apply the proposed model to three datasets about movie, book, +and music recommendation, and experiment results demonstrate +that our approach outperforms strong recommender baselines. + +.. image:: ../../../asset/kgcn.png + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users, relations and entities. Defaults to ``64``. +- ``aggregator (str)`` : The aggregator used in GNN layers. Defaults to ``'sum'``. Range in ``['sum', 'neighbor', 'concat']``. +- ``reg_weight (float)`` : The L2 regularization weight. Defaults to ``1e-7``. +- ``neighbor_sample_size (int)`` : The number of neighbors to be sampled. Defaults to ``4``. +- ``n_iter (int)`` : The number of iterations when computing entity representation. Defaults to ``1``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='KGCN', dataset='ml-100k') + +And then: + +.. 
code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` + diff --git a/docs/source/user_guide/model/knowledge/kgnnls.rst b/docs/source/user_guide/model/knowledge/kgnnls.rst new file mode 100644 index 000000000..915633f47 --- /dev/null +++ b/docs/source/user_guide/model/knowledge/kgnnls.rst @@ -0,0 +1,86 @@ +KGNNLS +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/3292500.3330836>`_ + +**Title:** Knowledge-aware Graph Neural Networks with Label Smoothness Regularization for Recommender Systems + +**Authors:** Hongwei Wang, Fuzheng Zhang, Mengdi Zhang, Jure Leskovec, Miao Zhao, Wenjie Li, Zhongyuan Wang + +**Abstract:** Knowledge graphs capture structured information and relations between a set of entities or items. +As such knowledge graphs represent an attractive source of information that could help improve recommender systems. +However, existing approaches in this domain rely on manual feature engineering and do not allow for an end-to-end +training. Here we propose Knowledge-aware Graph Neural Networks with Label Smoothness regularization (KGNN-LS) to +provide better recommendations. Conceptually, our approach computes user-specific item embeddings by first applying +a trainable function that identifies important knowledge graph relationships for a given user. This way we transform +the knowledge graph into a user-specific weighted graph and then apply a graph neural network to compute personalized +item embeddings. To provide better inductive bias, we rely on label smoothness assumption, which posits that adjacent +items in the knowledge graph are likely to have similar user relevance labels/scores. Label smoothness provides +regularization over the edge weights and we prove that it is equivalent to a label propagation scheme on a graph. +We also develop an efficient implementation that shows strong scalability with respect to the knowledge graph size. +Experiments on four datasets show that our method outperforms state of the art baselines. KGNN-LS also achieves +strong performance in cold-start scenarios where user-item interactions are sparse. + +.. image:: ../../../asset/kgnnls.png + :width: 600 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The initial embedding size of users, relations and entities. Defaults to ``64``. +- ``aggregator (str)`` : The aggregator used in GNN layers. Defaults to ``'sum'``. 
Range in ``['sum', 'neighbor', 'concat']``. +- ``reg_weight (float)`` : The L2 regularization weight. Defaults to ``1e-7``. +- ``neighbor_sample_size (int)`` : The number of neighbors to be sampled. Defaults to ``4``. +- ``n_iter (int)`` : The number of iterations when computing entity representation. Defaults to ``1``. +- ``ls_weight (float)`` : The label smoothness regularization weight. Defaults to ``0.5``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='KGNNLS', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` + diff --git a/docs/source/user_guide/model/knowledge/ktup.rst b/docs/source/user_guide/model/knowledge/ktup.rst new file mode 100644 index 000000000..b3ed4b760 --- /dev/null +++ b/docs/source/user_guide/model/knowledge/ktup.rst @@ -0,0 +1,103 @@ +KTUP +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/3308558.3313705>`_ + +**Title:** Unifying Knowledge Graph Learning and Recommendation: Towards a Better Understanding of User Preferences + +**Authors:** Yixin Cao, Xiang Wang, Xiangnan He, Zikun hu, Tat-Seng Chua + +**Abstract:** Incorporating knowledge graph (KG) into recommender system +is promising in improving the recommendation accuracy and explainability. However, existing methods largely assume that a KG is +complete and simply transfer the "knowledge" in KG at the shallow +level of entity raw data or embeddings. This may lead to suboptimal +performance, since a practical KG can hardly be complete, and it is +common that a KG has missing facts, relations, and entities. Thus, +we argue that it is crucial to consider the incomplete nature of KG +when incorporating it into recommender system. + +In this paper, we jointly learn the model of recommendation +and knowledge graph completion. Distinct from previous KG-based +recommendation methods, we transfer the relation information +in KG, so as to understand the reasons that a user likes an item. +As an example, if a user has watched several movies directed by +(relation) the same person (entity), we can infer that the director +relation plays a critical role when the user makes the decision, thus +help to understand the user’s preference at a finer granularity. 
+ +Technically, we contribute a new translation-based recommendation model, which specially accounts for various preferences in +translating a user to an item, and then jointly train it with a KG +completion model by combining several transfer schemes. Extensive experiments on two benchmark datasets show that our method +outperforms state-of-the-art KG-based recommendation methods. +Further analysis verifies the positive effect of joint training on both +tasks of recommendation and KG completion, and the advantage +of our model in understanding user preference. + +.. image:: ../../../asset/ktup.png + :width: 600 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``train_rec_step (int)`` : The number of steps for continuous training recommendation task. Defaults to ``5``. +- ``train_kg_step (int)`` : The number of steps for continuous training knowledge related task. Defaults to ``5``. +- ``embedding_size (int)`` : The embedding size of users, items, entities, relations and preferences. Defaults to ``64``. +- ``use_st_gumbel (bool)`` : Whether to use gumbel softmax. Defaults to ``True``. +- ``L1_flag (bool)`` : Whether to use L1 distance to calculate dissimilarity, if set to False, use L2 distance. Defaults to ``False``. +- ``margin (float)`` : The margin in margin loss. Defaults to ``1.0``. +- ``kg_weight (float)`` : The weight decay for kg model. Defaults to ``1.0``. +- ``align_weight (float)`` : The align loss weight(make the item embedding in rec and kg more closer). Defaults to ``1.0``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='KTUP', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + L1_flag choice [True, False] + use_st_gumbel choice [True, False] + train_rec_step choice [8,10] + train_kg_step choice [0,1,2,3,4,5] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. 
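+
+If you would rather fix some of the hyper-parameters above than tune them, one option is to collect them in an extra yaml file and pass it to ``run_recbole``. The sketch below assumes that ``run_recbole`` accepts a ``config_file_list`` argument (as in recent RecBole versions) and uses a hypothetical file name ``ktup.yaml``; the values mirror the defaults listed above.
+
+.. code:: python
+
+    from recbole.quick_start import run_recbole
+
+    # ktup.yaml (hypothetical) could contain, for example:
+    #   train_rec_step: 5
+    #   train_kg_step: 5
+    #   use_st_gumbel: True
+    #   L1_flag: False
+    #   margin: 1.0
+    run_recbole(model='KTUP', dataset='ml-100k', config_file_list=['ktup.yaml'])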
+ + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/knowledge/mkr.rst b/docs/source/user_guide/model/knowledge/mkr.rst new file mode 100644 index 000000000..62633f5fb --- /dev/null +++ b/docs/source/user_guide/model/knowledge/mkr.rst @@ -0,0 +1,78 @@ +MKR +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/3308558.3313411>`_ + +**Title:** Multi-Task Feature Learning for Knowledge Graph Enhanced Recommendation + +**Authors:** Hongwei Wang, Fuzheng Zhang, Miao Zhao, Wenjie Li, Xing Xie, Minyi Guo + +**Abstract:** Collaborative filtering often suffers from sparsity and cold start problems in real recommendation scenarios, therefore, researchers and engineers usually use side information to address the issues and improve the performance of recommender systems. In this paper, we consider knowledge graphs as the source of side information. We propose MKR, a Multi-task feature learning approach for Knowledge graph enhanced Recommendation. MKR is a deep end-to-end framework that utilizes knowledge graph embedding task to assist recommendation task. The two tasks are associated by crosscompress units, which automatically share latent features and learn high-order interactions between items in recommender systems and entities in the knowledge graph. We prove that crosscompress units have sufficient capability of polynomial approximation, and show that MKR is a generalized framework over several representative methods of recommender systems and multi-task learning. Through extensive experiments on real-world datasets, we demonstrate that MKR achieves substantial gains in movie, book, music, and news recommendation, over state-of-the-art baselines. MKR is also shown to be able to maintain satisfactory performance even if user-item interactions are sparse. + +.. image:: ../../../asset/mkr.png + :width: 600 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users and items. Defaults to ``64``. +- ``kg_embedding_size (int)`` : The embedding size of entities, relations. Defaults to ``64``. +- ``low_layers_num (int)`` : The number of low layers. Defaults to ``1``. +- ``high_layers_num (int)`` : The number of high layers. Defaults to ``1``. +- ``kge_interval (int)`` : The number of steps for continuous training knowledge related task. Defaults to ``3``. +- ``use_inner_product (bool)`` : Whether to use inner product to calculate scores. Defaults to ``True``. +- ``reg_weight (float)`` : The L2 regularization weight. Defaults to ``1e-6``. +- ``dropout_prob (float)`` : The dropout rate. Defaults to ``0.0``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='MKR', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. 
code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + low_layers_num choice [1,2,3] + high_layers_num choice [1,2] + l2_weight choice [1e-6,1e-4] + kg_embedding_size choice [16,32,64] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` diff --git a/docs/source/user_guide/model/knowledge/ripplenet.rst b/docs/source/user_guide/model/knowledge/ripplenet.rst new file mode 100644 index 000000000..f1fadbd28 --- /dev/null +++ b/docs/source/user_guide/model/knowledge/ripplenet.rst @@ -0,0 +1,86 @@ +RippleNet +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/3269206.3271739>`_ + +**Title:** RippleNet: Propagating User Preferences on the Knowledge Graph for Recommender Systems + +**Authors:** Hongwei Wang, Fuzheng Zhang, Jialin Wang, Miao Zhao, Wenjie Li, Xing Xie, Minyi Guo + +**Abstract:** To address the sparsity and cold start problem of collaborative filtering, researchers usually make use of side information, such as social +networks or item attributes, to improve recommendation performance. This paper considers the knowledge graph as the source of +side information. To address the limitations of existing embeddingbased and path-based methods for knowledge-graph-aware recommendation, we propose RippleNet, an end-to-end framework that +naturally incorporates the knowledge graph into recommender +systems. Similar to actual ripples propagating on the water, RippleNet stimulates the propagation of user preferences over the set +of knowledge entities by automatically and iteratively extending a +user’s potential interests along links in the knowledge graph. The +multiple "ripples" activated by a user’s historically clicked items +are thus superposed to form the preference distribution of the user +with respect to a candidate item, which could be used for predicting the final clicking probability. Through extensive experiments +on real-world datasets, we demonstrate that RippleNet achieves +substantial gains in a variety of scenarios, including movie, book +and news recommendation, over several state-of-the-art baselines. + +.. image:: ../../../asset/ripplenet.jpg + :width: 600 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users, items and entities. Defaults to ``64``. +- ``n_hop (int)`` : The number of hop reasoning for knowledge base. Defaults to ``2``. +- ``n_memory (int)`` : The number of memory size of every hop. Defaults to ``16``. +- ``reg_weight (float)`` : The L2 regularization weight. Defaults to ``1e-07``, +- ``kg_weight (float)`` : The kg loss weight. Defaults to ``0.01``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. 
code:: python
+
+    from recbole.quick_start import run_recbole
+
+    run_recbole(model='RippleNet', dataset='ml-100k')
+
+And then:
+
+.. code:: bash
+
+    python run.py
+
+Tuning Hyper Parameters
+-------------------------
+
+If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``.
+
+.. code:: bash
+
+    learning_rate choice [0.01,0.005,0.001,0.0005,0.0001]
+    n_memory choice [4, 8, 16, 32]
+    training_neg_sample_num choice [1, 2, 5, 10]
+
+Note that these hyper-parameter ranges are provided for reference only, and we cannot guarantee that they are optimal for this model.
+
+Then, with the source code of RecBole (you can download it from GitHub), you can run ``run_hyper.py`` to start tuning:
+
+.. code:: bash
+
+    python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test
+
+For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`.
+
+
+If you want to change parameters, dataset or evaluation settings, take a look at
+
+- :doc:`../../../user_guide/config_settings`
+- :doc:`../../../user_guide/data_intro`
+- :doc:`../../../user_guide/evaluation_support`
+- :doc:`../../../user_guide/usage`
\ No newline at end of file
diff --git a/docs/source/user_guide/model/sequential/bert4rec.rst b/docs/source/user_guide/model/sequential/bert4rec.rst
new file mode 100644
index 000000000..f6c6210ae
--- /dev/null
+++ b/docs/source/user_guide/model/sequential/bert4rec.rst
@@ -0,0 +1,98 @@
+BERT4Rec
+===========
+
+Introduction
+---------------------
+
+`[paper] <https://dl.acm.org/doi/10.1145/3357384.3357895>`_
+
+**Title:** BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer
+
+**Authors:** Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, Peng Jiang
+
+**Abstract:** Modeling users' dynamic preferences from their historical behaviors is challenging and crucial for recommendation systems. Previous methods employ sequential neural networks to encode users'
+historical interactions from left to right into hidden representations
+for making recommendations. Despite their effectiveness, we argue
+that such left-to-right unidirectional models are sub-optimal due
+to the limitations including: a) unidirectional architectures restrict
+the power of hidden representation in users' behavior sequences;
+b) they often assume a rigidly ordered sequence which is not always
+practical. To address these limitations, we proposed a sequential recommendation model called BERT4Rec, which employs the deep
+bidirectional self-attention to model user behavior sequences. To
+avoid the information leakage and efficiently train the bidirectional
+model, we adopt the Cloze objective to sequential recommendation,
+predicting the random masked items in the sequence by jointly
+conditioning on their left and right context. In this way, we learn
+a bidirectional representation model to make recommendations
+by allowing each item in user historical behaviors to fuse information from both left and right sides. Extensive experiments on
+four benchmark datasets show that our model outperforms various
+state-of-the-art sequential models consistently.
+
+.. image:: ../../../asset/bert4rec.png
+    :width: 600
+    :align: center
+
+Running with RecBole
+-------------------------
+
+**Model Hyper-Parameters:**
+
+- ``hidden_size (int)`` : The number of features in the hidden state.
It is also the initial embedding size of items. Defaults to ``64``.
+- ``inner_size (int)`` : The inner hidden size in the feed-forward layer. Defaults to ``256``.
+- ``n_layers (int)`` : The number of transformer layers in the transformer encoder. Defaults to ``2``.
+- ``n_heads (int)`` : The number of attention heads for the multi-head attention layer. Defaults to ``2``.
+- ``hidden_dropout_prob (float)`` : The probability of an element to be zeroed. Defaults to ``0.5``.
+- ``attn_dropout_prob (float)`` : The probability of an attention score to be zeroed. Defaults to ``0.5``.
+- ``hidden_act (str)`` : The activation function in the feed-forward layer. Defaults to ``'gelu'``. Range in ``['gelu', 'relu', 'swish', 'tanh', 'sigmoid']``.
+- ``layer_norm_eps (float)`` : A value added to the denominator for numerical stability. Defaults to ``1e-12``.
+- ``initializer_range (float)`` : The standard deviation for normal initialization. Defaults to ``0.02``.
+- ``mask_ratio (float)`` : The probability of an item being replaced by the MASK token. Defaults to ``0.2``.
+- ``loss_type (str)`` : The type of loss function. If it is set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth; in this way, negative sampling is not needed. If it is set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximizes the difference between the positive item and the negative item; in this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``.
+
+
+**A Running Example:**
+
+Write the following code to a python file, such as `run.py`
+
+.. code:: python
+
+    from recbole.quick_start import run_recbole
+
+    run_recbole(model='BERT4Rec', dataset='ml-100k')
+
+And then:
+
+.. code:: bash
+
+    python run.py
+
+Tuning Hyper Parameters
+-------------------------
+
+If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``.
+
+.. code:: bash
+
+    learning_rate choice [0.01,0.005,0.001,0.0005,0.0001]
+    attn_dropout_prob choice [0.2,0.5]
+    hidden_dropout_prob choice [0.2,0.5]
+    n_heads choice [1,2]
+    n_layers choice [1,2]
+
+Note that these hyper-parameter ranges are provided for reference only, and we cannot guarantee that they are optimal for this model.
+
+Then, with the source code of RecBole (you can download it from GitHub), you can run ``run_hyper.py`` to start tuning:
+
+.. code:: bash
+
+    python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test
+
+For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`.
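+
+As a concrete illustration of the ``loss_type`` option described above, the sketch below switches BERT4Rec from the default ``'CE'`` loss to ``'BPR'``, in which case one negative sample per positive interaction is configured. It assumes that ``run_recbole`` accepts a ``config_dict`` argument (as in recent RecBole versions); the values are only an example.
+
+.. code:: python
+
+    from recbole.quick_start import run_recbole
+
+    # pair-wise training: the BPR loss requires negative sampling
+    config_dict = {
+        'loss_type': 'BPR',
+        'training_neg_sample_num': 1,
+        'mask_ratio': 0.2,
+    }
+    run_recbole(model='BERT4Rec', dataset='ml-100k', config_dict=config_dict)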
+ + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/sequential/caser.rst b/docs/source/user_guide/model/sequential/caser.rst new file mode 100644 index 000000000..c502e0299 --- /dev/null +++ b/docs/source/user_guide/model/sequential/caser.rst @@ -0,0 +1,79 @@ +Caser +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/abs/10.1145/3159652.3159656>`_ + +**Title:** Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding + +**Authors:** Jiaxi Tang, Ke Wang + +**Abstract:** Top-N sequential recommendation models each user as a sequence of items interacted in the past and aims to predict top-N ranked items that a user will likely interact in a “near future”. The order of interaction implies that sequential patterns play an important role where more recent items in a sequence have a larger impact on the next item. In this paper, we propose a Convolutional Sequence Embedding Recommendation Model (Caser) as a solution to address this requirement. The idea is to embed a sequence of recent items into an “image” in the time and latent spaces and learn sequential patterns as local features of the image using convolutional filters. This approach provides a unified and flexible network structure for capturing both general preferences and sequential patterns. The ex- periments on public data sets demonstrated that Caser consistently outperforms state-of-the-art sequential recommendation methods on a variety of common evaluation metrics. + +.. image:: ../../../asset/caser.png + :width: 600 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users and items. Defaults to ``64``. +- ``n_h (int)`` : The number of horizontal Convolutional filters. Defaults to ``16``. +- ``n_v (int)`` : The number of vertical Convolutional filters. Defaults to ``8``. +- ``reg_weight (float)`` : The L2 regularization weight. Defaults to ``1e-4``. +- ``dropout_prob (float)`` : The dropout rate. Defaults to ``0.4``. +- ``loss_type (str)`` : The type of loss function. If it set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth. In this way, negative sampling is not needed. If it set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximize the difference between positive item and negative item. In this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='Caser', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Notes:** + +- By setting ``reproducibility=False``, the training speed of Caser can be greatly accelerated. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. 
code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + reg_weight choice [0,1e-4,1e-5] + dropout_prob choice [0.0,0.1,0.2,0.3,0.4,0.5] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` + diff --git a/docs/source/user_guide/model/sequential/fdsa.rst b/docs/source/user_guide/model/sequential/fdsa.rst new file mode 100644 index 000000000..90ee4aec5 --- /dev/null +++ b/docs/source/user_guide/model/sequential/fdsa.rst @@ -0,0 +1,100 @@ +FDSA +=========== + +Introduction +--------------------- + +`[paper] <https://www.ijcai.org/Proceedings/2019/600>`_ + +**Title:** Feature-level Deeper Self-Attention Network for Sequential Recommendation + +**Authors:** Tingting Zhang, Pengpeng Zhao, Yanchi Liu, Victor S. Sheng, Jiajie Xu, Deqing Wang, Guanfeng Liu, Xiaofang Zhou + +**Abstract:** Sequential recommendation, which aims to recommend next item that the user will +likely interact in a near future, has become essential in various Internet applications. +Existing methods usually consider the transition patterns between items, but ignore the +transition patterns between features of items. We argue that only the item-level sequences +cannot reveal the full sequential patterns, while explicit and implicit feature-level +sequences can help extract the full sequential patterns. In this paper, we propose a novel +method named Feature-level Deeper Self-Attention Network (FDSA) for sequential recommendation. +Specifically, FDSA first integrates various heterogeneous features of items into feature +sequences with different weights through a vanilla mechanism. After that, FDSA applies +separated self-attention blocks on item-level sequences and feature-level sequences, +respectively, to model item transition patterns and feature transition patterns. +Then, we integrate the outputs of these two blocks to a fully-connected layer for next item recommendation. +Finally, comprehensive experimental results demonstrate that considering the transition relationships between +features can significantly improve the performance of sequential recommendation. + +.. image:: ../../../asset/fdsa.png + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``hidden_size (int)`` : The number of features in the hidden state. It is also the initial embedding size of items. Defaults to ``64``. +- ``inner_size (int)`` : The inner hidden size in feed-forward layer. Defaults to ``256``. +- ``n_layers (int)`` : The number of transformer layers in transformer encoder. Defaults to ``2``. +- ``n_heads (int)`` : The number of attention heads for multi-head attention layer. Defaults to ``2``. +- ``hidden_dropout_prob (float)`` : The probability of an element to be zeroed. Defaults to ``0.5``. 
+- ``attn_dropout_prob (float)`` : The probability of an attention score to be zeroed. Defaults to ``0.5``. +- ``hidden_act (str)`` : The activation function in feed-forward layer. Defaults to ``'gelu'``. Range in ``['gelu', 'relu', 'swish', 'tanh', 'sigmoid']``. +- ``layer_norm_eps (float)`` : A value added to the denominator for numerical stability. Defaults to ``1e-12``. +- ``initializer_range (float)`` : The standard deviation for normal initialization. Defaults to ``0.02``. +- ``selected_features (list)`` : The list of selected item features. Defaults to ``['class']`` for ml-100k dataset. +- ``pooling_mode (str)``: The intra-feature pooling mode. Defaults to ``'mean'``. Range in ``['max', 'mean', 'sum']``. +- ``loss_type (str)`` : The type of loss function. If it set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth. In this way, negative sampling is not needed. If it set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximize the difference between positive item and negative item. In this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='FDSA', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Notes:** + +- FDSA is a sequential model that integrates item context information. ``selected_features`` controls the used item context information. The used context information must be in the dataset and be loaded by data module in RecBole. It means the value in ``selected_features`` must appear in ``load_col``. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + attn_dropout_prob choice [0.2, 0.5] + hidden_dropout_prob choice [0.2, 0.5] + n_heads choice [1, 2] + n_layers choice [1,2,3] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/sequential/fossil.rst b/docs/source/user_guide/model/sequential/fossil.rst new file mode 100644 index 000000000..f05e5dd71 --- /dev/null +++ b/docs/source/user_guide/model/sequential/fossil.rst @@ -0,0 +1,101 @@ +FOSSIL +=========== + +Introduction +--------------------- + +`[paper] <https://ieeexplore.ieee.org/abstract/document/7837843/>`_ + +**Title:** FOSSIL: Fusing Similarity Models with Markov Chains for Sparse Sequential Recommendation. 
+ +**Authors:** Ruining He + +**Abstract:** Abstract—Predicting personalized sequential behavior is a +key task for recommender systems. In order to predict user +actions such as the next product to purchase, movie to watch, +or place to visit, it is essential to take into account both long- +term user preferences and sequential patterns (i.e., short-term +dynamics). Matrix Factorization and Markov Chain methods +have emerged as two separate but powerful paradigms for +modeling the two respectively. Combining these ideas has led +to unified methods that accommodate long- and short-term +dynamics simultaneously by modeling pairwise user-item and +item-item interactions. +In spite of the success of such methods for tackling dense +data, they are challenged by sparsity issues, which are prevalent +in real-world datasets. In recent years, similarity-based methods +have been proposed for (sequentially-unaware) item recommendation with promising results on sparse datasets. In this +paper, we propose to fuse such methods with Markov Chains to +make personalized sequential recommendations. We evaluate +our method, Fossil, on a variety of large, real-world datasets. +We show quantitatively that Fossil outperforms alternative +algorithms, especially on sparse datasets, and qualitatively +that it captures personalized dynamics and is able to make +meaningful recommendations. + +.. image:: ../../../asset/fossil.jpg + :width: 600 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users and items. Defaults to ``64``. +- ``order_len (int)`` : The last N items . Defaults to ``3``. +- ``reg_weight (float)`` : The L2 regularization weight. Defaults to ``0.00``. +- ``alpha (float)`` : The parameter of alpha in calculate the similarity. Defaults to ``0.6``. +- ``loss_type (str)`` : The type of loss function. If it set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth. In this way, negative sampling is not needed. If it set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximize the difference between positive item and negative item. In this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='FOSSIL', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Notes:** + +- By setting ``reproducibility=False``, the training speed of FOSSIL can be greatly accelerated. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.001] + embedding_size choice [64] + reg_weight choice [0,0.0001] + order_len choice [1,2,3,5] + alpha choice [0.2,0.5,0.6] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. 
code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` + diff --git a/docs/source/user_guide/model/sequential/fpmc.rst b/docs/source/user_guide/model/sequential/fpmc.rst new file mode 100644 index 000000000..6f759d8f5 --- /dev/null +++ b/docs/source/user_guide/model/sequential/fpmc.rst @@ -0,0 +1,93 @@ +FPMC +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/1772690.1772773>`_ + +**Title:** Factorizing personalized Markov chains for next-basket recommendation + +**Authors:** Steffen Rendle, Christoph Freudenthaler, Lars Schmidt-Thieme + +**Abstract:** Recommender systems are an important component of many +websites. Two of the most popular approaches are based on +matrix factorization (MF) and Markov chains (MC). MF +methods learn the general taste of a user by factorizing the +matrix over observed user-item preferences. On the other +hand, MC methods model sequential behavior by learning a +transition graph over items that is used to predict the next +action based on the recent actions of a user. In this paper, we +present a method bringing both approaches together. Our +method is based on personalized transition graphs over underlying Markov chains. That means for each user an own +transition matrix is learned – thus in total the method uses +a transition cube. As the observations for estimating the +transitions are usually very limited, our method factorizes +the transition cube with a pairwise interaction model which +is a special case of the Tucker Decomposition. We show +that our factorized personalized MC (FPMC) model subsumes both a common Markov chain and the normal matrix +factorization model. For learning the model parameters, we +introduce an adaption of the Bayesian Personalized Ranking +(BPR) framework for sequential basket data. Empirically, +we show that our FPMC model outperforms both the common matrix factorization and the unpersonalized MC model +both learned with and without factorization. + +.. image:: ../../../asset/fpmc.png + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users and items. Defaults to ``64``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='FPMC', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Notes:** + +- Different from other sequential models, FPMC must be optimized in pair-wise way using negative sampling, so it needs ``training_neg_sample_num=1``. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. 
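+
+If you would rather drive the search from a Python script than from the command line, the sketch below mirrors what ``run_hyper.py`` does. The ``objective_function`` import and the ``HyperTuning`` constructor arguments (``algo``, ``params_file``, ``fixed_config_file_list``) are assumptions based on that script, and ``fpmc.yaml`` is a hypothetical config file that pins the model and dataset; please verify both against your RecBole version.
+
+.. code:: python
+
+    from recbole.quick_start import objective_function
+    from recbole.trainer import HyperTuning
+
+    # 'fpmc.yaml' is a hypothetical file containing e.g. model: FPMC and dataset: ml-100k.
+    hp = HyperTuning(objective_function, algo='exhaustive',
+                     params_file='hyper.test',
+                     fixed_config_file_list=['fpmc.yaml'])
+    hp.run()
+    print('best params:', hp.best_params)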
+ +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` + diff --git a/docs/source/user_guide/model/sequential/gcsan.rst b/docs/source/user_guide/model/sequential/gcsan.rst new file mode 100644 index 000000000..3707767d4 --- /dev/null +++ b/docs/source/user_guide/model/sequential/gcsan.rst @@ -0,0 +1,101 @@ +GCSAN +=========== + +Introduction +--------------------- + +`[paper] <https://www.ijcai.org/Proceedings/2019/547>`_ + +**Title:** Graph Contextualized Self-Attention Network for Session-based Recommendation + +**Authors:** Chengfeng Xu, Pengpeng Zhao, Yanchi Liu, Victor S. Sheng, Jiajie Xu, Fuzhen Zhuang, Junhua Fang, Xiaofang Zhou + +**Abstract:** Session-based recommendation, which aims to predict the user’s immediate next action based on +anonymous sessions, is a key task in many online +services (e:g:; e-commerce, media streaming). Recently, Self-Attention Network (SAN) has achieved +significant success in various sequence modeling +tasks without using either recurrent or convolutional network. However, SAN lacks local dependencies that exist over adjacent items and limits its capacity for learning contextualized representations of items in sequences. In this paper, we propose a graph contextualized self-attention model +(GC-SAN), which utilizes both graph neural network and self-attention mechanism, for sessionbased recommendation. In GC-SAN, we dynamically construct a graph structure for session sequences and capture rich local dependencies via graph neural network (GNN). Then each session learns long-range dependencies by applying +the self-attention mechanism. Finally, each session +is represented as a linear combination of the global +preference and the current interest of that session. +Extensive experiments on two real-world datasets show that GC-SAN outperforms state-of-the-art +methods consistently. + +.. image:: ../../../asset/gcsan.png + :width: 600 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``hidden_size (int)`` : The number of features in the hidden state. It is also the initial embedding size of item. Defaults to ``64``. +- ``inner_size (int)`` : The inner hidden size in feed-forward layer. Defaults to ``256``. +- ``n_layers (int)`` : The number of transformer layers in transformer encoder. Defaults to ``1``. +- ``n_heads (int)`` : The number of attention heads for multi-head attention layer. Defaults to ``1``. +- ``hidden_dropout_prob (float)`` : The probability of an element to be zeroed. Defaults to ``0.2``. +- ``attn_dropout_prob (float)`` : The probability of an attention score to be zeroed. Defaults to ``0.2``. +- ``hidden_act (str)`` : The activation function in feed-forward layer. Defaults to ``'gelu'``. Range in ``['gelu', 'relu', 'swish', 'tanh', 'sigmoid']``. +- ``layer_norm_eps (float)`` : A value added to the denominator for numerical stability. Defaults to ``1e-12``. 
+- ``initializer_range (float)`` : The standard deviation for normal initialization. Defaults to ``0.02``. +- ``step (int)`` : The number of layers in GNN. Defaults to ``1``. +- ``weight (float)`` : The weight parameter controls the contribution of self-attention representation and the last-clicked action, the original paper suggests that setting w to a value of 0.4 to 0.8 is more desirable. Defaults to ``0.6``. +- ``reg_weight (float)`` : The L2 regularization weight. Defaults to ``[5e-5]``. +- ``loss_type (str)`` : The type of loss function. If it set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth. In this way, negative sampling is not needed. If it set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximize the difference between positive item and negative item. In this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='GCSAN', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + step choice [1] + n_layers choice [1] + n_heads choice [1] + hidden_size choice [64] + inner_size choice [256] + hidden_dropout_prob choice [0.2] + attn_dropout_prob choice [0.2] + hidden_act choice ['gelu'] + layer_norm_eps choice [1e-12] + initializer_range choice [0.02] + weight choice [0.5,0.6] + reg_weight choice [5e-5] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` diff --git a/docs/source/user_guide/model/sequential/gru4rec.rst b/docs/source/user_guide/model/sequential/gru4rec.rst new file mode 100644 index 000000000..8139db4e6 --- /dev/null +++ b/docs/source/user_guide/model/sequential/gru4rec.rst @@ -0,0 +1,83 @@ +GRU4Rec +================= + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/2988450.2988452>`_ + +**Title:** Improved Recurrent Neural Networks for Session-based Recommendations + +**Authors:** Yong Kiam Tan, Xinxing Xu, Yong Liu + +**Abstract:** Recurrent neural networks (RNNs) were recently proposed +for the session-based recommendation task. The models +showed promising improvements over traditional recommendation approaches. In this work, we further study RNNbased models for session-based recommendations. 
We propose the application of two techniques to improve model +performance, namely, data augmentation, and a method to +account for shifts in the input data distribution. We also +empirically study the use of generalised distillation, and a +novel alternative model that directly predicts item embeddings. Experiments on the RecSys Challenge 2015 dataset +demonstrate relative improvements of 12.8% and 14.8% over +previously reported results on the Recall\@20 and Mean Reciprocal Rank\@20 metrics respectively. + +.. image:: ../../../asset/gru4rec.png + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of items. Defaults to ``64``. +- ``hidden_size (int)`` : The number of features in the hidden state. Defaults to ``128``. +- ``num_layers (int)`` : The number of layers in GRU. Defaults to ``1``. +- ``dropout_prob (float)``: The dropout rate. Defaults to ``0.3``. +- ``loss_type (str)`` : The type of loss function. If it set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth. In this way, negative sampling is not needed. If it set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximize the difference between positive item and negative item. In this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='GRU4Rec', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + dropout_prob choice [0.0,0.1,0.2,0.3,0.4,0.5] + num_layers choice [1,2,3] + hidden_size choice [128] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. 
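+
+The hyper-parameters listed above can also be fixed in a small YAML file and handed to the quick-start entry point instead of being typed on the command line. This is only an illustration: ``gru4rec.yaml`` is a hypothetical file, and the ``config_file_list`` keyword is assumed from RecBole's quick-start interface.
+
+.. code:: python
+
+    from recbole.quick_start import run_recbole
+
+    # gru4rec.yaml (hypothetical) might contain:
+    #   hidden_size: 128
+    #   num_layers: 2
+    #   dropout_prob: 0.2
+    run_recbole(model='GRU4Rec', dataset='ml-100k',
+                config_file_list=['gru4rec.yaml'])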
+ + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` diff --git a/docs/source/user_guide/model/sequential/gru4recf.rst b/docs/source/user_guide/model/sequential/gru4recf.rst new file mode 100644 index 000000000..488546ba8 --- /dev/null +++ b/docs/source/user_guide/model/sequential/gru4recf.rst @@ -0,0 +1,95 @@ +GRU4RecF +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/2959100.2959167>`_ + +**Title:** Parallel Recurrent Neural Network Architectures for +Feature-rich Session-based Recommendations + +**Authors:** Balázs Hidasi, Massimo Quadrana, Alexandros Karatzoglou, Domonkos Tikk + +**Abstract:** Real-life recommender systems often face the daunting task +of providing recommendations based only on the clicks of +a user session. Methods that rely on user profiles – such +as matrix factorization – perform very poorly in this setting, thus item-to-item recommendations are used most of +the time. However the items typically have rich feature representations such as pictures and text descriptions that can +be used to model the sessions. Here we investigate how these +features can be exploited in Recurrent Neural Network based +session models using deep learning. We show that obvious +approaches do not leverage these data sources. We thus introduce a number of parallel RNN (p-RNN) architectures to +model sessions based on the clicks and the features (images +and text) of the clicked items. We also propose alternative +training strategies for p-RNNs that suit them better than +standard training. We show that p-RNN architectures with +proper training have significant performance improvements +over feature-less session models while all session-based models outperform the item-to-item type baseline. + +.. image:: ../../../asset/gru4recf.png + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of items. Defaults to ``64``. +- ``hidden_size (int)`` : The number of features in the hidden state. Defaults to ``128``. +- ``num_layers (int)`` : The number of layers in GRU. Defaults to ``1``. +- ``dropout_prob (float)`` : The dropout rate. Defaults to ``0.3``. +- ``selected_features (list)`` : The list of selected item features. Defaults to ``['class']`` for ml-100k dataset. +- ``pooling_mode (str)`` : The intra-feature pooling mode. Defaults to ``'sum'``. Range in ``['max', 'mean', 'sum']``. +- ``loss_type (str)`` : The type of loss function. If it set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth. In this way, negative sampling is not needed. If it set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximize the difference between positive item and negative item. In this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='GRU4RecF', dataset='ml-100k') + +And then: + +.. 
code:: bash + + python run.py + + +**Notes:** + +- GRU4RecF is a sequential model that integrates item context information. ``selected_features`` controls the used item context information. The used context information must be in the dataset and be loaded by data module in RecBole. It means the value in ``selected_features`` must appear in ``load_col``. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + num_layers choice [1, 2] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` diff --git a/docs/source/user_guide/model/sequential/gru4reckg.rst b/docs/source/user_guide/model/sequential/gru4reckg.rst new file mode 100644 index 000000000..f1653cabf --- /dev/null +++ b/docs/source/user_guide/model/sequential/gru4reckg.rst @@ -0,0 +1,88 @@ +GRU4RecKG +=========== + +Introduction +--------------------- + +It is an extension of GRU4Rec, which concatenates items and its corresponding knowledge graph embedding as the input. + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of items and the KG feature. Defaults to ``64``. +- ``hidden_size (int)`` : The number of features in the hidden state. Defaults to ``128``. +- ``num_layers (int)`` : The number of layers in GRU. Defaults to ``1``. +- ``dropout_prob (float)`` : The dropout rate. Defaults to ``0.1``. +- ``freeze_kg (bool)`` : Whether to freeze the pre-trained knowledge embedding feature. Defaults to ``True``. +- ``loss_type (str)`` : The type of loss function. If it set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth. In this way, negative sampling is not needed. If it set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximize the difference between positive item and negative item. In this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='GRU4RecKG', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Notes:** + +- If you want to run GRU4RecKG, please prepare pretrained knowledge graph embedding and add the following settings to config files: + + .. 
code:: yaml + + load_col: + inter: [user_id, item_id] + kg: [head_id, relation_id, tail_id] + link: [item_id, entity_id] + ent_feature: [ent_id, ent_vec] + fields_in_same_space: [ + [ent_id, entity_id] + ] + preload_weight: + ent_id: ent_vec + additional_feat_suffix: [ent_feature] + + where the pretrained knowledge graph embedding should be stored in file named [dataset_name].ent_feature. If you want to + add additional feature embedding, please refer to this example. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + dropout_prob choice [0.0,0.1,0.2,0.3,0.4,0.5] + num_layers choice [1,2,3] + hidden_size choice [128] + freeze_kg choice [True, False] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/sequential/hgn.rst b/docs/source/user_guide/model/sequential/hgn.rst new file mode 100644 index 000000000..630fcb445 --- /dev/null +++ b/docs/source/user_guide/model/sequential/hgn.rst @@ -0,0 +1,96 @@ +HGN +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/abs/10.1145/3292500.3330984>`_ + +**Title:** HGN: Hierarchical Gating Networks for Sequential Recommendation. + +**Authors:** Chen Ma + +**Abstract:** The chronological order of user-item interactions is a key feature +in many recommender systems, where the items that users will +interact may largely depend on those items that users just accessed +recently. However, with the tremendous increase of users and items, +sequential recommender systems still face several challenging problems: (1) the hardness of modeling the long-term user interests from +sparse implicit feedback; (2) the difficulty of capturing the short- +term user interests given several items the user just accessed. To +cope with these challenges, we propose a hierarchical gating net- +work (HGN), integrated with the Bayesian Personalized Ranking +(BPR) to capture both the long-term and short-term user interests. +Our HGN consists of a feature gating module, an instance gating +module, and an item-item product module. In particular, our feature +gating and instance gating modules select what item features can +be passed to the downstream layers from the feature and instance +levels, respectively. Our item-item product module explicitly captures the item relations between the items that users accessed in +the past and those items users will access in the future. We extensively evaluate our model with several state-of-the-art methods +and different validation metrics on five real-world datasets. 
The +experimental results demonstrate the effectiveness of our model on +Top-N sequential recommendation. + +.. image:: ../../../asset/hgn.jpg + :width: 600 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users and items. Defaults to ``64``. +- ``pooling_type (str)`` : The type of pooling include average pooling and max pooling . Defaults to ``average``. +- ``reg_weight (float)`` : The L2 regularization weight. Defaults to ``[0.00,0.00]``. +- ``loss_type (str)`` : The type of loss function. If it set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth. In this way, negative sampling is not needed. If it set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximize the difference between positive item and negative item. In this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='HGN', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Notes:** + +- By setting ``reproducibility=False``, the training speed of HGN can be greatly accelerated. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.001] + embedding_size choice [64] + pooling_type choice ["average","max"] + reg_weight choice ['[0.00,0.00]','[0.001,0.00001]'] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` + diff --git a/docs/source/user_guide/model/sequential/hrm.rst b/docs/source/user_guide/model/sequential/hrm.rst new file mode 100644 index 000000000..766d9b21a --- /dev/null +++ b/docs/source/user_guide/model/sequential/hrm.rst @@ -0,0 +1,102 @@ +HRM +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/abs/10.1145/2766462.2767694>`_ + +**Title:** HRM: Learning Hierarchical Representation Model for Next Basket Recommendation. + +**Authors:** Pengfei Wang + +**Abstract:** Next basket recommendation is a crucial task in market bas- +ket analysis. Given a user’s purchase history, usually a sequence of transaction data, one attempts to build a recom- +mender that can predict the next few items that the us- +er most probably would like. 
Ideally, a good recommender +should be able to explore the sequential behavior (i.e., buy- +ing one item leads to buying another next), as well as ac- +count for users’ general taste (i.e., what items a user is typically interested in) for recommendation. Moreover, these +two factors may interact with each other to influence users’ +next purchase. To tackle the above problems, in this pa- +per, we introduce a novel recommendation approach, name- +ly hierarchical representation model (HRM). HRM can well +capture both sequential behavior and users’ general taste by +involving transaction and user representations in prediction. +Meanwhile, the flexibility of applying different aggregation +operations, especially nonlinear operations, on representations allows us to model complicated interactions among +different factors. Theoretically, we show that our model +subsumes several existing methods when choosing proper +aggregation operations. Empirically, we demonstrate that +our model can consistently outperform the state-of-the-art +baselines under different evaluation metrics on real-world +transaction data. + +.. image:: ../../../asset/hrm.jpg + :width: 600 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users and items. Defaults to ``64``. +- ``high_order (int)`` : The last N items . Defaults to ``2``. +- ``pooling_type_layer_1 (str)`` : The type of pooling in the first floor include average pooling and max pooling . Defaults to ``max``. +- ``pooling_type_layer_2 (str)`` : The type of pooling in the second floor include average pooling and max pooling . Defaults to ``max``. +- ``dropout_prob (float)`` : The dropout rate. Defaults to ``0.2``. +- ``loss_type (str)`` : The type of loss function. If it set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth. In this way, negative sampling is not needed. If it set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximize the difference between positive item and negative item. In this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='HRM', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Notes:** + +- By setting ``reproducibility=False``, the training speed of HRM can be greatly accelerated. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.001] + embedding_size choice [64] + high_order choice [1,2,4] + dropout_prob choice [0.2] + pooling_type_layer_1 choice ["max","average"] + pooling_type_layer_2 choice ["max","average"] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. 
code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` + diff --git a/docs/source/user_guide/model/sequential/ksr.rst b/docs/source/user_guide/model/sequential/ksr.rst new file mode 100644 index 000000000..8085e71b4 --- /dev/null +++ b/docs/source/user_guide/model/sequential/ksr.rst @@ -0,0 +1,102 @@ +KSR +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/3209978.3210017>`_ + +**Title:** Improving Sequential Recommendation with Knowledge-Enhanced Memory Networks + +**Authors:** Jin Huang, Wayne Xin Zhao, Hongjian Dou, Ji-Rong Wen, Edward Y. Chang + +**Abstract:** With the revival of neural networks, many studies try to adapt powerful sequential neural models, ıe Recurrent Neural Networks (RNN), to sequential recommendation. RNN-based networks encode historical interaction records into a hidden state vector. Although the state vector is able to encode sequential dependency, it still has limited representation power in capturing complicated user preference. It is difficult to capture fine-grained user preference from the interaction sequence. Furthermore, the latent vector representation is usually hard to understand and explain. To address these issues, in this paper, we propose a novel knowledge enhanced sequential recommender. Our model integrates the RNN-based networks with Key-Value Memory Network (KV-MN). We further incorporate knowledge base (KB) information to enhance the semantic representation of KV-MN. RNN-based models are good at capturing sequential user preference, while knowledge-enhanced KV-MNs are good at capturing attribute-level user preference. By using a hybrid of RNNs and KV-MNs, it is expected to be endowed with both benefits from these two components. The sequential preference representation together with the attribute-level preference representation are combined as the final representation of user preference. With the incorporation of KB information, our model is also highly interpretable. To our knowledge, it is the first time that sequential recommender is integrated with external memories by leveraging large-scale KB information. + +.. image:: ../../../asset/ksr.jpg + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of items and the KG feature. Defaults to ``64``. +- ``hidden_size (int)`` : The number of features in the hidden state. Defaults to ``128``. +- ``num_layers (int)`` : The number of layers in GRU. Defaults to ``1``. +- ``dropout_prob (float)`` : The dropout rate. Defaults to ``0.1``. +- ``freeze_kg (bool)`` : Whether to freeze the pre-trained knowledge embedding feature. Defaults to ``True``. +- ``gamma (float)`` : The scaling factor used in read operation when calculating the attention weights of user preference on attributes. Defaults to ``10``. +- ``loss_type (str)`` : The type of loss function. If it set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth. 
In this way, negative sampling is not needed. If it set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximize the difference between positive item and negative item. In this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='KSR', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Notes:** + +- If you want to run KSR, please prepare pretrained knowledge graph embedding and add the following settings to config files: + + .. code:: yaml + + load_col: + inter: [user_id, item_id] + kg: [head_id, relation_id, tail_id] + link: [item_id, entity_id] + ent_feature: [ent_id, ent_vec] + rel_feature: [rel_id, rel_vec] + fields_in_same_space: [ + [ent_id, entity_id] + [rel_id, relation_id] + ] + preload_weight: + ent_id: ent_vec + rel_id: rel_vec + additional_feat_suffix: [ent_feature, rel_feature] + + where the pretrained knowledge graph embedding should be stored in file named [dataset_name].ent_feature. If you want to + add additional feature embedding, please refer to this example. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + dropout_prob choice [0.0,0.1,0.2,0.3,0.4,0.5] + num_layers choice [1,2,3] + hidden_size choice [128] + freeze_kg choice [True, False] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/sequential/narm.rst b/docs/source/user_guide/model/sequential/narm.rst new file mode 100644 index 000000000..d9bd60460 --- /dev/null +++ b/docs/source/user_guide/model/sequential/narm.rst @@ -0,0 +1,92 @@ +NARM +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/3132847.3132926>`_ + +**Title:** Neural Attentive Session-based Recommendation + +**Authors:** Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Tao Lian, Jun Ma + +**Abstract:** Given e-commerce scenarios that user profiles are invisible, sessionbased recommendation is proposed to generate recommendation +results from short sessions. Previous work only considers the +user’s sequential behavior in the current session, whereas the +user’s main purpose in the current session is not emphasized. In +this paper, we propose a novel neural networks framework, i.e., +Neural Attentive Recommendation Machine (NARM), to tackle +this problem. 
Specifically, we explore a hybrid encoder with an +attention mechanism to model the user’s sequential behavior and +capture the user’s main purpose in the current session, which +are combined as a unified session representation later. We then +compute the recommendation scores for each candidate item with +a bi-linear matching scheme based on this unified session representation. We train NARM by jointly learning the item and session +representations as well as their matchings. We carried out extensive experiments on two benchmark datasets. Our experimental +results show that NARM outperforms state-of-the-art baselines on +both datasets. Furthermore, we also find that NARM achieves a +significant improvement on long sessions, which demonstrates its +advantages in modeling the user’s sequential behavior and main +purpose simultaneously. + +.. image:: ../../../asset/narm.png + :width: 600 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of items. Defaults to ``64``. +- ``hidden_size (int)`` : The number of features in the hidden state. Defaults to ``128``. +- ``n_layers (int)`` : The number of layers in GRU. Defaults to ``1``. +- ``dropout_probs (list of float)`` : The dropout rate, there are two values, + the former is for embedding layer and the latter is for concatenation of the vector obtained by the local encoder and the vector obtained by the global encoder. Defaults to ``[0.25,0.5]``. +- ``loss_type (str)`` : The type of loss function. If it set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth. In this way, negative sampling is not needed. If it set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximize the difference between positive item and negative item. In this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='NARM', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + hidden_size choice [128] + n_layers choice [1,2] + dropout_probs choice ['[0.25,0.5]','[0.2,0.2]','[0.1,0.2]'] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. 
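+
+Because ``dropout_probs`` takes a list of two values, it can be awkward to pass on the command line; overriding it in code is often easier. A minimal sketch, assuming ``run_recbole`` accepts a ``config_dict`` of overrides as in the quick-start interface:
+
+.. code:: python
+
+    from recbole.quick_start import run_recbole
+
+    # The two values are the embedding-layer dropout and the dropout on the
+    # concatenated local/global encoder output, as described above.
+    run_recbole(model='NARM', dataset='ml-100k',
+                config_dict={'dropout_probs': [0.2, 0.2]})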
+ + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/sequential/nextitnet.rst b/docs/source/user_guide/model/sequential/nextitnet.rst new file mode 100644 index 000000000..2dcd97da5 --- /dev/null +++ b/docs/source/user_guide/model/sequential/nextitnet.rst @@ -0,0 +1,79 @@ +NextItNet +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/abs/10.1145/3289600.3290975>`_ + +**Title:** A Simple Convolutional Generative Network for Next Item Recommendation + +**Authors:** Fajie Yuan, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M Jose, Xiangnan He + +**Abstract:** Convolutional Neural Networks (CNNs) have been recently intro- duced in the domain of session-based next item recommendation. An ordered collection of past items the user has interacted with in a session (or sequence) are embedded into a 2-dimensional latent matrix, and treated as an image. The convolution and pooling opera- tions are then applied to the mapped item embeddings. In this paper, we first examine the typical session-based CNN recommender and show that both the generative model and network architecture are suboptimal when modeling long-range dependencies in the item sequence. To address the issues, we introduce a simple, but very effective generative model that is capable of learning high-level representation from both short- and long-range item dependencies. The network architecture of the proposed model is formed of a stack of holed convolutional layers, which can efficiently increase the receptive fields without relying on the pooling operation. Another contribution is the effective use of residual block structure in recom- mender systems, which can ease the optimization for much deeper networks. The proposed generative model attains state-of-the-art accuracy with less training time in the next item recommendation task. It accordingly can be used as a powerful recommendation baseline to beat in future, especially when there are long sequences of user feedback. + +.. image:: ../../../asset/nextitnet.png + :width: 600 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users and items. Defaults to ``64``. +- ``kernel_size (int)`` : The width of convolutional filter. Defaults to ``3``. +- ``block_num (int)`` : The number of residual blocks. Defaults to ``5``. +- ``dilations (list)`` : Control the spacing between the kernel points. Defaults to ``[1,4]``. +- ``reg_weight (float)`` : The L2 regularization weight. Defaults to ``1e-5``. +- ``loss_type (str)`` : The type of loss function. If it set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth. In this way, negative sampling is not needed. If it set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximize the difference between positive item and negative item. In this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. 
code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='NextItNet', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Notes:** + +- By setting ``reproducibility=False``, the training speed of NextitNet can be greatly accelerated. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + reg_weight choice [0,1e-5,1e-4] + block_num choice [2,3,4,5] + dilations choice ['[1, 2]' '[1, 4]'] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` diff --git a/docs/source/user_guide/model/sequential/npe.rst b/docs/source/user_guide/model/sequential/npe.rst new file mode 100644 index 000000000..0cf9ca655 --- /dev/null +++ b/docs/source/user_guide/model/sequential/npe.rst @@ -0,0 +1,89 @@ +NPE +=========== + +Introduction +--------------------- + +`[paper] <https://arxiv.org/abs/1805.06563>`_ + +**Title:** NPE: Neural Personalized Embedding for Collaborative Filtering. + +**Authors:** Ying, H + +**Abstract:** Matrix factorization is one of the most efficient approaches in recommender systems. However, such +algorithms, which rely on the interactions between +users and items, perform poorly for “cold-users” +(users with little history of such interactions) and +at capturing the relationships between closely related items. To address these problems, we propose +a neural personalized embedding (NPE) model, +which improves the recommendation performance +for cold-users and can learn effective representations of items. It models a user’s click to an item +in two terms: the personal preference of the user +for the item, and the relationships between this +item and other items clicked by the user. We show +that NPE outperforms competing methods for top- +N recommendations, specially for cold-user recommendations. We also performed a qualitative analysis that shows the effectiveness +of the representations learned by the model. + +.. image:: ../../../asset/npe.jpg + :width: 600 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users and items. Defaults to ``64``. +- ``dropout_prob (float)`` : The dropout rate. Defaults to ``0.3``. +- ``loss_type (str)`` : The type of loss function. If it set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth. In this way, negative sampling is not needed. If it set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximize the difference between positive item and negative item. 
In this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='NPE', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Notes:** + +- By setting ``reproducibility=False``, the training speed of NPE can be greatly accelerated. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.001] + embedding_size choice [64] + dropout_prob choice [0.2,0.3,0.5] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` + diff --git a/docs/source/user_guide/model/sequential/repeatnet.rst b/docs/source/user_guide/model/sequential/repeatnet.rst new file mode 100644 index 000000000..47dcc5325 --- /dev/null +++ b/docs/source/user_guide/model/sequential/repeatnet.rst @@ -0,0 +1,94 @@ +RepeatNet +=========== + +Introduction +--------------------- + +`[paper] <https://ojs.aaai.org//index.php/AAAI/article/view/4408>`_ + +**Title:** RepeatNet: A Repeat Aware Neural Recommendation Machine for Session-based Recommendation. + +**Authors:** Pengjie Ren, Zhumin Chen, Jing Li, Zhaochun Ren, Jun Ma, Maarten de Rijke + +**Abstract:** Recurrent neural networks for session-based recommendation have attracted a lot of attention recently because of +their promising performance. repeat consumption is a com-mon phenomenon in many recommendation scenarios (e.g.,e-commerce, music, and TV program recommendations), +where the same item is re-consumed repeatedly over time. +However, no previous studies have emphasized repeat consumption with neural networks. An effective neural approach +is needed to decide when to perform repeat recommendation. In this paper, we incorporate a repeat-explore mechanism into neural networks and propose a new model, called +RepeatNet, with an encoder-decoder structure. RepeatNet integrates a regular neural recommendation approach in the de- +coder with a new repeat recommendation mechanism that can +choose items from a user’s history and recommends them at +the right time. We report on extensive experiments on three +benchmark datasets. RepeatNet outperforms state-of-the-art +baselines on all three datasets in terms of MRR and Recall. +Furthermore, as the dataset size and the repeat ratio increase, +the improvements of RepeatNet over the baselines also in- +crease, which demonstrates its advantage in handling repeat +recommendation scenarios. + +.. 
image:: ../../../asset/repeatnet.jpg + :width: 600 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users and items. Defaults to ``64``. +- ``hidden_size (int)`` : The number of features in the hidden state. Defaults to ``64``. +- ``joint_train (bool)`` : The indicator whether the train loss should add the repeat_explore_loss. Defaults to ``False``. +- ``dropout_prob (float)`` : The dropout rate. Defaults to ``0.5``. +- ``loss_type (str)`` : The type of loss function. If it set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth. In this way, negative sampling is not needed. If it set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximize the difference between positive item and negative item. In this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='RepeatNet', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Notes:** + +- By setting ``reproducibility=False``, the training speed of RepeatNet can be greatly accelerated. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.001,] + embedding_size choice [64] + joint_train choice [False,True] + dropout_prob choice [0.5,] + train_batch_size: 2048 + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` + diff --git a/docs/source/user_guide/model/sequential/s3rec.rst b/docs/source/user_guide/model/sequential/s3rec.rst new file mode 100644 index 000000000..ee24b0f43 --- /dev/null +++ b/docs/source/user_guide/model/sequential/s3rec.rst @@ -0,0 +1,138 @@ +S3Rec +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/abs/10.1145/3340531.3411954>`_ + +**Title:** S^3-Rec: Self-Supervised Learning for Sequential +Recommendation with Mutual Information Maximization + +**Authors:** Kun Zhou, Hui Wang, Wayne Xin Zhao, Yutao Zhu, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang, Ji-Rong Wen + +**Abstract:** Recently, significant progress has been made in sequential recommendation with deep learning. Existing neural sequential recommendation models usually rely on the item prediction loss to learn +model parameters or data representations. However, the model +trained with this loss is prone to suffer from data sparsity problem. 
+Since it overemphasizes the final performance, the association or
+fusion between context data and sequence data has not been well
+captured and utilized for sequential recommendation.
+To tackle this problem, we propose the model S3-Rec, which
+stands for Self-Supervised learning for Sequential Recommendation,
+based on the self-attentive neural architecture. The main idea of
+our approach is to utilize the intrinsic data correlation to derive
+self-supervision signals and enhance the data representations via
+pre-training methods for improving sequential recommendation.
+For our task, we devise four auxiliary self-supervised objectives
+to learn the correlations among attribute, item, subsequence, and
+sequence by utilizing the mutual information maximization (MIM)
+principle. MIM provides a unified way to characterize the correlation between different types of data, which is particularly suitable
+in our scenario. Extensive experiments conducted on six real-world
+datasets demonstrate the superiority of our proposed method over
+existing state-of-the-art methods, especially when only limited
+training data is available. Besides, we extend our self-supervised
+learning method to other recommendation models, which also improve their performance.
+
+.. image:: ../../../asset/s3rec.png
+    :width: 600
+    :align: center
+
+Running with RecBole
+-------------------------
+
+**Model Hyper-Parameters:**
+
+- ``hidden_size (int)`` : The number of features in the hidden state. It is also the initial embedding size of items. Defaults to ``64``.
+- ``inner_size (int)`` : The inner hidden size in feed-forward layer. Defaults to ``256``.
+- ``n_layers (int)`` : The number of transformer layers in transformer encoder. Defaults to ``2``.
+- ``n_heads (int)`` : The number of attention heads for multi-head attention layer. Defaults to ``2``.
+- ``hidden_dropout_prob (float)`` : The probability of an element to be zeroed. Defaults to ``0.5``.
+- ``attn_dropout_prob (float)`` : The probability of an attention score to be zeroed. Defaults to ``0.5``.
+- ``hidden_act (str)`` : The activation function in feed-forward layer. Defaults to ``'gelu'``. Range in ``['gelu', 'relu', 'swish', 'tanh', 'sigmoid']``.
+- ``layer_norm_eps (float)`` : A value added to the denominator for numerical stability. Defaults to ``1e-12``.
+- ``initializer_range (float)`` : The standard deviation for normal initialization. Defaults to ``0.02``.
+- ``mask_ratio (float)`` : The probability of an item being replaced by the MASK token. Defaults to ``0.2``.
+- ``aap_weight (float)`` : The weight for Associated Attribute Prediction loss. Defaults to ``1.0``.
+- ``mip_weight (float)`` : The weight for Masked Item Prediction loss. Defaults to ``0.2``.
+- ``map_weight (float)`` : The weight for Masked Attribute Prediction loss. Defaults to ``1.0``.
+- ``sp_weight (float)`` : The weight for Segment Prediction loss. Defaults to ``0.5``.
+- ``train_stage (str)`` : The training stage. Defaults to ``'pretrain'``. Range in ``['pretrain', 'finetune']``.
+- ``item_attribute (str)`` : The item features used as attributes for pre-training. Defaults to ``'class'`` for the ml-100k dataset.
+- ``save_step (int)`` : Save the pre-trained model every ``save_step`` pre-training epochs. Defaults to ``10``.
+- ``pre_model_path (str)`` : The path of the pre-trained model. Defaults to ``''``.
+- ``loss_type (str)`` : The type of loss function. If it is set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth.
In this way, negative sampling is not needed. If it set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximize the difference between positive item and negative item. In this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``. + + +**A Running Example:** + +1. Run pre-training. Write the following code to `run_pretrain.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + config_dict = { + 'train_stage': 'pretrain', + 'save_step': 10, + } + run_recbole(model='S3Rec', dataset='ml-100k', + config_dict=config_dict, saved=False) + +And then: + +.. code:: bash + + python run_pretrain.py + +2. Run fine-tuning. Write the following code to `run_finetune.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + config_dict = { + 'train_stage': 'finetune', + 'pre_model_path': './saved/S3Rec-ml-100k-100.pth', + } + run_recbole(model='S3Rec', dataset='ml-100k', + config_dict=config_dict) + +And then: + +.. code:: bash + + python run_finetune.py + + +**Notes:** + +- In the pre-training stage, the pre-trained model would be saved every 10 epochs, named as ``S3Rec-[dataset_name]-[pretrain_epochs].pth`` (e.g. S3Rec-ml-100k-100.pth) and saved to ``./saved/``. + +- In the fine-tuning stage, please make sure that the pre-trained model path is existed. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + pretrain_epochs choice [50, 100, 150] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` diff --git a/docs/source/user_guide/model/sequential/sasrec.rst b/docs/source/user_guide/model/sequential/sasrec.rst new file mode 100644 index 000000000..b67d2f3f0 --- /dev/null +++ b/docs/source/user_guide/model/sequential/sasrec.rst @@ -0,0 +1,97 @@ +SASRec +=========== + +Introduction +--------------------- + +`[paper] <https://ieeexplore.ieee.org/document/8594844/>`_ + +**Title:** Self-Attentive Sequential Recommendation + +**Authors:** Wang-Cheng Kang, Julian McAuley + +**Abstract:** Sequential dynamics are a key feature of many modern recommender systems, +which seek to capture the 'context' of users' activities on the basis of actions they have +performed recently. To capture such patterns, two approaches have proliferated: Markov Chains (MCs) +and Recurrent Neural Networks (RNNs). Markov Chains assume that a user's next action can be +predicted on the basis of just their last (or last few) actions, while RNNs in principle allow +for longer-term semantics to be uncovered. 
Generally speaking, MC-based methods perform best in +extremely sparse datasets, where model parsimony is critical, while RNNs perform better in denser +datasets where higher model complexity is affordable. The goal of our work is to balance these +two goals, by proposing a self-attention based sequential model (SASRec) that allows us to capture +long-term semantics (like an RNN), but, using an attention mechanism, makes its predictions based +on relatively few actions (like an MC). At each time step, SASRec seeks to identify which items +are 'relevant' from a user's action history, and use them to predict the next item. Extensive +empirical studies show that our method outperforms various state-of-the-art sequential +models (including MC/CNN/RNN-based approaches) on both sparse and dense datasets. +Moreover, the model is an order of magnitude more efficient than comparable CNN/RNN-based models. +Visualizations on attention weights also show how our model adaptively handles datasets with +various density, and uncovers meaningful patterns in activity sequences. + +.. image:: ../../../asset/sasrec.png + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``hidden_size (int)`` : The number of features in the hidden state. It is also the initial embedding size of item. Defaults to ``64``. +- ``inner_size (int)`` : The inner hidden size in feed-forward layer. Defaults to ``256``. +- ``n_layers (int)`` : The number of transformer layers in transformer encoder. Defaults to ``2``. +- ``n_heads (int)`` : The number of attention heads for multi-head attention layer. Defaults to ``2``. +- ``hidden_dropout_prob (float)`` : The probability of an element to be zeroed. Defaults to ``0.5``. +- ``attn_dropout_prob (float)`` : The probability of an attention score to be zeroed. Defaults to ``0.5``. +- ``hidden_act (str)`` : The activation function in feed-forward layer. Defaults to ``'gelu'``. Range in ``['gelu', 'relu', 'swish', 'tanh', 'sigmoid']``. +- ``layer_norm_eps (float)`` : A value added to the denominator for numerical stability, Defaults to ``1e-12``. +- ``initializer_range (float)`` : The standard deviation for normal initialization. Defaults to 0.02``. +- ``loss_type (str)`` : The type of loss function. If it set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth. In this way, negative sampling is not needed. If it set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximize the difference between positive item and negative item. In this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='SASRec', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. 
code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + attn_dropout_prob choice [0.2, 0.5] + hidden_dropout_prob choice [0.2, 0.5] + n_heads choice [1, 2] + n_layers choice [1,2,3] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/sequential/sasrecf.rst b/docs/source/user_guide/model/sequential/sasrecf.rst new file mode 100644 index 000000000..ecf8bab8a --- /dev/null +++ b/docs/source/user_guide/model/sequential/sasrecf.rst @@ -0,0 +1,77 @@ +SASRecF +=========== + +Introduction +--------------------- + +It is an extension of SASRec, which concatenates items and items' features as the input. + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``hidden_size (int)`` : The number of features in the hidden state. It is also the initial embedding size of items. Defaults to ``64``. +- ``inner_size (int)`` : The inner hidden size in feed-forward layer. Defaults to ``256``. +- ``n_layers (int)`` : The number of transformer layers in transformer encoder. Defaults to ``2``. +- ``n_heads (int)`` : The number of attention heads for multi-head attention layer. Defaults to ``2``. +- ``hidden_dropout_prob (float)`` : The probability of an element to be zeroed. Defaults to ``0.5``. +- ``attn_dropout_prob (float)`` : The probability of an attention score to be zeroed. Defaults to ``0.5``. +- ``hidden_act (str)`` : The activation function in feed-forward layer. Defaults to ``'gelu'``. Range in ``['gelu', 'relu', 'swish', 'tanh', 'sigmoid']``. +- ``layer_norm_eps (float)`` : A value added to the denominator for numerical stability. Defaults to ``1e-12``. +- ``initializer_range (float)`` : The standard deviation for normal initialization. Defaults to ``0.02``. +- ``selected_features (list)`` : The list of selected item features. Defaults to ``['class']`` for ml-100k dataset. +- ``pooling_mode (str)`` : intra-feature pooling mode. Defaults to ``'sum'``. Range in ``['max', 'mean', 'sum']``. +- ``loss_type (str)`` : The type of loss function. If it set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth. In this way, negative sampling is not needed. If it set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximize the difference between positive item and negative item. In this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``. + + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='SASRecF', dataset='ml-100k') + +And then: + +.. 
code:: bash + + python run.py + +**Notes:** + +- SASRecF is a sequential model that integrates item context information. ``selected_features`` controls the used item context information. The used context information must be in the dataset and be loaded by data module in RecBole. It means the value in ``selected_features`` must appear in ``load_col``. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + attn_dropout_prob choice [0.2, 0.5] + hidden_dropout_prob choice [0.2, 0.5] + n_heads choice [1, 2] + n_layers choice [1,2,3] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` \ No newline at end of file diff --git a/docs/source/user_guide/model/sequential/shan.rst b/docs/source/user_guide/model/sequential/shan.rst new file mode 100644 index 000000000..6e97a1b0d --- /dev/null +++ b/docs/source/user_guide/model/sequential/shan.rst @@ -0,0 +1,96 @@ +SHAN +=========== + +Introduction +--------------------- + +`[paper] <https://opus.lib.uts.edu.au/handle/10453/126040>`_ + +**Title:** SHAN: Sequential Recommender System based on Hierarchical Attention Network. + +**Authors:** Ying, H + +**Abstract:** With a large amount of user activity data accumulated, it is crucial to exploit user sequential behavior for sequential recommendations. Convention- +ally, user general taste and recent demand are combined to promote recommendation performances. +However, existing methods often neglect that user +long-term preference keep evolving over time, and +building a static representation for user general +taste may not adequately reflect the dynamic characters. Moreover, they integrate user-item or item- +item interactions through a linear way which lim- +its the capability of model. To this end, in this +paper, we propose a novel two-layer hierarchical +attention network, which takes the above proper- +ties into account, to recommend the next item user +might be interested. Specifically, the first attention +layer learns user long-term preferences based on +the historical purchased item representation, while +the second one outputs final user representation +through coupling user long-term and short-term +preferences. The experimental study demonstrates +the superiority of our method compared with other +state-of-the-art ones. + +.. image:: ../../../asset/shan.jpg + :width: 600 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of users and items. Defaults to ``64``. +- ``short_item_length (int)`` : The last N items . Defaults to ``2``. 
+- ``reg_weight (float)`` : The L2 regularization weight. Defaults to ``[0.01,0.0001]``. +- ``loss_type (str)`` : The type of loss function. If it set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth. In this way, negative sampling is not needed. If it set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximize the difference between positive item and negative item. In this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='SHAN', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Notes:** + +- By setting ``reproducibility=False``, the training speed of SHAN can be greatly accelerated. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.001] + embedding_size choice [64] + short_item_length choice [1,2,4,8] + reg_weight choice ['[0.0,0.0]','[0.01,0.0001]'] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` + diff --git a/docs/source/user_guide/model/sequential/srgnn.rst b/docs/source/user_guide/model/sequential/srgnn.rst new file mode 100644 index 000000000..64bdfb9a4 --- /dev/null +++ b/docs/source/user_guide/model/sequential/srgnn.rst @@ -0,0 +1,84 @@ +SRGNN +=========== + +Introduction +--------------------- + +`[paper] <https://www.aaai.org/ojs/index.php/AAAI/article/view/3804>`_ + +**Title:** Session-based Recommendation with Graph Neural Networks + +**Authors:** Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, Wei-Ying Ma + +**Abstract:** The problem of session-based recommendation aims to predict user actions based on anonymous sessions. Previous +methods model a session as a sequence and estimate user representations besides item representations to make recommendations. Though achieved promising results, they are insufficient to obtain accurate user vectors in sessions and neglect +complex transitions of items. To obtain accurate item embedding and take complex transitions of items into account, we +propose a novel method, i.e. Session-based Recommendation +with Graph Neural Networks, SR-GNN for brevity. In the +proposed method, session sequences are modeled as graphstructured data. Based on the session graph, GNN can capture complex transitions of items, which are difficult to be +revealed by previous conventional sequential methods. 
Each +session is then represented as the composition of the global +preference and the current interest of that session using an +attention network. Extensive experiments conducted on two +real datasets show that SR-GNN evidently outperforms the +state-of-the-art session-based recommendation methods consistently. + + +.. image:: ../../../asset/srgnn.png + :width: 700 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of items. Defaults to ``64``. +- ``step (int)`` : The number of layers in GNN. Defaults to ``1``. +- ``loss_type (str)`` : The type of loss function. If it set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth. In this way, negative sampling is not needed. If it set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximize the difference between positive item and negative item. In this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='SRGNN', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + step choice [1, 2] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` + + diff --git a/docs/source/user_guide/model/sequential/stamp.rst b/docs/source/user_guide/model/sequential/stamp.rst new file mode 100644 index 000000000..448c68745 --- /dev/null +++ b/docs/source/user_guide/model/sequential/stamp.rst @@ -0,0 +1,87 @@ +STAMP +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/abs/10.1145/3219819.3219950>`_ + +**Title:** STAMP: Short-Term Attention/Memory Priority Model for Session-based Recommendation + +**Authors:** Qiao Liu, Yifu Zeng, Refuoe Mokhosi, Haibin Zhang + +**Abstract:** Predicting users’ actions based on anonymous sessions is a challenging problem in web-based behavioral modeling research, mainly +due to the uncertainty of user behavior and the limited information. +Recent advances in recurrent neural networks have led to promising +approaches to solving this problem, with long short-term memory +model proving effective in capturing users’ general interests from +previous clicks. 
However, none of the existing approaches explicitly +take the effects of users’ current actions on their next moves into +account. In this study, we argue that a long-term memory model +may be insufficient for modeling long sessions that usually contain +user interests drift caused by unintended clicks. A novel short-term +attention/memory priority model is proposed as a remedy, which is +capable of capturing users’ general interests from the long-term memory of a session context, whilst taking into account users’ current +interests from the short-term memory of the last-clicks. The validity +and efficacy of the proposed attention mechanism is extensively +evaluated on three benchmark data sets from the RecSys Challenge +2015 and CIKM Cup 2016. The numerical results show that our +model achieves state-of-the-art performance in all the tests. + +.. image:: ../../../asset/stamp.png + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of items. Defaults to ``64``. +- ``loss_type (str)`` : The type of loss function. If it set to ``'CE'``, the training task is regarded as a multi-classification task and the target item is the ground truth. In this way, negative sampling is not needed. If it set to ``'BPR'``, the training task will be optimized in the pair-wise way, which maximize the difference between positive item and negative item. In this way, negative sampling is necessary, such as setting ``training_neg_sample_num = 1``. Defaults to ``'CE'``. Range in ``['BPR', 'CE']``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='STAMP', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. 
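+
+For instance, to tune STAMP on the ml-100k dataset used in the running example above, the template command can be instantiated as shown below; ``test.yaml`` is only a placeholder name for whatever fixed configuration file you use.
+
+.. code:: bash
+
+    python run_hyper.py --model=STAMP --dataset=ml-100k --config_files=test.yaml --params_file=hyper.test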
+ + + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` + + diff --git a/docs/source/user_guide/model/sequential/transrec.rst b/docs/source/user_guide/model/sequential/transrec.rst new file mode 100644 index 000000000..5e6839b5d --- /dev/null +++ b/docs/source/user_guide/model/sequential/transrec.rst @@ -0,0 +1,84 @@ +TransRec +=========== + +Introduction +--------------------- + +`[paper] <https://dl.acm.org/doi/10.1145/3109859.3109882>`_ + +**Title:** Translation-based Recommendation + +**Authors:** Ruining He, Wang-Cheng Kang, Julian McAuley + +**Abstract:** Modeling the complex interactions between users and items as well +as amongst items themselves is at the core of designing successful recommender systems. +One classical setting is predicting users' personalized sequential behavior (or 'next-item' +recommendation), where the challenges mainly lie in modeling 'third-order' interactions +between a user, her previously visited item(s), and the next item to consume. Existing +methods typically decompose these higher-order interactions into a combination +of pairwise relationships, by way of which user preferences (user-item interactions) +and sequential patterns (item-item interactions) are captured by separate components. +In this paper, we propose a unified method, TransRec, to model such third-order relationships +for large-scale sequential prediction. Methodologically, we embed items into a +'transition space' where users are modeled as translation vectors operating on +item sequences. Empirically, this approach outperforms the state-of-the-art on +a wide spectrum of real-world datasets. + +.. image:: ../../../asset/transrec.png + :width: 500 + :align: center + +Running with RecBole +------------------------- + +**Model Hyper-Parameters:** + +- ``embedding_size (int)`` : The embedding size of items. Defaults to ``64``. + +**A Running Example:** + +Write the following code to a python file, such as `run.py` + +.. code:: python + + from recbole.quick_start import run_recbole + + run_recbole(model='TransRec', dataset='ml-100k') + +And then: + +.. code:: bash + + python run.py + +**Notes:** + +- Different from other sequential models, TransRec must be optimized in pair-wise way using negative sampling, so it needs ``training_neg_sample_num=1``. + +Tuning Hyper Parameters +------------------------- + +If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``. + +.. code:: bash + + learning_rate choice [0.01,0.005,0.001,0.0005,0.0001] + train_batch_size choice [512, 1024, 2048] + +Note that we just provide these hyper parameter ranges for reference only, and we can not guarantee that they are the optimal range of this model. + +Then, with the source code of RecBole (you can download it from GitHub), you can run the ``run_hyper.py`` to tuning: + +.. code:: bash + + python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test + +For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`. 
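+
+Because TransRec must be trained in the pair-wise way (see the note above), the minimal sketch below makes the required negative-sampling setting explicit by passing it through ``config_dict``; every other parameter is assumed to keep its default value.
+
+.. code:: python
+
+    from recbole.quick_start import run_recbole
+
+    # TransRec is optimized pair-wise, so one negative item is sampled
+    # for each observed interaction, as required by the note above
+    run_recbole(model='TransRec', dataset='ml-100k',
+                config_dict={'training_neg_sample_num': 1})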
+ + +If you want to change parameters, dataset or evaluation settings, take a look at + +- :doc:`../../../user_guide/config_settings` +- :doc:`../../../user_guide/data_intro` +- :doc:`../../../user_guide/evaluation_support` +- :doc:`../../../user_guide/usage` diff --git a/docs/source/user_guide/model_intro.rst b/docs/source/user_guide/model_intro.rst new file mode 100644 index 000000000..5be3c657d --- /dev/null +++ b/docs/source/user_guide/model_intro.rst @@ -0,0 +1,106 @@ +Model Introduction +===================== +We implement 65 recommendation models covering general recommendation, sequential recommendation, +context-aware recommendation and knowledge-based recommendation. A brief introduction to these models are as follows: + + +General Recommendation +-------------------------- + +.. toctree:: + :maxdepth: 1 + + model/general/pop + model/general/itemknn + model/general/bpr + model/general/neumf + model/general/convncf + model/general/dmf + model/general/fism + model/general/nais + model/general/spectralcf + model/general/gcmc + model/general/ngcf + model/general/lightgcn + model/general/dgcf + model/general/line + model/general/multivae + model/general/multidae + model/general/macridvae + model/general/cdae + model/general/enmf + model/general/nncf + + +Context-aware Recommendation +------------------------------- + +.. toctree:: + :maxdepth: 1 + + model/context/lr + model/context/fm + model/context/nfm + model/context/deepfm + model/context/xdeepfm + model/context/afm + model/context/ffm + model/context/fwfm + model/context/fnn + model/context/pnn + model/context/dssm + model/context/widedeep + model/context/din + model/context/dcn + model/context/autoint + model/context/xgboost + + +Sequential Recommendation +--------------------------------- + +.. toctree:: + :maxdepth: 1 + + model/sequential/fpmc + model/sequential/gru4rec + model/sequential/narm + model/sequential/stamp + model/sequential/caser + model/sequential/nextitnet + model/sequential/transrec + model/sequential/sasrec + model/sequential/bert4rec + model/sequential/srgnn + model/sequential/gcsan + model/sequential/gru4recf + model/sequential/sasrecf + model/sequential/fdsa + model/sequential/s3rec + model/sequential/gru4reckg + model/sequential/ksr + model/sequential/fossil + model/sequential/shan + model/sequential/repeatnet + model/sequential/hgn + model/sequential/hrm + model/sequential/npe + + + +Knowledge-based Recommendation +--------------------------------- + +.. toctree:: + :maxdepth: 1 + + model/knowledge/cke + model/knowledge/cfkg + model/knowledge/ktup + model/knowledge/kgat + model/knowledge/ripplenet + model/knowledge/mkr + model/knowledge/kgcn + model/knowledge/kgnnls + + diff --git a/docs/source/user_guide/usage.rst b/docs/source/user_guide/usage.rst new file mode 100644 index 000000000..f98dd081a --- /dev/null +++ b/docs/source/user_guide/usage.rst @@ -0,0 +1,14 @@ +Usage +=================== +Here we introduce how to use RecBole. + +.. 
toctree:: + :maxdepth: 1 + + usage/run_recbole + usage/use_modules + usage/parameter_tuning + usage/running_new_dataset + usage/running_different_models + usage/qa + usage/load_pretrained_embedding \ No newline at end of file diff --git a/docs/source/user_guide/usage/load_pretrained_embedding.rst b/docs/source/user_guide/usage/load_pretrained_embedding.rst new file mode 100644 index 000000000..c8f3c010a --- /dev/null +++ b/docs/source/user_guide/usage/load_pretrained_embedding.rst @@ -0,0 +1,44 @@ +Load Pre-trained Embedding +=========================== + +For users who want to use pre-trained user(item) embedding to train their model. We provide a simple way as following. + +Firstly, prepare your additional embedding feature file, which contain at least two columns (id & embedding vector) as following format and name it as ``dataset.suffix`` (e.g: ``ml-1m.useremb``). + +============= =============================== +uid:token user_emb:float_seq +============= =============================== +1 -115.08 13.60 113.69 +2 -130.97 263.05 -129.88 +============= =============================== + +Note that here the header of user id must be different from user id in your ``.user`` file or ``.inter`` file (e.g: if the header of user id in ``.user`` or ``.inter`` file is ``user_id:token``, the header of user id in your additional embedding feature file must be different. It can be either ``uid:token`` or ``userid:token``). + +Secondly, update the args as (suppose that ``USER_ID_FIELD: user_id``): + +.. code:: yaml + + additional_feat_suffix: [useremb] + load_col: + # inter/user/item/...: As usual + useremb: [uid, user_emb] + fields_in_same_space: [[uid, user_id]] + preload_weight: + uid: user_emb + +Then, this additional embedding feature file will be loaded into the :class:`Dataset` object. These new features can be accessed as following: + +.. code:: python + + dataset = create_dataset(config) + print(dataset.useremb_feat) + +In your model, user embedding matrix can be initialized by your pre-trained embedding vectors as following: + +.. code:: python + + class YourModel(GeneralRecommender): + def __init__(self, config, dataset): + pretrained_user_emb = dataset.get_preload_weight('uid') + self.user_embedding = nn.Embedding.from_pretrained(torch.from_numpy(pretrained_user_emb)) + diff --git a/docs/source/user_guide/usage/parameter_tuning.rst b/docs/source/user_guide/usage/parameter_tuning.rst new file mode 100644 index 000000000..5c8168432 --- /dev/null +++ b/docs/source/user_guide/usage/parameter_tuning.rst @@ -0,0 +1,148 @@ +Parameter Tuning +===================== +RecBole is featured in the capability of automatic parameter +(or hyper-parameter) tuning. One can readily optimize +a given model according to the provided hyper-parameter spaces. + +The general steps are given as follows: + +To begin with, the user has to claim an +:class:`~recbole.trainer.hyper_tuning.HyperTuning` +instance in the running python file (e.g., `run.py`): + +.. code:: python + + from recbole.trainer import HyperTuning + from recbole.quick_start import objective_function + + hp = HyperTuning(objective_function=objective_function, algo='exhaustive', + params_file='model.hyper', fixed_config_file_list=['example.yaml']) + +:attr:`objective_function` is the optimization objective, +the input of :attr:`objective_function` is the parameter, +and the output is the optimal result of these parameters. +The users can design this :attr:`objective_function` according to their own requirements. 
+The user can also use the encapsulated :attr:`objective_function` provided by RecBole, that is:
+
+.. code:: python
+
+    def objective_function(config_dict=None, config_file_list=None):
+
+        config = Config(config_dict=config_dict, config_file_list=config_file_list)
+        init_seed(config['seed'])
+        dataset = create_dataset(config)
+        train_data, valid_data, test_data = data_preparation(config, dataset)
+        model = get_model(config['model'])(config, train_data).to(config['device'])
+        trainer = get_trainer(config['MODEL_TYPE'], config['model'])(config, model)
+        best_valid_score, best_valid_result = trainer.fit(train_data, valid_data, verbose=False)
+        test_result = trainer.evaluate(test_data)
+
+        return {
+            'best_valid_score': best_valid_score,
+            'valid_score_bigger': config['valid_metric_bigger'],
+            'best_valid_result': best_valid_result,
+            'test_result': test_result
+        }
+
+:attr:`algo` is the optimization algorithm. RecBole implements this module based
+on hyperopt_. In addition, we also support the grid search tuning method.
+
+.. code:: python
+
+    from hyperopt import tpe
+
+    # optimization algorithm provided by hyperopt
+    hp1 = HyperTuning(algo=tpe.suggest)
+
+    # grid search
+    hp2 = HyperTuning(algo='exhaustive')
+
+:attr:`params_file` specifies the ranges of the parameters, for example
+(`model.hyper`):
+
+.. code:: none
+
+    learning_rate loguniform -8,0
+    embedding_size choice [64,96,128]
+    mlp_hidden_size choice ['[64,64,64]','[128,128]']
+
+Each line represents a parameter and the corresponding search range.
+There are three components: parameter name, range type, range.
+
+:class:`~recbole.trainer.hyper_tuning.HyperTuning` supports four range types,
+the details are as follows:
+
++----------------+---------------------------+----------------------------------------------------------------+
+| range type     | range                     | description                                                    |
++================+===========================+================================================================+
+| choice         | options(list)             | search in options                                              |
++----------------+---------------------------+----------------------------------------------------------------+
+| uniform        | low(int),high(int)        | search in uniform distribution: (low,high)                     |
++----------------+---------------------------+----------------------------------------------------------------+
+| loguniform     | low(int),high(int)        | search in uniform distribution: exp(uniform(low,high))         |
++----------------+---------------------------+----------------------------------------------------------------+
+| quniform       | low(int),high(int),q(int) | search in uniform distribution: round(uniform(low,high)/q)*q   |
++----------------+---------------------------+----------------------------------------------------------------+
+
+It should be noted that if the parameter is a list and the range type is ``choice``,
+then the inner list should be quoted, e.g., :attr:`mlp_hidden_size` in `model.hyper`.
+
+.. _hyperopt: https://github.com/hyperopt/hyperopt
+
+:attr:`fixed_config_file_list` specifies the fixed parameters, e.g., the dataset-related parameters and evaluation parameters.
+These parameters should be aligned with the format in :attr:`config_file_list`. See details in :doc:`../config_settings`.
+
+A typical way of calling :class:`~recbole.trainer.hyper_tuning.HyperTuning` is:
+
+..
code:: python + + from recbole.trainer import HyperTuning + from recbole.quick_start import objective_function + + hp = HyperTuning(objective_function=objective_function, algo='exhaustive', + params_file='model.hyper', fixed_config_file_list=['example.yaml']) + + # run + hp.run() + # export result to the file + hp.export_result(output_file='hyper_example.result') + # print best parameters + print('best params: ', hp.best_params) + # print best result + print('best result: ') + print(hp.params2result[hp.params2str(hp.best_params)]) + +Run like: + +.. code:: bash + + python run.py --dataset=[dataset_name] --model=[model_name] + +:attr:`dataset_name` is the dataset name, :attr:`model_name` is the model name, which can be controlled by the command line or the yaml configuration files. + +For example: + +.. code:: yaml + + dataset: ml-100k + model: BPR + +A simple example is to search the :attr:`learning_rate` and :attr:`embedding_size` in BPR, that is, + +.. code:: bash + + running_parameters: + {'embedding_size': 128, 'learning_rate': 0.005} + current best valid score: 0.3795 + current best valid result: + {'recall@10': 0.2008, 'mrr@10': 0.3795, 'ndcg@10': 0.2151, 'hit@10': 0.7306, 'precision@10': 0.1466} + current test result: + {'recall@10': 0.2186, 'mrr@10': 0.4388, 'ndcg@10': 0.2591, 'hit@10': 0.7381, 'precision@10': 0.1784} + + ... + + best params: {'embedding_size': 64, 'learning_rate': 0.001} + best result: { + 'best_valid_result': {'recall@10': 0.2169, 'mrr@10': 0.4005, 'ndcg@10': 0.235, 'hit@10': 0.7582, 'precision@10': 0.1598} + 'test_result': {'recall@10': 0.2368, 'mrr@10': 0.4519, 'ndcg@10': 0.2768, 'hit@10': 0.7614, 'precision@10': 0.1901} + } diff --git a/docs/source/user_guide/usage/qa.rst b/docs/source/user_guide/usage/qa.rst new file mode 100644 index 000000000..f088c6802 --- /dev/null +++ b/docs/source/user_guide/usage/qa.rst @@ -0,0 +1,36 @@ +Clarifications on some practical issues +========================================= + +**Q1**: + +Why the result of ``Dataset.item_num`` always one plus of the actual number of items in the dataset? + +**A1**: + +We add ``[PAD]`` for all the token like fields. Thus after remapping ID, ``0`` will be reserved for ``[PAD]``, which makes the result of ``Dataset.item_num`` more than the actual number. + +Note that for Knowledge-based models, we add one more relation called ``U-I Relation``. It describes the history interactions which will be used in :meth:`recbole.data.dataset.kg_dataset.KnowledgeBasedDataset.ckg_graph`. +Thus the result of ``KGDataset.relation_num`` is two more than the actual number of relations. + +**Q2**: + +Why are the test results usually better than the best valid results? + +**A2**: + +For more rigorous evaluation, those user-item interaction records in validation sets will not be ranked while testing. +Thus the distribution of validation & test sets may be inconsistent. + +However, this doesn't affect the comparison between models. + +**Q3** + +Why do I receive a warning about ``batch_size changed``? What is the meaning of :attr:`batch_size` in dataloader? + +**A3** + +In RecBole's dataloader, the meaning of :attr:`batch_size` is the upper bound of the number of **interactions** in one single batch. + +On the one hand, it's easy to calculate and control the usage of GPU memories. E.g., while comparing between different datasets, you don't need to change the value of :attr:`batch_size`, because the usage of GPU memories will not change a lot. 
+
+On the other hand, in RecBole's top-k evaluation, we need the interactions of each user grouped in one batch. In other words, the interactions of any user should not be separated into multiple batches. We try to feed more interactions into one batch, but due to the above rules, the :attr:`batch_size` is just an upper bound, and :meth:`_batch_size_adaptation` is designed to adapt the actual batch size dynamically. Thus, while executing :meth:`_batch_size_adaptation`, you will receive a warning message.
diff --git a/docs/source/user_guide/usage/run_recbole.rst b/docs/source/user_guide/usage/run_recbole.rst
new file mode 100644
index 000000000..90b979481
--- /dev/null
+++ b/docs/source/user_guide/usage/run_recbole.rst
@@ -0,0 +1,39 @@
+Use run_recbole
+==========================
+We enclose the training and evaluation processes in the API
+:func:`~recbole.quick_start.quick_start.run_recbole`,
+which is composed of dataset loading, dataset splitting, model initialization,
+model training and model evaluation.
+
+If this process satisfies your requirements, you can call this API to use
+RecBole.
+
+You can create a python file (e.g., `run.py`), and write the following code
+into the file.
+
+.. code:: python
+
+    from recbole.quick_start import run_recbole
+
+    run_recbole(dataset=dataset, model=model, config_file_list=config_file_list, config_dict=config_dict)
+
+:attr:`dataset` is the name of the dataset, such as 'ml-100k', and
+:attr:`model` indicates the model name, such as 'BPR'.
+
+:attr:`config_file_list` indicates the configuration files, and
+:attr:`config_dict` is the parameter dict.
+These two variables are used to configure parameters in our toolkit.
+If you do not want to use them to configure parameters,
+please ignore them. In addition, the parameters can also be controlled
+from the command line.
+
+Please refer to :doc:`../config_settings` for more details about config settings.
+
+Then execute the following command to run:
+
+.. code:: bash
+
+    python run.py --[param_name]=[param_value]
+
+`--[param_name]=[param_value]` is the way to control parameters from
+the command line.
diff --git a/docs/source/user_guide/usage/running_different_models.rst b/docs/source/user_guide/usage/running_different_models.rst
new file mode 100644
index 000000000..e262a940d
--- /dev/null
+++ b/docs/source/user_guide/usage/running_different_models.rst
@@ -0,0 +1,160 @@
+Running Different Models
+==========================
+Here, we present how to run different models in RecBole.
+
+Proper Parameters Configuration
+----------------------------------
+Since different categories of models have different requirements for data
+processing and evaluation settings, we need to configure these settings
+appropriately.
+
+The following will introduce the parameter configuration of these four
+categories of models: general recommendation, context-aware
+recommendation, sequential recommendation and knowledge-based recommendation.
+
+General Recommendation
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+**specify and load the user and item columns**
+
+General recommendation models utilize the historical interactions between
+users and items to make recommendations, so we need to specify and load the
+user and item columns of the dataset.
+
+.. code:: yaml
+
+    USER_ID_FIELD: user_id
+    ITEM_ID_FIELD: item_id
+    load_col:
+        inter: [user_id, item_id]
+
+For some datasets, the column names corresponding to user and item in atomic
+files may not be `user_id` and `item_id`.
**training and evaluation settings** + +General recommendation models usually need to group data by user and perform +negative sampling. + +.. code:: yaml + + group_by_user: True + training_neg_sample_num: 1 + +Context-aware Recommendation +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +**load the feature columns** + +Context-aware recommendation models utilize the features of users, items and +interactions to make CTR predictions, so we need to load the used features. + +.. code:: yaml + + load_col: + inter: [inter_feature1, inter_feature2] + item: [item_feature1, item_feature2] + user: [user_feature1, user_feature2] + +`inter_feature1` refers to the column name of the corresponding feature in the +inter atomic file. + +**label setting** + +We also need to configure `LABEL_FIELD`, which represents the label column in +CTR prediction. For context-aware recommendation models, the setting of +`LABEL_FIELD` falls into two cases: + +1) There is a label field in the atomic file and its values are 0/1; we only need to +set it as follows: + +.. code:: yaml + + LABEL_FIELD: label + +2) There is no label field in the atomic file; we need to generate a label field based +on other information. + +.. code:: yaml + + LABEL_FIELD: label + threshold: + rating: 3 + +`rating` is a column in the atomic file and is loaded (by ``load_col``). In this way, +the label of an interaction with ``rating >= 3`` is set to 1, and the rest are +set to 0. + +**training and evaluation settings** + +Context-aware recommendation models usually do not need to group data by user or +perform negative sampling. + +.. code:: yaml + + group_by_user: False + training_neg_sample_num: 0 + +Since there is no need to rank the results, ``eval_setting`` only needs to set +the first part, for example: + +.. code:: yaml + + eval_setting: RO_RS + +The evaluation metrics are generally set to `AUC` and `LogLoss`. + +.. code:: yaml + + metrics: ['AUC', 'LogLoss'] + + +Sequential Recommendation +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +**specify and load the user, item and time columns** + +Sequential recommendation models utilize the historical interaction sequences +to predict the next item, so we need to specify and load the user, item and +time columns of the dataset. + +.. code:: yaml + + USER_ID_FIELD: user_id + ITEM_ID_FIELD: item_id + TIME_FIELD: timestamp + load_col: + inter: [user_id, item_id, timestamp] + +For some datasets, the column names corresponding to user, item and time in the +atomic files may not be `user_id`, `item_id` and `timestamp`; just replace them +with the corresponding column names. + +**maximum length of the sequence** + +The maximum length of the sequence can be modified by setting +``MAX_ITEM_LIST_LENGTH``: + +.. code:: yaml + + MAX_ITEM_LIST_LENGTH: 50 + +Knowledge-based Recommendation +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +**specify and load the kg entity columns** + +Knowledge-based recommendation models utilize KG information to make +recommendations, so we need to specify and load the KG-related columns of the dataset. + +.. code:: yaml + + USER_ID_FIELD: user_id + ITEM_ID_FIELD: item_id + HEAD_ENTITY_ID_FIELD: head_id + TAIL_ENTITY_ID_FIELD: tail_id + RELATION_ID_FIELD: relation_id + ENTITY_ID_FIELD: entity_id + load_col: + inter: [user_id, item_id] + kg: [head_id, relation_id, tail_id] + link: [item_id, entity_id]
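The guide does not list training and evaluation settings for knowledge-based models; assuming they mirror the general recommendation settings above (these models are likewise trained with negative sampling and evaluated by top-k ranking), a typical configuration would be:

.. code:: yaml

    # assumption: same training/evaluation settings as general recommendation
    group_by_user: True
    training_neg_sample_num: 1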
diff --git a/docs/source/user_guide/usage/running_new_dataset.rst b/docs/source/user_guide/usage/running_new_dataset.rst new file mode 100644 index 000000000..b986567f5 --- /dev/null +++ b/docs/source/user_guide/usage/running_new_dataset.rst @@ -0,0 +1,131 @@ +Running New Dataset +======================= +Here, we present how to use a new dataset in RecBole. + + +Convert to Atomic Files +------------------------- + +If the user uses the collected datasets, she can choose one of the following ways: + +1. Download the converted atomic files from `Google Drive <https://drive.google.com/drive/folders/1so0lckI6N6_niVEYaBu-LIcpOdZf99kj?usp=sharing>`_ or `Baidu Wangpan <https://pan.baidu.com/s/1p51sWMgVFbAaHQmL4aD_-g>`_ (Password: e272). +2. Find the converting scripts in RecDatasets_, and transform the raw data into atomic files. + +If the user uses other datasets, she should format the data according to the format of the atomic files. + +.. _RecDatasets: https://github.com/RUCAIBox/RecDatasets + +For the ml-1m dataset, the converted atomic files are: + +**ml-1m.inter** + +============= ============= ============ =============== +user_id:token item_id:token rating:float timestamp:float +============= ============= ============ =============== +1 1193 5 978300760 +1 661 3 978302109 +============= ============= ============ =============== + +**ml-1m.user** + +============= ========= ============ ================ ============== +user_id:token age:token gender:token occupation:token zip_code:token +============= ========= ============ ================ ============== +1 1 F 10 48067 +2 56 M 16 70072 +============= ========= ============ ================ ============== + +**ml-1m.item** + +============= ===================== ================== ============================ +item_id:token movie_title:token_seq release_year:token genre:token_seq +============= ===================== ================== ============================ +1 Toy Story 1995 Animation Children's Comedy +2 Jumanji 1995 Adventure Children's Fantasy +============= ===================== ================== ============================ + + +Local Path +--------------- +The name of the atomic files, the name of the directory containing them and ``config['dataset']`` should be the same. + +``config['data_path']`` should be the parent directory of the directory containing the atomic files. + +For example: + +.. code:: none + + ~/xxx/yyy/ml-1m/ + ├── ml-1m.inter + ├── ml-1m.item + ├── ml-1m.kg + ├── ml-1m.link + └── ml-1m.user + +.. code:: yaml + + data_path: ~/xxx/yyy/ + dataset: ml-1m + +Convert to Dataset +--------------------- +Here, we present how to convert atomic files into :class:`~recbole.data.dataset.dataset.Dataset`. + +Suppose we use ml-1m to train BPR. + +According to the dataset information, the user should set the dataset information and filtering parameters in the configuration file `ml-1m.yaml`. +For example, we conduct 10-core filtering, remove the ratings smaller than 3 and the records whose timestamp is earlier than 97830000, and only load the inter data. + +.. code:: yaml
+ + USER_ID_FIELD: user_id + ITEM_ID_FIELD: item_id + RATING_FIELD: rating + TIME_FIELD: timestamp + + load_col: + inter: [user_id, item_id, rating, timestamp] + + min_user_inter_num: 10 + min_item_inter_num: 10 + lowest_val: + rating: 3 + timestamp: 97830000 + + +.. code:: python + + from recbole.config import Config + from recbole.data import create_dataset, data_preparation + + if __name__ == '__main__': + config = Config(model='BPR', dataset='ml-1m', config_file_list=['ml-1m.yaml']) + dataset = create_dataset(config) + + +Convert to Dataloader +------------------------ +Here, we present how to convert :class:`~recbole.data.dataset.dataset.Dataset` into :obj:`Dataloader`. + +We first set the parameters in the configuration file `ml-1m.yaml`. +We leverage random ordering + ratio-based splitting and full ranking with all item candidates; the splitting ratio is set to 8:1:1. + +.. code:: yaml + + ... + + eval_setting: RO_RS,full + split_ratio: [0.8,0.1,0.1] + + +.. code:: python + + from recbole.config import Config + from recbole.data import create_dataset, data_preparation + + + if __name__ == '__main__': + + ... + + train_data, valid_data, test_data = data_preparation(config, dataset) diff --git a/docs/source/user_guide/usage/use_modules.rst b/docs/source/user_guide/usage/use_modules.rst new file mode 100644 index 000000000..6b20556a1 --- /dev/null +++ b/docs/source/user_guide/usage/use_modules.rst @@ -0,0 +1,203 @@ +Use Modules +================ +You can call different modules in RecBole to satisfy your requirements. + +The complete process is as follows: + +.. code:: python + + from logging import getLogger + from recbole.config import Config + from recbole.data import create_dataset, data_preparation + from recbole.model.general_recommender import BPR + from recbole.trainer import Trainer + from recbole.utils import init_seed, init_logger + + if __name__ == '__main__': + + # configurations initialization + config = Config(model='BPR', dataset='ml-100k') + + # init random seed + init_seed(config['seed'], config['reproducibility']) + + # logger initialization + init_logger(config) + logger = getLogger() + + # write config info into log + logger.info(config) + + # dataset creating and filtering + dataset = create_dataset(config) + logger.info(dataset) + + # dataset splitting + train_data, valid_data, test_data = data_preparation(config, dataset) + + # model loading and initialization + model = BPR(config, train_data).to(config['device']) + logger.info(model) + + # trainer loading and initialization + trainer = Trainer(config, model) + + # model training + best_valid_score, best_valid_result = trainer.fit(train_data, valid_data) + + # model evaluation + test_result = trainer.evaluate(test_data) + print(test_result) + + +Configurations Initialization +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code:: python + + config = Config(model='BPR', dataset='ml-100k') + +The :class:`~recbole.config.configurator.Config` module is used to set parameters and the experiment setup. +Please refer to :doc:`../config_settings` for more details. + + +Init Random Seed +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code:: python + + init_seed(config['seed'], config['reproducibility']) + +Initializing the random seed to ensure the reproducibility of the experiments. + + +Dataset Filtering +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code:: python + + dataset = create_dataset(config) + +Filtering the data files according to the parameters indicated in the configuration. + + +Dataset Splitting +^^^^^^^^^^^^^^^^^^^^^ + +.. code:: python
+ + train_data, valid_data, test_data = data_preparation(config, dataset) + +Splitting the dataset according to the parameters indicated in the configuration. + + +Model Initialization +^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code:: python + + model = BPR(config, train_data).to(config['device']) + +Initializing an instance of the model according to the model name. + + +Trainer Initialization +^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code:: python + + trainer = Trainer(config, model) + +Initializing the trainer, which is used for model training and evaluation. + + +Automatic Selection of Model and Trainer +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +In the above example, we manually import the model class :class:`~recbole.model.general_recommender.bpr.BPR` and the trainer class :class:`~recbole.trainer.trainer.Trainer`. +For the implemented models, we support the automatic acquisition of the corresponding model class and +trainer class through the model name. + + +.. code:: python + + from recbole.utils import get_model, get_trainer + + if __name__ == '__main__': + + ... + + # model loading and initialization + model = get_model(config['model'])(config, train_data).to(config['device']) + + # trainer loading and initialization + trainer = get_trainer(config['MODEL_TYPE'], config['model'])(config, model) + + ... + + +Model Training +^^^^^^^^^^^^^^^^^^^ + +.. code:: python + + best_valid_score, best_valid_result = trainer.fit(train_data, valid_data) + +Inputting the training and validation data, and beginning the training process. + + +Model Evaluation +^^^^^^^^^^^^^^^^^^^^^^^ +.. code:: python + + test_result = trainer.evaluate(test_data) + +Inputting the test data, and evaluating based on the trained model. + + +Resume Model From Break Point +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Our toolkit also supports reloading the parameters from previously trained models. + +In this example, we present how to continue training the model from previously saved parameters. + +.. code:: python + + ... + + if __name__ == '__main__': + + ... + + # trainer loading and initialization + trainer = get_trainer(config['MODEL_TYPE'], config['model'])(config, model) + + # resume from break point + checkpoint_file = 'checkpoint.pth' + trainer.resume_checkpoint(checkpoint_file) + + # model training + best_valid_score, best_valid_result = trainer.fit(train_data, valid_data) + + ... + +:attr:`checkpoint_file` is the file that stores the trained model. + + +In this example, we present how to test a model based on previously saved parameters. + +.. code:: python + + ... + + if __name__ == '__main__': + + ... + + # trainer loading and initialization + trainer = get_trainer(config['MODEL_TYPE'], config['model'])(config, model) + + # model evaluation + checkpoint_file = 'checkpoint.pth' + test_result = trainer.evaluate(test_data, model_file=checkpoint_file) + print(test_result) + ...
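As a side note (a sketch based on the default settings rather than the example above): when ``saved=True`` is passed to :meth:`fit`, the trainer writes the best checkpoint under ``config['checkpoint_dir']`` (``saved/`` by default), and that path can be reused as :attr:`checkpoint_file`:

.. code:: python

    # minimal sketch, assuming the default checkpoint_dir ('saved/')
    best_valid_score, best_valid_result = trainer.fit(train_data, valid_data, saved=True)
    print(trainer.saved_model_file)  # e.g. 'saved/BPR-<local time>.pth'

    # the stored file can later be loaded back for evaluation
    test_result = trainer.evaluate(test_data, model_file=trainer.saved_model_file)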
\ No newline at end of file diff --git a/recbole/__init__.py b/recbole/__init__.py index 83c915e76..5e6584fac 100644 --- a/recbole/__init__.py +++ b/recbole/__init__.py @@ -2,4 +2,4 @@ from __future__ import print_function from __future__ import division -__version__ = '0.1.2' +__version__ = '0.2.0' \ No newline at end of file diff --git a/recbole/config/configurator.py b/recbole/config/configurator.py index 04e51adfa..0df0d4d42 100644 --- a/recbole/config/configurator.py +++ b/recbole/config/configurator.py @@ -3,9 +3,9 @@ # @Email : linzihan.super@foxmail.com # UPDATE -# @Time : 2020/10/04, 2020/10/9 -# @Author : Shanlei Mu, Yupeng Hou -# @Email : slmu@ruc.edu.cn, houyupeng@ruc.edu.cn +# @Time : 2020/10/04, 2020/10/9, 2021/2/17 +# @Author : Shanlei Mu, Yupeng Hou, Jiawei Guan +# @Email : slmu@ruc.edu.cn, houyupeng@ruc.edu.cn, Guanjw@ruc.edu.cn """ recbole.config.configurator @@ -89,14 +89,16 @@ def _build_yaml_loader(self): loader = yaml.FullLoader loader.add_implicit_resolver( u'tag:yaml.org,2002:float', - re.compile(u'''^(?: + re.compile( + u'''^(?: [-+]?(?:[0-9][0-9_]*)\\.[0-9_]*(?:[eE][-+]?[0-9]+)? |[-+]?(?:[0-9][0-9_]*)(?:[eE][-+]?[0-9]+) |\\.[0-9_]+(?:[eE][-+][0-9]+)? |[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+\\.[0-9_]* |[-+]?\\.(?:inf|Inf|INF) - |\\.(?:nan|NaN|NAN))$''', re.X), - list(u'-+0123456789.')) + |\\.(?:nan|NaN|NAN))$''', re.X + ), list(u'-+0123456789.') + ) return loader def _convert_config_dict(self, config_dict): @@ -175,7 +177,8 @@ def _get_model_and_dataset(self, model, dataset): except KeyError: raise KeyError( 'model need to be specified in at least one of the these ways: ' - '[model variable, config file, config dict, command line] ') + '[model variable, config file, config dict, command line] ' + ) if not isinstance(model, str): final_model_class = model final_model = model.__name__ @@ -187,8 +190,10 @@ def _get_model_and_dataset(self, model, dataset): try: final_dataset = self.external_config_dict['dataset'] except KeyError: - raise KeyError('dataset need to be specified in at least one of the these ways: ' - '[dataset variable, config file, config dict, command line] ') + raise KeyError( + 'dataset need to be specified in at least one of the these ways: ' + '[dataset variable, config file, config dict, command line] ' + ) else: final_dataset = dataset @@ -223,13 +228,14 @@ def _load_internal_config_dict(self, model, model_class, dataset): if os.path.isfile(file): config_dict = self._update_internal_config_dict(file) if file == dataset_init_file: - self.parameters['Dataset'] += [key for key in config_dict.keys() if - key not in self.parameters['Dataset']] + self.parameters['Dataset'] += [ + key for key in config_dict.keys() if key not in self.parameters['Dataset'] + ] self.internal_config_dict['MODEL_TYPE'] = model_class.type if self.internal_config_dict['MODEL_TYPE'] == ModelType.GENERAL: pass - elif self.internal_config_dict['MODEL_TYPE'] in {ModelType.CONTEXT, ModelType.XGBOOST}: + elif self.internal_config_dict['MODEL_TYPE'] in {ModelType.CONTEXT, ModelType.DECISIONTREE}: self._update_internal_config_dict(context_aware_init) if dataset == 'ml-100k': self._update_internal_config_dict(context_aware_on_ml_100k_init) @@ -272,8 +278,7 @@ def _set_default_parameters(self): elif self.final_config_dict['loss_type'] in ['BPR']: self.final_config_dict['MODEL_INPUT_TYPE'] = InputType.PAIRWISE else: - raise ValueError('Either Model has attr \'input_type\',' - 'or arg \'loss_type\' should exist in config.') + raise ValueError('Either Model has attr \'input_type\',' 'or arg 
\'loss_type\' should exist in config.') eval_type = None for metric in self.final_config_dict['metrics']: @@ -324,11 +329,18 @@ def __str__(self): args_info = '' for category in self.parameters: args_info += category + ' Hyper Parameters: \n' - args_info += '\n'.join( - ["{}={}".format(arg, value) - for arg, value in self.final_config_dict.items() - if arg in self.parameters[category]]) + args_info += '\n'.join([ + "{}={}".format(arg, value) for arg, value in self.final_config_dict.items() + if arg in self.parameters[category] + ]) args_info += '\n\n' + + args_info += 'Other Hyper Parameters: \n' + args_info += '\n'.join([ + "{}={}".format(arg, value) for arg, value in self.final_config_dict.items() + if arg not in sum(list(self.parameters.values()) + [['model', 'dataset', 'config_files']], []) + ]) + args_info += '\n\n' return args_info def __repr__(self): diff --git a/recbole/config/eval_setting.py b/recbole/config/eval_setting.py index 030a1136f..30ed04f8e 100644 --- a/recbole/config/eval_setting.py +++ b/recbole/config/eval_setting.py @@ -7,7 +7,6 @@ # @Author : Yupeng Hou, Yushuo Chen # @Email : houyupeng@ruc.edu.cn, chenyushuo@ruc.edu.cn - """ recbole.config.eval_setting ################################ @@ -163,7 +162,7 @@ def random_ordering(self): """ self.set_ordering('shuffle') - def sort_by(self, field, ascending=None): + def sort_by(self, field, ascending=True): """Setting about Sorting. Similar with pandas' sort_values_ @@ -175,12 +174,6 @@ def sort_by(self, field, ascending=None): ascending (bool or list of bool): Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bool, must match the length of the field """ - if not isinstance(field, list): - field = [field] - if ascending is None: - ascending = [True] * len(field) - if len(ascending) == 1: - ascending = True self.set_ordering('by', field=field, ascending=ascending) def temporal_ordering(self): @@ -210,7 +203,7 @@ def set_splitting(self, strategy='none', **kwargs): legal_strategy = {'none', 'by_ratio', 'by_value', 'loo'} if strategy not in legal_strategy: raise ValueError('Split Strategy [{}] should in {}'.format(strategy, list(legal_strategy))) - if strategy == 'loo' and self.group_by is None: + if strategy == 'loo' and self.group_field is None: raise ValueError('Leave-One-Out request group firstly') self.split_args = {'strategy': strategy} self.split_args.update(kwargs) @@ -258,7 +251,7 @@ def set_neg_sampling(self, strategy='none', distribution='uniform', **kwargs): distribution (str): distribution of sampler, either 'uniform' or 'popularity'. Example: - >>> es.neg_sample_to(100) + >>> es.full() >>> es.neg_sample_by(1) """ legal_strategy = {'none', 'full', 'by'} @@ -278,6 +271,40 @@ def neg_sample_by(self, by, distribution='uniform'): """ self.set_neg_sampling(strategy='by', by=by, distribution=distribution) + def set_ordering_and_splitting(self, es_str): + """Setting about ordering and split method. + + Args: + es_str (str): Ordering and splitting method string. Either ``RO_RS``, ``RO_LS``, ``TO_RS`` or ``TO_LS``. 
+ """ + args = es_str.split('_') + if len(args) != 2: + raise ValueError(f'`{es_str}` is invalid eval_setting.') + ordering_args, split_args = args + + if self.config['group_by_user']: + self.group_by_user() + + if ordering_args == 'RO': + self.random_ordering() + elif ordering_args == 'TO': + self.temporal_ordering() + else: + raise NotImplementedError(f'Ordering args `{ordering_args}` is not implemented.') + + if split_args == 'RS': + ratios = self.config['split_ratio'] + if ratios is None: + raise ValueError('`ratios` should be set if `RS` is set.') + self.split_by_ratio(ratios) + elif split_args == 'LS': + leave_one_num = self.config['leave_one_num'] + if leave_one_num is None: + raise ValueError('`leave_one_num` should be set if `LS` is set.') + self.leave_one_out(leave_one_num=leave_one_num) + else: + raise NotImplementedError(f'Split args `{split_args}` is not implemented.') + def RO_RS(self, ratios=(0.8, 0.1, 0.1), group_by_user=True): """Preset about Random Ordering and Ratio-based Splitting. diff --git a/recbole/data/__init__.py b/recbole/data/__init__.py index 76b29b2e0..4b790aba7 100644 --- a/recbole/data/__init__.py +++ b/recbole/data/__init__.py @@ -1,4 +1,3 @@ from recbole.data.utils import * - __all__ = ['create_dataset', 'data_preparation'] diff --git a/recbole/data/dataloader/__init__.py b/recbole/data/dataloader/__init__.py index 90ffa311f..b8b2e911f 100644 --- a/recbole/data/dataloader/__init__.py +++ b/recbole/data/dataloader/__init__.py @@ -4,5 +4,5 @@ from recbole.data.dataloader.context_dataloader import * from recbole.data.dataloader.sequential_dataloader import * from recbole.data.dataloader.knowledge_dataloader import * -from recbole.data.dataloader.xgboost_dataloader import * +from recbole.data.dataloader.decisiontree_dataloader import * from recbole.data.dataloader.user_dataloader import * diff --git a/recbole/data/dataloader/abstract_dataloader.py b/recbole/data/dataloader/abstract_dataloader.py index e5464606d..73e642472 100644 --- a/recbole/data/dataloader/abstract_dataloader.py +++ b/recbole/data/dataloader/abstract_dataloader.py @@ -42,8 +42,7 @@ class AbstractDataLoader(object): """ dl_type = None - def __init__(self, config, dataset, - batch_size=1, dl_format=InputType.POINTWISE, shuffle=False): + def __init__(self, config, dataset, batch_size=1, dl_format=InputType.POINTWISE, shuffle=False): self.config = config self.logger = getLogger() self.dataset = dataset @@ -56,13 +55,6 @@ def __init__(self, config, dataset, if self.real_time is None: self.real_time = True - self.join = self.dataset.join - self.history_item_matrix = self.dataset.history_item_matrix - self.history_user_matrix = self.dataset.history_user_matrix - self.inter_matrix = self.dataset.inter_matrix - self.get_user_feature = self.dataset.get_user_feature - self.get_item_feature = self.dataset.get_item_feature - for dataset_attr in self.dataset._dataloader_apis: try: flag = hasattr(self.dataset, dataset_attr) @@ -129,7 +121,7 @@ def set_batch_size(self, batch_size): raise PermissionError('Cannot change dataloader\'s batch_size while iteration') if self.batch_size != batch_size: self.batch_size = batch_size - self.logger.warning('Batch size is changed to {}'.format(batch_size)) + self.logger.warning(f'Batch size is changed to {batch_size}.') def upgrade_batch_size(self, batch_size): """Upgrade the batch_size of the dataloader, if input batch_size is bigger than current batch_size. 
diff --git a/recbole/data/dataloader/xgboost_dataloader.py b/recbole/data/dataloader/decisiontree_dataloader.py similarity index 66% rename from recbole/data/dataloader/xgboost_dataloader.py rename to recbole/data/dataloader/decisiontree_dataloader.py index 17c9ab2b0..996b720a8 100644 --- a/recbole/data/dataloader/xgboost_dataloader.py +++ b/recbole/data/dataloader/decisiontree_dataloader.py @@ -8,7 +8,7 @@ # @Email : 254170321@qq.com """ -recbole.data.dataloader.xgboost_dataloader +recbole.data.dataloader.decisiontree_dataloader ################################################ """ @@ -16,24 +16,24 @@ GeneralFullDataLoader -class XgboostDataLoader(GeneralDataLoader): - """:class:`XgboostDataLoader` is inherit from +class DecisionTreeDataLoader(GeneralDataLoader): + """:class:`DecisionTreeDataLoader` is inherit from :class:`~recbole.data.dataloader.general_dataloader.GeneralDataLoader`, and didn't add/change anything at all. """ pass -class XgboostNegSampleDataLoader(GeneralNegSampleDataLoader): - """:class:`XgboostNegSampleDataLoader` is inherit from +class DecisionTreeNegSampleDataLoader(GeneralNegSampleDataLoader): + """:class:`DecisionTreeNegSampleDataLoader` is inherit from :class:`~recbole.data.dataloader.general_dataloader.GeneralNegSampleDataLoader`, and didn't add/change anything at all. """ pass -class XgboostFullDataLoader(GeneralFullDataLoader): - """:class:`XgboostFullDataLoader` is inherit from +class DecisionTreeFullDataLoader(GeneralFullDataLoader): + """:class:`DecisionTreeFullDataLoader` is inherit from :class:`~recbole.data.dataloader.general_dataloader.GeneralFullDataLoader`, and didn't add/change anything at all. """ diff --git a/recbole/data/dataloader/general_dataloader.py b/recbole/data/dataloader/general_dataloader.py index baea05fe5..818571586 100644 --- a/recbole/data/dataloader/general_dataloader.py +++ b/recbole/data/dataloader/general_dataloader.py @@ -34,10 +34,8 @@ class GeneralDataLoader(AbstractDataLoader): """ dl_type = DataLoaderType.ORIGIN - def __init__(self, config, dataset, - batch_size=1, dl_format=InputType.POINTWISE, shuffle=False): - super().__init__(config, dataset, - batch_size=batch_size, dl_format=dl_format, shuffle=shuffle) + def __init__(self, config, dataset, batch_size=1, dl_format=InputType.POINTWISE, shuffle=False): + super().__init__(config, dataset, batch_size=batch_size, dl_format=dl_format, shuffle=shuffle) @property def pr_end(self): @@ -47,7 +45,7 @@ def _shuffle(self): self.dataset.shuffle() def _next_batch_data(self): - cur_data = self.dataset[self.pr: self.pr + self.step] + cur_data = self.dataset[self.pr:self.pr + self.step] self.pr += self.step return cur_data @@ -70,14 +68,16 @@ class GeneralNegSampleDataLoader(NegSampleByMixin, AbstractDataLoader): shuffle (bool, optional): Whether the dataloader will be shuffle after a round. Defaults to ``False``. 
""" - def __init__(self, config, dataset, sampler, neg_sample_args, - batch_size=1, dl_format=InputType.POINTWISE, shuffle=False): + def __init__( + self, config, dataset, sampler, neg_sample_args, batch_size=1, dl_format=InputType.POINTWISE, shuffle=False + ): self.uid_field = dataset.uid_field self.iid_field = dataset.iid_field self.uid_list, self.uid2index, self.uid2items_num = None, None, None - super().__init__(config, dataset, sampler, neg_sample_args, - batch_size=batch_size, dl_format=dl_format, shuffle=shuffle) + super().__init__( + config, dataset, sampler, neg_sample_args, batch_size=batch_size, dl_format=dl_format, shuffle=shuffle + ) def setup(self): if self.user_inter_in_one_batch: @@ -107,7 +107,7 @@ def _batch_size_adaptation(self): for i in range(1, len(inters_num)): if new_batch_size + inters_num[i] > self.batch_size: break - batch_num = i + batch_num = i + 1 new_batch_size += inters_num[i] self.step = batch_num self.upgrade_batch_size(new_batch_size) @@ -132,7 +132,7 @@ def _shuffle(self): def _next_batch_data(self): if self.user_inter_in_one_batch: - uid_list = self.uid_list[self.pr: self.pr + self.step] + uid_list = self.uid_list[self.pr:self.pr + self.step] data_list = [] for uid in uid_list: index = self.uid2index[uid] @@ -144,7 +144,7 @@ def _next_batch_data(self): self.pr += self.step return cur_data else: - cur_data = self._neg_sampling(self.dataset[self.pr: self.pr + self.step]) + cur_data = self._neg_sampling(self.dataset[self.pr:self.pr + self.step]) self.pr += self.step return cur_data @@ -167,7 +167,7 @@ def _neg_sample_by_point_wise_sampling(self, inter_feat, neg_iids): new_data[self.iid_field][pos_inter_num:] = neg_iids new_data = self.dataset.join(new_data) labels = torch.zeros(pos_inter_num * self.times) - labels[: pos_inter_num] = 1.0 + labels[:pos_inter_num] = 1.0 new_data.update(Interaction({self.label_field: labels})) return new_data @@ -203,8 +203,9 @@ class GeneralFullDataLoader(NegSampleMixin, AbstractDataLoader): """ dl_type = DataLoaderType.FULL - def __init__(self, config, dataset, sampler, neg_sample_args, - batch_size=1, dl_format=InputType.POINTWISE, shuffle=False): + def __init__( + self, config, dataset, sampler, neg_sample_args, batch_size=1, dl_format=InputType.POINTWISE, shuffle=False + ): if neg_sample_args['strategy'] != 'full': raise ValueError('neg_sample strategy in GeneralFullDataLoader() should be `full`') @@ -229,11 +230,12 @@ def __init__(self, config, dataset, sampler, neg_sample_args, positive_item = set() positive_item.add(iid) self._set_user_property(last_uid, uid2used_item[last_uid], positive_item) - self.uid_list = torch.tensor(self.uid_list) + self.uid_list = torch.tensor(self.uid_list, dtype=torch.int64) self.user_df = dataset.join(Interaction({uid_field: self.uid_list})) - super().__init__(config, dataset, sampler, neg_sample_args, - batch_size=batch_size, dl_format=dl_format, shuffle=shuffle) + super().__init__( + config, dataset, sampler, neg_sample_args, batch_size=batch_size, dl_format=dl_format, shuffle=shuffle + ) def _set_user_property(self, uid, used_item, positive_item): if uid is None: @@ -260,7 +262,7 @@ def _shuffle(self): self.logger.warnning('GeneralFullDataLoader can\'t shuffle') def _next_batch_data(self): - user_df = self.user_df[self.pr: self.pr + self.step] + user_df = self.user_df[self.pr:self.pr + self.step] cur_data = self._neg_sampling(user_df) self.pr += self.step return cur_data diff --git a/recbole/data/dataloader/knowledge_dataloader.py b/recbole/data/dataloader/knowledge_dataloader.py index 
632b283bf..6b6bb00ac 100644 --- a/recbole/data/dataloader/knowledge_dataloader.py +++ b/recbole/data/dataloader/knowledge_dataloader.py @@ -35,8 +35,7 @@ class KGDataLoader(AbstractDataLoader): However, in :class:`KGDataLoader`, it's guaranteed to be ``True``. """ - def __init__(self, config, dataset, sampler, - batch_size=1, dl_format=InputType.PAIRWISE, shuffle=False): + def __init__(self, config, dataset, sampler, batch_size=1, dl_format=InputType.PAIRWISE, shuffle=False): self.sampler = sampler self.neg_sample_num = 1 @@ -48,8 +47,7 @@ def __init__(self, config, dataset, sampler, self.neg_tid_field = self.neg_prefix + self.tid_field dataset.copy_field_property(self.neg_tid_field, self.tid_field) - super().__init__(config, dataset, - batch_size=batch_size, dl_format=dl_format, shuffle=shuffle) + super().__init__(config, dataset, batch_size=batch_size, dl_format=dl_format, shuffle=shuffle) def setup(self): """Make sure that the :attr:`shuffle` is True. If :attr:`shuffle` is False, it will be changed to True @@ -67,7 +65,7 @@ def _shuffle(self): self.dataset.kg_feat.shuffle() def _next_batch_data(self): - cur_data = self._neg_sampling(self.dataset.kg_feat[self.pr: self.pr + self.step]) + cur_data = self._neg_sampling(self.dataset.kg_feat[self.pr:self.pr + self.step]) self.pr += self.step return cur_data @@ -112,28 +110,44 @@ class KnowledgeBasedDataLoader(AbstractDataLoader): and user-item interaction information. """ - def __init__(self, config, dataset, sampler, kg_sampler, neg_sample_args, - batch_size=1, dl_format=InputType.POINTWISE, shuffle=False): + def __init__( + self, + config, + dataset, + sampler, + kg_sampler, + neg_sample_args, + batch_size=1, + dl_format=InputType.POINTWISE, + shuffle=False + ): # using sampler - self.general_dataloader = GeneralNegSampleDataLoader(config=config, dataset=dataset, - sampler=sampler, neg_sample_args=neg_sample_args, - batch_size=batch_size, dl_format=dl_format, - shuffle=shuffle) + self.general_dataloader = GeneralNegSampleDataLoader( + config=config, + dataset=dataset, + sampler=sampler, + neg_sample_args=neg_sample_args, + batch_size=batch_size, + dl_format=dl_format, + shuffle=shuffle + ) # using kg_sampler and dl_format is pairwise - self.kg_dataloader = KGDataLoader(config, dataset, kg_sampler, - batch_size=batch_size, dl_format=InputType.PAIRWISE, shuffle=True) + self.kg_dataloader = KGDataLoader( + config, dataset, kg_sampler, batch_size=batch_size, dl_format=InputType.PAIRWISE, shuffle=True + ) self.state = None - super().__init__(config, dataset, - batch_size=batch_size, dl_format=dl_format, shuffle=shuffle) + super().__init__(config, dataset, batch_size=batch_size, dl_format=dl_format, shuffle=shuffle) def __iter__(self): if self.state is None: - raise ValueError('The dataloader\'s state must be set when using the kg based dataloader, ' - 'you should call set_mode() before __iter__()') + raise ValueError( + 'The dataloader\'s state must be set when using the kg based dataloader, ' + 'you should call set_mode() before __iter__()' + ) if self.state == KGDataLoaderState.KG: return self.kg_dataloader.__iter__() elif self.state == KGDataLoaderState.RS: @@ -154,11 +168,17 @@ def __next__(self): return self._next_batch_data() def __len__(self): - return len(self.general_dataloader) + if self.state == KGDataLoaderState.KG: + return len(self.kg_dataloader) + else: + return len(self.general_dataloader) @property def pr_end(self): - return self.general_dataloader.pr_end + if self.state == KGDataLoaderState.KG: + return 
self.kg_dataloader.pr_end + else: + return self.general_dataloader.pr_end def _next_batch_data(self): try: @@ -182,5 +202,5 @@ def set_mode(self, state): state (KGDataLoaderState): the state of :class:`KnowledgeBasedDataLoader`. """ if state not in set(KGDataLoaderState): - raise NotImplementedError('kg data loader has no state named [{}]'.format(self.state)) + raise NotImplementedError(f'Kg data loader has no state named [{self.state}].') self.state = state diff --git a/recbole/data/dataloader/neg_sample_mixin.py b/recbole/data/dataloader/neg_sample_mixin.py index 9ea6d05ed..e21d614ac 100644 --- a/recbole/data/dataloader/neg_sample_mixin.py +++ b/recbole/data/dataloader/neg_sample_mixin.py @@ -33,16 +33,16 @@ class NegSampleMixin(AbstractDataLoader): """ dl_type = DataLoaderType.NEGSAMPLE - def __init__(self, config, dataset, sampler, neg_sample_args, - batch_size=1, dl_format=InputType.POINTWISE, shuffle=False): + def __init__( + self, config, dataset, sampler, neg_sample_args, batch_size=1, dl_format=InputType.POINTWISE, shuffle=False + ): if neg_sample_args['strategy'] not in ['by', 'full']: - raise ValueError('neg_sample strategy [{}] has not been implemented'.format(neg_sample_args['strategy'])) + raise ValueError(f"Neg_sample strategy [{neg_sample_args['strategy']}] has not been implemented.") self.sampler = sampler self.neg_sample_args = neg_sample_args - super().__init__(config, dataset, - batch_size=batch_size, dl_format=dl_format, shuffle=shuffle) + super().__init__(config, dataset, batch_size=batch_size, dl_format=dl_format, shuffle=shuffle) def setup(self): """Do batch size adaptation. @@ -95,8 +95,9 @@ class NegSampleByMixin(NegSampleMixin): shuffle (bool, optional): Whether the dataloader will be shuffle after a round. Defaults to ``False``. """ - def __init__(self, config, dataset, sampler, neg_sample_args, - batch_size=1, dl_format=InputType.POINTWISE, shuffle=False): + def __init__( + self, config, dataset, sampler, neg_sample_args, batch_size=1, dl_format=InputType.POINTWISE, shuffle=False + ): if neg_sample_args['strategy'] != 'by': raise ValueError('neg_sample strategy in GeneralInteractionBasedDataLoader() should be `by`') @@ -122,10 +123,11 @@ def __init__(self, config, dataset, sampler, neg_sample_args, neg_item_feat_col = self.neg_prefix + item_feat_col dataset.copy_field_property(neg_item_feat_col, item_feat_col) else: - raise ValueError('`neg sampling by` with dl_format [{}] not been implemented'.format(dl_format)) + raise ValueError(f'`neg sampling by` with dl_format [{dl_format}] not been implemented.') - super().__init__(config, dataset, sampler, neg_sample_args, - batch_size=batch_size, dl_format=dl_format, shuffle=shuffle) + super().__init__( + config, dataset, sampler, neg_sample_args, batch_size=batch_size, dl_format=dl_format, shuffle=shuffle + ) def _neg_sample_by_pair_wise_sampling(self, *args): """Pair-wise sampling. 
diff --git a/recbole/data/dataloader/sequential_dataloader.py b/recbole/data/dataloader/sequential_dataloader.py index 88c643b65..51d3a845d 100644 --- a/recbole/data/dataloader/sequential_dataloader.py +++ b/recbole/data/dataloader/sequential_dataloader.py @@ -43,8 +43,7 @@ class SequentialDataLoader(AbstractDataLoader): """ dl_type = DataLoaderType.ORIGIN - def __init__(self, config, dataset, - batch_size=1, dl_format=InputType.POINTWISE, shuffle=False): + def __init__(self, config, dataset, batch_size=1, dl_format=InputType.POINTWISE, shuffle=False): self.uid_field = dataset.uid_field self.iid_field = dataset.iid_field self.time_field = dataset.time_field @@ -72,12 +71,13 @@ def __init__(self, config, dataset, self.item_list_length_field = config['ITEM_LIST_LENGTH_FIELD'] dataset.set_field_property(self.item_list_length_field, FeatureType.TOKEN, FeatureSource.INTERACTION, 1) - self.uid_list, self.item_list_index, self.target_index, self.item_list_length = \ - dataset.prepare_data_augmentation() + self.uid_list = dataset.uid_list + self.item_list_index = dataset.item_list_index + self.target_index = dataset.target_index + self.item_list_length = dataset.item_list_length self.pre_processed_data = None - super().__init__(config, dataset, - batch_size=batch_size, dl_format=dl_format, shuffle=shuffle) + super().__init__(config, dataset, batch_size=batch_size, dl_format=dl_format, shuffle=shuffle) def data_preprocess(self): """Do data augmentation before training/evaluation. @@ -105,9 +105,9 @@ def _next_batch_data(self): def _get_processed_data(self, index): if self.real_time: - cur_data = self.augmentation(self.item_list_index[index], - self.target_index[index], - self.item_list_length[index]) + cur_data = self.augmentation( + self.item_list_index[index], self.target_index[index], self.item_list_length[index] + ) else: cur_data = self.pre_processed_data[index] return cur_data @@ -164,10 +164,12 @@ class SequentialNegSampleDataLoader(NegSampleByMixin, SequentialDataLoader): shuffle (bool, optional): Whether the dataloader will be shuffle after a round. Defaults to ``False``. 
""" - def __init__(self, config, dataset, sampler, neg_sample_args, - batch_size=1, dl_format=InputType.POINTWISE, shuffle=False): - super().__init__(config, dataset, sampler, neg_sample_args, - batch_size=batch_size, dl_format=dl_format, shuffle=shuffle) + def __init__( + self, config, dataset, sampler, neg_sample_args, batch_size=1, dl_format=InputType.POINTWISE, shuffle=False + ): + super().__init__( + config, dataset, sampler, neg_sample_args, batch_size=batch_size, dl_format=dl_format, shuffle=shuffle + ) def _batch_size_adaptation(self): batch_num = max(self.batch_size // self.times, 1) @@ -192,9 +194,9 @@ def _neg_sampling(self, data): data_len = len(data[self.uid_field]) data_list = [] for i in range(data_len): - uids = data[self.uid_field][i: i + 1] + uids = data[self.uid_field][i:i + 1] neg_iids = self.sampler.sample_by_user_ids(uids, self.neg_sample_by) - cur_data = data[i: i + 1] + cur_data = data[i:i + 1] data_list.append(self.sampling_func(cur_data, neg_iids)) return cat_interactions(data_list) else: @@ -212,7 +214,7 @@ def _neg_sample_by_point_wise_sampling(self, data, neg_iids): new_data = data.repeat(self.times) new_data[self.iid_field][pos_inter_num:] = neg_iids labels = torch.zeros(pos_inter_num * self.times) - labels[: pos_inter_num] = 1.0 + labels[:pos_inter_num] = 1.0 new_data.update(Interaction({self.label_field: labels})) return new_data @@ -248,10 +250,12 @@ class SequentialFullDataLoader(NegSampleMixin, SequentialDataLoader): """ dl_type = DataLoaderType.FULL - def __init__(self, config, dataset, sampler, neg_sample_args, - batch_size=1, dl_format=InputType.POINTWISE, shuffle=False): - super().__init__(config, dataset, sampler, neg_sample_args, - batch_size=batch_size, dl_format=dl_format, shuffle=shuffle) + def __init__( + self, config, dataset, sampler, neg_sample_args, batch_size=1, dl_format=InputType.POINTWISE, shuffle=False + ): + super().__init__( + config, dataset, sampler, neg_sample_args, batch_size=batch_size, dl_format=dl_format, shuffle=shuffle + ) def _batch_size_adaptation(self): pass diff --git a/recbole/data/dataloader/user_dataloader.py b/recbole/data/dataloader/user_dataloader.py index e96849835..2d2fd62a0 100644 --- a/recbole/data/dataloader/user_dataloader.py +++ b/recbole/data/dataloader/user_dataloader.py @@ -35,13 +35,11 @@ class UserDataLoader(AbstractDataLoader): """ dl_type = DataLoaderType.ORIGIN - def __init__(self, config, dataset, - batch_size=1, dl_format=InputType.POINTWISE, shuffle=False): + def __init__(self, config, dataset, batch_size=1, dl_format=InputType.POINTWISE, shuffle=False): self.uid_field = dataset.uid_field self.user_list = Interaction({self.uid_field: torch.arange(dataset.user_num)}) - super().__init__(config=config, dataset=dataset, - batch_size=batch_size, dl_format=dl_format, shuffle=shuffle) + super().__init__(config=config, dataset=dataset, batch_size=batch_size, dl_format=dl_format, shuffle=shuffle) def setup(self): """Make sure that the :attr:`shuffle` is True. 
If :attr:`shuffle` is False, it will be changed to True @@ -59,6 +57,6 @@ def _shuffle(self): self.user_list.shuffle() def _next_batch_data(self): - cur_data = self.user_list[self.pr: self.pr + self.step] + cur_data = self.user_list[self.pr:self.pr + self.step] self.pr += self.step return cur_data diff --git a/recbole/data/dataset/__init__.py b/recbole/data/dataset/__init__.py index 58026c332..739e86e46 100644 --- a/recbole/data/dataset/__init__.py +++ b/recbole/data/dataset/__init__.py @@ -3,5 +3,5 @@ from recbole.data.dataset.kg_dataset import KnowledgeBasedDataset from recbole.data.dataset.social_dataset import SocialDataset from recbole.data.dataset.kg_seq_dataset import Kg_Seq_Dataset -from recbole.data.dataset.xgboost_dataset import XgboostDataset +from recbole.data.dataset.decisiontree_dataset import DecisionTreeDataset from recbole.data.dataset.customized_dataset import * diff --git a/recbole/data/dataset/customized_dataset.py b/recbole/data/dataset/customized_dataset.py index 2d7a6d7c7..676e35cdb 100644 --- a/recbole/data/dataset/customized_dataset.py +++ b/recbole/data/dataset/customized_dataset.py @@ -15,10 +15,12 @@ class GRU4RecKGDataset(Kg_Seq_Dataset): + def __init__(self, config, saved_dataset=None): super().__init__(config, saved_dataset=saved_dataset) class KSRDataset(Kg_Seq_Dataset): + def __init__(self, config, saved_dataset=None): super().__init__(config, saved_dataset=saved_dataset) diff --git a/recbole/data/dataset/dataset.py b/recbole/data/dataset/dataset.py index 38c343fc4..22c5c8b60 100644 --- a/recbole/data/dataset/dataset.py +++ b/recbole/data/dataset/dataset.py @@ -105,7 +105,7 @@ def _from_scratch(self): """Load dataset from scratch. Initialize attributes firstly, then load data from atomic files, pre-process the dataset lastly. """ - self.logger.debug('Loading {} from scratch'.format(self.__class__)) + self.logger.debug(f'Loading {self.__class__} from scratch') self._get_preset() self._get_field_from_config() @@ -135,17 +135,17 @@ def _get_field_from_config(self): self.time_field = self.config['TIME_FIELD'] if (self.uid_field is None) ^ (self.iid_field is None): - raise ValueError('USER_ID_FIELD and ITEM_ID_FIELD need to be set at the same time ' - 'or not set at the same time.') + raise ValueError( + 'USER_ID_FIELD and ITEM_ID_FIELD need to be set at the same time or not set at the same time.' + ) - self.logger.debug('uid_field: {}'.format(self.uid_field)) - self.logger.debug('iid_field: {}'.format(self.iid_field)) + self.logger.debug(f'uid_field: {self.uid_field}') + self.logger.debug(f'iid_field: {self.iid_field}') def _data_processing(self): """Data preprocessing, including: - - K-core data filtering - - Value-based data filtering + - Data filtering - Remap ID - Missing value imputation - Normalization @@ -166,16 +166,19 @@ def _data_filtering(self): """Data filtering - Filter missing user_id or item_id + - Remove duplicated user-item interaction - Value-based data filtering + - Remove interaction by user or item - K-core data filtering Note: After filtering, feats(``DataFrame``) has non-continuous index, - thus :meth:`~recbole.data.dataset.dataset.Dataset._reset_index()` will reset the index of feats. + thus :meth:`~recbole.data.dataset.dataset.Dataset._reset_index` will reset the index of feats. 
""" self._filter_nan_user_or_item() self._remove_duplication() self._filter_by_field_value() + self._filter_inter_by_user_or_item() self._filter_by_inter_num() self._reset_index() @@ -190,12 +193,14 @@ def _build_feat_name_list(self): Note: Subclasses can inherit this method to add new feat. """ - feat_name_list = [feat_name for feat_name in ['inter_feat', 'user_feat', 'item_feat'] - if getattr(self, feat_name, None) is not None] + feat_name_list = [ + feat_name for feat_name in ['inter_feat', 'user_feat', 'item_feat'] + if getattr(self, feat_name, None) is not None + ] if self.config['additional_feat_suffix'] is not None: for suf in self.config['additional_feat_suffix']: - if getattr(self, '{}_feat'.format(suf), None) is not None: - feat_name_list.append('{}_feat'.format(suf)) + if getattr(self, f'{suf}_feat', None) is not None: + feat_name_list.append(f'{suf}_feat') return feat_name_list def _restore_saved_dataset(self, saved_dataset): @@ -204,10 +209,10 @@ def _restore_saved_dataset(self, saved_dataset): Args: saved_dataset (str): path for the saved dataset. """ - self.logger.debug('Restoring dataset from [{}]'.format(saved_dataset)) + self.logger.debug(f'Restoring dataset from [{saved_dataset}].') if (saved_dataset is None) or (not os.path.isdir(saved_dataset)): - raise ValueError('filepath [{}] need to be a dir'.format(saved_dataset)) + raise ValueError(f'Filepath [{saved_dataset}] need to be a dir.') with open(os.path.join(saved_dataset, 'basic-info.json')) as file: basic_info = json.load(file) @@ -217,12 +222,12 @@ def _restore_saved_dataset(self, saved_dataset): feats = ['inter', 'user', 'item'] for name in feats: - cur_file_name = os.path.join(saved_dataset, '{}.csv'.format(name)) + cur_file_name = os.path.join(saved_dataset, f'{name}.csv') if os.path.isfile(cur_file_name): df = pd.read_csv(cur_file_name) - setattr(self, '{}_feat'.format(name), df) + setattr(self, f'{name}_feat', df) else: - setattr(self, '{}_feat'.format(name), None) + setattr(self, f'{name}_feat', None) self._get_field_from_config() @@ -255,24 +260,24 @@ def _load_inter_feat(self, token, dataset_path): dataset_path (str): path of dataset dir. 
""" if self.benchmark_filename_list is None: - inter_feat_path = os.path.join(dataset_path, '{}.{}'.format(token, 'inter')) + inter_feat_path = os.path.join(dataset_path, f'{token}.inter') if not os.path.isfile(inter_feat_path): - raise ValueError('File {} not exist'.format(inter_feat_path)) + raise ValueError(f'File {inter_feat_path} not exist.') inter_feat = self._load_feat(inter_feat_path, FeatureSource.INTERACTION) - self.logger.debug('interaction feature loaded successfully from [{}]'.format(inter_feat_path)) + self.logger.debug(f'Interaction feature loaded successfully from [{inter_feat_path}].') self.inter_feat = inter_feat else: sub_inter_lens = [] sub_inter_feats = [] for filename in self.benchmark_filename_list: - file_path = os.path.join(dataset_path, '{}.{}.{}'.format(token, filename, 'inter')) + file_path = os.path.join(dataset_path, f'{token}.{filename}.inter') if os.path.isfile(file_path): temp = self._load_feat(file_path, FeatureSource.INTERACTION) sub_inter_feats.append(temp) sub_inter_lens.append(len(temp)) else: - raise ValueError('File {} not exist'.format(file_path)) + raise ValueError(f'File {file_path} not exist.') inter_feat = pd.concat(sub_inter_feats) self.inter_feat, self.file_size_list = inter_feat, sub_inter_lens @@ -292,19 +297,19 @@ def _load_user_or_item_feat(self, token, dataset_path, source, field_name): ``user_id`` and ``item_id`` has source :obj:`~recbole.utils.enum_type.FeatureSource.USER_ID` and :obj:`~recbole.utils.enum_type.FeatureSource.ITEM_ID` """ - feat_path = os.path.join(dataset_path, '{}.{}'.format(token, source.value)) + feat_path = os.path.join(dataset_path, f'{token}.{source.value}') if os.path.isfile(feat_path): feat = self._load_feat(feat_path, source) - self.logger.debug('[{}] feature loaded successfully from [{}]'.format(source.value, feat_path)) + self.logger.debug(f'[{source.value}] feature loaded successfully from [{feat_path}].') else: feat = None - self.logger.debug('[{}] not found, [{}] features are not loaded'.format(feat_path, source.value)) + self.logger.debug(f'[{feat_path}] not found, [{source.value}] features are not loaded.') field = getattr(self, field_name, None) if feat is not None and field is None: - raise ValueError('{} must be exist if {}_feat exist'.format(field_name, source.value)) + raise ValueError(f'{field_name} must be exist if {source.value}_feat exist.') if feat is not None and field not in feat: - raise ValueError('{} must be loaded if {}_feat is loaded'.format(field_name, source.value)) + raise ValueError(f'{field_name} must be loaded if {source.value}_feat is loaded.') if field in self.field2source: self.field2source[field] = FeatureSource(source.value + '_id') @@ -324,14 +329,14 @@ def _load_additional_feat(self, token, dataset_path): if self.config['additional_feat_suffix'] is None: return for suf in self.config['additional_feat_suffix']: - if hasattr(self, '{}_feat'.format(suf)): - raise ValueError('{}_feat already exist'.format(suf)) - feat_path = os.path.join(dataset_path, '{}.{}'.format(token, suf)) + if hasattr(self, f'{suf}_feat'): + raise ValueError(f'{suf}_feat already exist.') + feat_path = os.path.join(dataset_path, f'{token}.{suf}') if os.path.isfile(feat_path): feat = self._load_feat(feat_path, suf) else: - raise ValueError('Additional feature file [{}] not found'.format(feat_path)) - setattr(self, '{}_feat'.format(suf), feat) + raise ValueError(f'Additional feature file [{feat_path}] not found.') + setattr(self, f'{suf}_feat', feat) def _get_load_and_unload_col(self, source): """Parsing 
``config['load_col']`` and ``config['unload_col']`` according to source. @@ -360,10 +365,11 @@ def _get_load_and_unload_col(self, source): unload_col = None if load_col and unload_col: - raise ValueError('load_col [{}] and unload_col [{}] can not be set the same time'.format( - load_col, unload_col)) + raise ValueError(f'load_col [{load_col}] and unload_col [{unload_col}] can not be set the same time.') - self.logger.debug('\n [{}]:\n\t load_col: [{}]\n\t unload_col: [{}]\n'.format(source, load_col, unload_col)) + self.logger.debug(f'[{source}]: ') + self.logger.debug(f'\t load_col: [{load_col}]') + self.logger.debug(f'\t unload_col: [{unload_col}]') return load_col, unload_col def _load_feat(self, filepath, source): @@ -383,7 +389,7 @@ def _load_feat(self, filepath, source): Their length is limited only after calling :meth:`~_dict_to_interaction` or :meth:`~_dataframe_to_interaction` """ - self.logger.debug('loading feature from [{}] (source: [{}])'.format(filepath, source)) + self.logger.debug(f'Loading feature from [{filepath}] (source: [{source}]).') load_col, unload_col = self._get_load_and_unload_col(source) if load_col == set(): @@ -400,7 +406,7 @@ def _load_feat(self, filepath, source): try: ftype = FeatureType(ftype) except ValueError: - raise ValueError('Type {} from field {} is not supported'.format(ftype, field)) + raise ValueError(f'Type {ftype} from field {field} is not supported.') if load_col is not None and field not in load_col: continue if unload_col is not None and field in unload_col: @@ -415,7 +421,7 @@ def _load_feat(self, filepath, source): dtype[field_type] = np.float64 if ftype == FeatureType.FLOAT else str if len(columns) == 0: - self.logger.warning('no columns has been loaded from [{}]'.format(source)) + self.logger.warning(f'No columns has been loaded from [{source}]') return None df = pd.read_csv(filepath, delimiter=self.config['field_separator'], usecols=usecols, dtype=dtype) @@ -426,7 +432,7 @@ def _load_feat(self, filepath, source): ftype = self.field2type[field] if not ftype.value.endswith('seq'): continue - df[field].fillna(value='0', inplace=True) + df[field].fillna(value='', inplace=True) if ftype == FeatureType.TOKEN_SEQ: df[field] = [list(filter(None, _.split(seq_separator))) for _ in df[field].values] elif ftype == FeatureType.FLOAT_SEQ: @@ -455,28 +461,30 @@ def _preload_weight_matrix(self): if preload_fields is None: return - self.logger.debug('preload weight matrix for {}'.format(preload_fields)) + self.logger.debug(f'Preload weight matrix for {preload_fields}.') for preload_id_field in preload_fields: preload_value_field = preload_fields[preload_id_field] if preload_id_field not in self.field2source: - raise ValueError('preload id field [{}] not exist'.format(preload_id_field)) + raise ValueError(f'Preload id field [{preload_id_field}] not exist.') if preload_value_field not in self.field2source: - raise ValueError('preload value field [{}] not exist'.format(preload_value_field)) + raise ValueError(f'Preload value field [{preload_value_field}] not exist.') pid_source = self.field2source[preload_id_field] pv_source = self.field2source[preload_value_field] if pid_source != pv_source: - raise ValueError(f'preload id field [{preload_id_field}] is from source [{pid_source}],' - f'while preload value field [{preload_value_field}] is from source [{pv_source}], ' - f'which should be the same.') + raise ValueError( + f'Preload id field [{preload_id_field}] is from source [{pid_source}],' + f'while preload value field [{preload_value_field}] is from source 
[{pv_source}], ' + f'which should be the same.' + ) for feat_name in self.feat_name_list: feat = getattr(self, feat_name) if preload_id_field in feat: id_ftype = self.field2type[preload_id_field] if id_ftype != FeatureType.TOKEN: - raise ValueError('preload id field [{}] should be type token, but is [{}]'.format( - preload_id_field, id_ftype - )) + raise ValueError( + f'Preload id field [{preload_id_field}] should be type token, but is [{id_ftype}].' + ) value_ftype = self.field2type[preload_value_field] token_num = self.num(preload_id_field) if value_ftype == FeatureType.FLOAT: @@ -497,9 +505,10 @@ def _preload_weight_matrix(self): else: matrix[pid] = prow[:max_len] else: - self.logger.warning('Field [{}] with type [{}] is not \'float\' or \'float_seq\', \ - which will not be handled by preload matrix.'.format(preload_value_field, - value_ftype)) + self.logger.warning( + f'Field [{preload_value_field}] with type [{value_ftype}] is not `float` or `float_seq`, ' + f'which will not be handled by preload matrix.' + ) continue self._preloaded_weight[preload_id_field] = matrix @@ -511,8 +520,6 @@ def _fill_nan(self): For fields with type :obj:`~recbole.utils.enum_type.FeatureType.FLOAT`, missing value will be filled by the average of original data. - - For sequence features, missing value will be filled by ``[0]``. """ self.logger.debug('Filling nan') @@ -524,10 +531,8 @@ def _fill_nan(self): feat[field].fillna(value=0, inplace=True) elif ftype == FeatureType.FLOAT: feat[field].fillna(value=feat[field].mean(), inplace=True) - elif ftype.value.endswith('seq'): - feat[field] = feat[field].apply(lambda x: [0] - if (not isinstance(x, np.ndarray) and (not isinstance(x, list))) - else x) + else: + feat[field] = feat[field].apply(lambda x: [] if isinstance(x, float) else x) def _normalize(self): """Normalization if ``config['normalize_field']`` or ``config['normalize_all']`` is set. @@ -539,23 +544,23 @@ def _normalize(self): Note: Only float-like fields can be normalized. 
""" - if self.config['normalize_field'] is not None and self.config['normalize_all'] is not None: - raise ValueError('normalize_field and normalize_all can\'t be set at the same time') + if self.config['normalize_field'] is not None and self.config['normalize_all'] is True: + raise ValueError('Normalize_field and normalize_all can\'t be set at the same time.') if self.config['normalize_field']: fields = self.config['normalize_field'] for field in fields: ftype = self.field2type[field] if field not in self.field2type: - raise ValueError('Field [{}] does not exist'.format(field)) + raise ValueError(f'Field [{field}] does not exist.') elif ftype != FeatureType.FLOAT and ftype != FeatureType.FLOAT_SEQ: - self.logger.warning('{} is not a FLOAT/FLOAT_SEQ feat, which will not be normalized.'.format(field)) + self.logger.warning(f'{field} is not a FLOAT/FLOAT_SEQ feat, which will not be normalized.') elif self.config['normalize_all']: fields = self.float_like_fields else: return - self.logger.debug('Normalized fields: {}'.format(fields)) + self.logger.debug(f'Normalized fields: {fields}') for feat_name in self.feat_name_list: feat = getattr(self, feat_name) @@ -567,7 +572,7 @@ def _normalize(self): lst = feat[field].values mx, mn = max(lst), min(lst) if mx == mn: - self.logger.warning('All the same value in [{}] from [{}_feat]'.format(field, feat)) + self.logger.warning(f'All the same value in [{field}] from [{feat}_feat].') feat[field] = 1.0 else: feat[field] = (lst - mn) / (mx - mn) @@ -576,7 +581,7 @@ def _normalize(self): lst = feat[field].agg(np.concatenate) mx, mn = max(lst), min(lst) if mx == mn: - self.logger.warning('All the same value in [{}] from [{}_feat]'.format(field, feat)) + self.logger.warning(f'All the same value in [{field}] from [{feat}_feat].') lst = 1.0 else: lst = (lst - mn) / (mx - mn) @@ -590,15 +595,17 @@ def _filter_nan_user_or_item(self): feat = getattr(self, name + '_feat') if feat is not None: dropped_feat = feat.index[feat[field].isnull()] - if dropped_feat.any(): - self.logger.warning('In {}_feat, line {}, {} do not exist, so they will be removed'.format( - name, list(dropped_feat + 2), field)) + if len(dropped_feat): + self.logger.warning( + f'In {name}_feat, line {list(dropped_feat + 2)}, {field} do not exist, so they will be removed.' + ) feat.drop(feat.index[dropped_feat], inplace=True) if field is not None: dropped_inter = self.inter_feat.index[self.inter_feat[field].isnull()] - if dropped_inter.any(): - self.logger.warning('In inter_feat, line {}, {} do not exist, so they will be removed'.format( - name, list(dropped_inter + 2), field)) + if len(dropped_inter): + self.logger.warning( + f'In inter_feat, line {list(dropped_inter + 2)}, {field} do not exist, so they will be removed.' + ) self.inter_feat.drop(self.inter_feat.index[dropped_inter], inplace=True) def _remove_duplication(self): @@ -617,11 +624,14 @@ def _remove_duplication(self): if self.time_field in self.inter_feat: self.inter_feat.sort_values(by=[self.time_field], ascending=True, inplace=True) - self.logger.info('Records in original dataset have been sorted by value of [{}] in ascending order.'.format( - self.time_field)) + self.logger.info( + f'Records in original dataset have been sorted by value of [{self.time_field}] in ascending order.' 
+ ) else: - self.logger.warning('Timestamp field has not been loaded or specified, ' - 'thus strategy [{}] of duplication removal may be meaningless.'.format(keep)) + self.logger.warning( + f'Timestamp field has not been loaded or specified, ' + f'thus strategy [{keep}] of duplication removal may be meaningless.' + ) self.inter_feat.drop_duplicates(subset=[self.uid_field, self.iid_field], keep=keep, inplace=True) def _filter_by_inter_num(self): @@ -653,12 +663,20 @@ def _filter_by_inter_num(self): item_inter_num = Counter(self.inter_feat[self.iid_field].values) while True: - ban_users = self._get_illegal_ids_by_inter_num(field=self.uid_field, feat=self.user_feat, - inter_num=user_inter_num, - max_num=max_user_inter_num, min_num=min_user_inter_num) - ban_items = self._get_illegal_ids_by_inter_num(field=self.iid_field, feat=self.item_feat, - inter_num=item_inter_num, - max_num=max_item_inter_num, min_num=min_item_inter_num) + ban_users = self._get_illegal_ids_by_inter_num( + field=self.uid_field, + feat=self.user_feat, + inter_num=user_inter_num, + max_num=max_user_inter_num, + min_num=min_user_inter_num + ) + ban_items = self._get_illegal_ids_by_inter_num( + field=self.iid_field, + feat=self.item_feat, + inter_num=item_inter_num, + max_num=max_item_inter_num, + min_num=min_item_inter_num + ) if len(ban_users) == 0 and len(ban_items) == 0: break @@ -681,7 +699,7 @@ def _filter_by_inter_num(self): item_inter_num -= Counter(item_inter[dropped_inter].values) dropped_index = self.inter_feat.index[dropped_inter] - self.logger.debug('[{}] dropped interactions'.format(len(dropped_index))) + self.logger.debug(f'[{len(dropped_index)}] dropped interactions.') self.inter_feat.drop(dropped_index, inplace=True) def _get_illegal_ids_by_inter_num(self, field, feat, inter_num, max_num=None, min_num=None): @@ -697,9 +715,7 @@ def _get_illegal_ids_by_inter_num(self, field, feat, inter_num, max_num=None, mi Returns: set: illegal ids, whose inter num out of [min_num, max_num] """ - self.logger.debug('\n get_illegal_ids_by_inter_num:\n\t field=[{}], max_num=[{}], min_num=[{}]'.format( - field, max_num, min_num - )) + self.logger.debug(f'get_illegal_ids_by_inter_num: field=[{field}], max_num=[{max_num}], min_num=[{min_num}]') max_num = max_num or np.inf min_num = min_num or -1 @@ -710,7 +726,7 @@ def _get_illegal_ids_by_inter_num(self, field, feat, inter_num, max_num=None, mi for id_ in feat[field].values: if inter_num[id_] < min_num: ids.add(id_) - self.logger.debug('[{}] illegal_ids_by_inter_num, field=[{}]'.format(len(ids), field)) + self.logger.debug(f'[{len(ids)}] illegal_ids_by_inter_num, field=[{field}]') return ids def _filter_by_field_value(self): @@ -722,9 +738,6 @@ def _filter_by_field_value(self): filter_field += self._drop_by_value(self.config['equal_val'], lambda x, y: x != y) filter_field += self._drop_by_value(self.config['not_equal_val'], lambda x, y: x == y) - if not filter_field: - return - def _reset_index(self): """Reset index for all feats in :attr:`feat_name_list`. 
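The interaction-number filtering above alternates between users and items because dropping rows for one side can push the other side below its threshold; a self-contained sketch of that idea with hypothetical thresholds and data:

    from collections import Counter

    inters = [('u1', 'i1'), ('u1', 'i2'), ('u2', 'i1'), ('u3', 'i3')]
    min_user_inter, min_item_inter = 2, 2
    while True:
        user_cnt = Counter(u for u, _ in inters)
        item_cnt = Counter(i for _, i in inters)
        kept = [(u, i) for u, i in inters
                if user_cnt[u] >= min_user_inter and item_cnt[i] >= min_item_inter]
        if len(kept) == len(inters):
            break
        inters = kept  # removing rows may invalidate other users/items, hence the loop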
""" @@ -747,13 +760,13 @@ def _drop_by_value(self, val, cmp): if val is None: return [] - self.logger.debug('drop_by_value: val={}'.format(val)) + self.logger.debug(f'drop_by_value: val={val}') filter_field = [] for field in val: if field not in self.field2type: - raise ValueError('field [{}] not defined in dataset'.format(field)) + raise ValueError(f'Field [{field}] not defined in dataset.') if self.field2type[field] not in {FeatureType.FLOAT, FeatureType.FLOAT_SEQ}: - raise ValueError('field [{}] is not float-like field in dataset, which can\'t be filter'.format(field)) + raise ValueError(f'Field [{field}] is not float-like field in dataset, which can\'t be filter.') for feat_name in self.feat_name_list: feat = getattr(self, feat_name) if field in feat: @@ -768,7 +781,7 @@ def _del_col(self, feat, field): feat (pandas.DataFrame or Interaction): the feat contains field. field (str): field name to be dropped. """ - self.logger.debug('delete column [{}]'.format(field)) + self.logger.debug(f'Delete column [{field}].') if isinstance(feat, Interaction): feat.drop(column=field) else: @@ -777,6 +790,24 @@ def _del_col(self, feat, field): if field in dct: del dct[field] + def _filter_inter_by_user_or_item(self): + """Remove interaction in inter_feat which user or item is not in user_feat or item_feat. + """ + if self.config['filter_inter_by_user_or_item'] is not True: + return + + remained_inter = pd.Series(True, index=self.inter_feat.index) + + if self.user_feat is not None: + remained_uids = self.user_feat[self.uid_field].values + remained_inter &= self.inter_feat[self.uid_field].isin(remained_uids) + + if self.item_feat is not None: + remained_iids = self.item_feat[self.iid_field].values + remained_inter &= self.inter_feat[self.iid_field].isin(remained_iids) + + self.inter_feat.drop(self.inter_feat.index[~remained_inter], inplace=True) + def _set_label_by_threshold(self): """Generate 0/1 labels according to value of features. 
@@ -792,17 +823,17 @@ def _set_label_by_threshold(self): if threshold is None: return - self.logger.debug('set label by {}'.format(threshold)) + self.logger.debug(f'Set label by {threshold}.') if len(threshold) != 1: - raise ValueError('threshold length should be 1') + raise ValueError('Threshold length should be 1.') self.set_field_property(self.label_field, FeatureType.FLOAT, FeatureSource.INTERACTION, 1) for field, value in threshold.items(): if field in self.inter_feat: self.inter_feat[self.label_field] = (self.inter_feat[field] >= value).astype(int) else: - raise ValueError('field [{}] not in inter_feat'.format(field)) + raise ValueError(f'Field [{field}] not in inter_feat.') self._del_col(self.inter_feat, field) def _get_fields_in_same_space(self): @@ -827,14 +858,14 @@ def _get_fields_in_same_space(self): elif count == 1: continue else: - raise ValueError('field [{}] occurred in `fields_in_same_space` more than one time'.format(field)) + raise ValueError(f'Field [{field}] occurred in `fields_in_same_space` more than one time.') for field_set in fields_in_same_space: if self.uid_field in field_set and self.iid_field in field_set: raise ValueError('uid_field and iid_field can\'t in the same ID space') for field in field_set: if field not in token_like_fields: - raise ValueError('field [{}] is not a token-like field'.format(field)) + raise ValueError(f'Field [{field}] is not a token-like field.') fields_in_same_space.extend(additional) return fields_in_same_space @@ -868,7 +899,7 @@ def _get_remap_list(self, field_set): source = self.field2source[field] if isinstance(source, FeatureSource): source = source.value - feat = getattr(self, '{}_feat'.format(source)) + feat = getattr(self, f'{source}_feat') ftype = self.field2type[field] remap_list.append((feat, field, ftype)) return remap_list @@ -877,7 +908,7 @@ def _remap_ID_all(self): """Get ``config['fields_in_same_space']`` firstly, and remap each. """ fields_in_same_space = self._get_fields_in_same_space() - self.logger.debug('fields_in_same_space: {}'.format(fields_in_same_space)) + self.logger.debug(f'fields_in_same_space: {fields_in_same_space}') for field_set in fields_in_same_space: remap_list = self._get_remap_list(field_set) self._remap(remap_list) @@ -944,7 +975,7 @@ def num(self, field): int: The number of different tokens (``1`` if ``field`` is a float-like field). """ if field not in self.field2type: - raise ValueError('field [{}] not defined in dataset'.format(field)) + raise ValueError(f'Field [{field}] not defined in dataset.') if self.field2type[field] not in {FeatureType.TOKEN, FeatureType.TOKEN_SEQ}: return self.field2seqlen[field] else: @@ -1070,9 +1101,9 @@ def id2token(self, field, ids): return self.field2id_token[field][ids] except IndexError: if isinstance(ids, list): - raise ValueError('[{}] is not a one-dimensional list'.format(ids)) + raise ValueError(f'[{ids}] is not a one-dimensional list.') else: - raise ValueError('[{}] is not a valid ids'.format(ids)) + raise ValueError(f'[{ids}] is not a valid ids.') @property @dlapi.set() @@ -1140,8 +1171,9 @@ def _check_field(self, *field_names): """ for field_name in field_names: if getattr(self, field_name, None) is None: - raise ValueError('{} isn\'t set'.format(field_name)) + raise ValueError(f'{field_name} isn\'t set.') + @dlapi.set() def join(self, df): """Given interaction feature, join user/item feature into it. 
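For example, with a hypothetical `threshold: {rating: 4}` configuration, the labelling in `_set_label_by_threshold` above reduces to a simple comparison:

    import pandas as pd

    inter_feat = pd.DataFrame({'rating': [5.0, 2.0, 4.0]})
    inter_feat['label'] = (inter_feat['rating'] >= 4).astype(int)  # -> 1, 0, 1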
@@ -1170,15 +1202,17 @@ def __repr__(self): def __str__(self): info = [self.dataset_name] if self.uid_field: - info.extend(['The number of users: {}'.format(self.user_num), - 'Average actions of users: {}'.format(self.avg_actions_of_users)]) + info.extend([ + f'The number of users: {self.user_num}', f'Average actions of users: {self.avg_actions_of_users}' + ]) if self.iid_field: - info.extend(['The number of items: {}'.format(self.item_num), - 'Average actions of items: {}'.format(self.avg_actions_of_items)]) - info.append('The number of inters: {}'.format(self.inter_num)) + info.extend([ + f'The number of items: {self.item_num}', f'Average actions of items: {self.avg_actions_of_items}' + ]) + info.append(f'The number of inters: {self.inter_num}') if self.uid_field and self.iid_field: - info.append('The sparsity of the dataset: {}%'.format(self.sparsity * 100)) - info.append('Remain Fields: {}'.format(list(self.field2type))) + info.append(f'The sparsity of the dataset: {self.sparsity * 100}%') + info.append(f'Remain Fields: {list(self.field2type)}') return '\n'.join(info) def copy(self, new_inter_feat): @@ -1206,8 +1240,9 @@ def _drop_unused_col(self): feat = getattr(self, feat_name + '_feat') for field in unused_fields: if field not in feat: - self.logger.warning('field [{}] is not in [{}_feat], which can not be set in `unused_col`'.format( - field, feat_name)) + self.logger.warning( + f'Field [{field}] is not in [{feat_name}_feat], which can not be set in `unused_col`.' + ) continue self._del_col(feat, field) @@ -1251,7 +1286,7 @@ def split_by_ratio(self, ratios, group_by=None): Note: Other than the first one, each part is rounded down. """ - self.logger.debug('split by ratios [{}], group_by=[{}]'.format(ratios, group_by)) + self.logger.debug(f'split by ratios [{ratios}], group_by=[{group_by}]') tot_ratio = sum(ratios) ratios = [_ / tot_ratio for _ in ratios] @@ -1266,7 +1301,7 @@ def split_by_ratio(self, ratios, group_by=None): tot_cnt = len(grouped_index) split_ids = self._calcu_split_ids(tot=tot_cnt, ratios=ratios) for index, start, end in zip(next_index, [0] + split_ids, split_ids + [tot_cnt]): - index.extend(grouped_index[start: end]) + index.extend(grouped_index[start:end]) self._drop_unused_col() next_df = [self.inter_feat[index] for index in next_index] @@ -1306,7 +1341,7 @@ def leave_one_out(self, group_by, leave_one_num=1): Returns: list: List of :class:`~Dataset`, whose interaction features has been split. """ - self.logger.debug('leave one out, group_by=[{}], leave_one_num=[{}]'.format(group_by, leave_one_num)) + self.logger.debug(f'leave one out, group_by=[{group_by}], leave_one_num=[{leave_one_num}]') if group_by is None: raise ValueError('leave one out strategy require a group field') @@ -1346,7 +1381,7 @@ def build(self, eval_setting): """ if self.benchmark_filename_list is not None: cumsum = list(np.cumsum(self.file_size_list)) - datasets = [self.copy(self.inter_feat[start: end]) for start, end in zip([0] + cumsum[:-1], cumsum)] + datasets = [self.copy(self.inter_feat[start:end]) for start, end in zip([0] + cumsum[:-1], cumsum)] return datasets ordering_args = eval_setting.ordering_args @@ -1376,9 +1411,9 @@ def save(self, filepath): filepath (str): path of saved dir. 
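A rough sketch of how the ratio-based splitting above behaves for one group of 10 interactions and ratios [0.8, 0.1, 0.1]; the helper code here is illustrative, only the rounding rule comes from the note above:

    ratios, tot = [0.8, 0.1, 0.1], 10
    cnts = [int(r * tot) for r in ratios[1:]]    # parts other than the first are rounded down -> [1, 1]
    cnts = [tot - sum(cnts)] + cnts              # the first part takes the remainder -> [8, 1, 1]
    split_ids = [sum(cnts[:i + 1]) for i in range(len(cnts) - 1)]  # split boundaries -> [8, 9]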
""" if (filepath is None) or (not os.path.isdir(filepath)): - raise ValueError('filepath [{}] need to be a dir'.format(filepath)) + raise ValueError(f'Filepath [{filepath}] need to be a dir.') - self.logger.debug('Saving into [{}]'.format(filepath)) + self.logger.debug(f'Saving into [{filepath}]') basic_info = { 'field2type': self.field2type, 'field2source': self.field2source, @@ -1391,10 +1426,11 @@ def save(self, filepath): feats = ['inter', 'user', 'item'] for name in feats: - df = getattr(self, '{}_feat'.format(name)) + df = getattr(self, f'{name}_feat') if df is not None: - df.to_csv(os.path.join(filepath, '{}.csv'.format(name))) + df.to_csv(os.path.join(filepath, f'{name}.csv')) + @dlapi.set() def get_user_feature(self): """ Returns: @@ -1406,6 +1442,7 @@ def get_user_feature(self): else: return self.user_feat + @dlapi.set() def get_item_feature(self): """ Returns: @@ -1444,7 +1481,7 @@ def _create_sparse_matrix(self, df_feat, source_field, target_field, form='coo', data = np.ones(len(df_feat)) else: if value_field not in df_feat: - raise ValueError('value_field [{}] should be one of `df_feat`\'s features.'.format(value_field)) + raise ValueError(f'Value_field [{value_field}] should be one of `df_feat`\'s features.') data = df_feat[value_field] mat = coo_matrix((data, (src, tgt)), shape=(self.num(source_field), self.num(target_field))) @@ -1453,7 +1490,7 @@ def _create_sparse_matrix(self, df_feat, source_field, target_field, form='coo', elif form == 'csr': return mat.tocsr() else: - raise NotImplementedError('sparse matrix format [{}] has not been implemented.'.format(form)) + raise NotImplementedError(f'Sparse matrix format [{form}] has not been implemented.') def _create_graph(self, tensor_feat, source_field, target_field, form='dgl', value_field=None): """Get graph that describe relations between two fields. @@ -1500,8 +1537,9 @@ def _create_graph(self, tensor_feat, source_field, target_field, form='dgl', val graph = Data(edge_index=torch.stack([src, tgt]), edge_attr=edge_attr) return graph else: - raise NotImplementedError('graph format [{}] has not been implemented.'.format(form)) + raise NotImplementedError(f'Graph format [{form}] has not been implemented.') + @dlapi.set() def inter_matrix(self, form='coo', value_field=None): """Get sparse matrix that describe interactions between user_id and item_id. @@ -1552,7 +1590,7 @@ def _history_matrix(self, row, value_field=None): values = np.ones(len(self.inter_feat)) else: if value_field not in self.inter_feat: - raise ValueError('value_field [{}] should be one of `inter_feat`\'s features.'.format(value_field)) + raise ValueError(f'Value_field [{value_field}] should be one of `inter_feat`\'s features.') values = self.inter_feat[value_field].numpy() if row == 'user': @@ -1568,9 +1606,10 @@ def _history_matrix(self, row, value_field=None): col_num = np.max(history_len) if col_num > max_col_num * 0.2: - self.logger.warning('max value of {}\'s history interaction records has reached {}% of the total.'.format( - row, col_num / max_col_num * 100, - )) + self.logger.warning( + f'Max value of {row}\'s history interaction records has reached ' + f'{col_num / max_col_num * 100}% of the total.' 
+ ) history_matrix = np.zeros((row_num, col_num), dtype=np.int64) history_value = np.zeros((row_num, col_num)) @@ -1582,6 +1621,7 @@ def _history_matrix(self, row, value_field=None): return torch.LongTensor(history_matrix), torch.FloatTensor(history_value), torch.LongTensor(history_len) + @dlapi.set() def history_item_matrix(self, value_field=None): """Get dense matrix describe user's history interaction records. @@ -1606,6 +1646,7 @@ def history_item_matrix(self, value_field=None): """ return self._history_matrix(row='user', value_field=value_field) + @dlapi.set() def history_user_matrix(self, value_field=None): """Get dense matrix describe item's history interaction records. @@ -1643,7 +1684,7 @@ def get_preload_weight(self, field): numpy.ndarray: preloaded weight matrix. See :doc:`../user_guide/data/data_args` for details. """ if field not in self._preloaded_weight: - raise ValueError('field [{}] not in preload_weight'.format(field)) + raise ValueError(f'Field [{field}] not in preload_weight') return self._preloaded_weight[field] def _dataframe_to_interaction(self, data): diff --git a/recbole/data/dataset/xgboost_dataset.py b/recbole/data/dataset/decisiontree_dataset.py similarity index 92% rename from recbole/data/dataset/xgboost_dataset.py rename to recbole/data/dataset/decisiontree_dataset.py index eb044b8e4..1bb394d9e 100644 --- a/recbole/data/dataset/xgboost_dataset.py +++ b/recbole/data/dataset/decisiontree_dataset.py @@ -3,7 +3,7 @@ # @Email : 254170321@qq.com """ -recbole.data.xgboost_dataset +recbole.data.decisiontree_dataset ########################## """ @@ -11,8 +11,8 @@ from recbole.utils import FeatureType -class XgboostDataset(Dataset): - """:class:`XgboostDataset` is based on :class:`~recbole.data.dataset.dataset.Dataset`, +class DecisionTreeDataset(Dataset): + """:class:`DecisionTreeDataset` is based on :class:`~recbole.data.dataset.dataset.Dataset`, and Attributes: @@ -81,7 +81,7 @@ def _from_scratch(self): """Load dataset from scratch. Initialize attributes firstly, then load data from atomic files, pre-process the dataset lastly. 
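To illustrate the layout returned by `history_item_matrix` above, here is a made-up example of two of the returned tensors (all ids are fabricated):

    import torch

    # one padded row of item ids per user id; history_len gives the true number of interactions per row
    history_matrix = torch.tensor([[0, 0, 0], [3, 7, 0], [2, 0, 0]])
    history_len = torch.tensor([0, 2, 1])
    user1_items = history_matrix[1][:history_len[1].item()]  # -> tensor([3, 7])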
""" - self.logger.debug('Loading {} from scratch'.format(self.__class__)) + self.logger.debug(f'Loading {self.__class__} from scratch.') self._get_preset() self._get_field_from_config() diff --git a/recbole/data/dataset/kg_dataset.py b/recbole/data/dataset/kg_dataset.py index ea27412ed..11b8f20f6 100644 --- a/recbole/data/dataset/kg_dataset.py +++ b/recbole/data/dataset/kg_dataset.py @@ -80,8 +80,8 @@ def _get_field_from_config(self): self._check_field('head_entity_field', 'tail_entity_field', 'relation_field', 'entity_field') self.set_field_property(self.entity_field, FeatureType.TOKEN, FeatureSource.KG, 1) - self.logger.debug('relation_field: {}'.format(self.relation_field)) - self.logger.debug('entity_field: {}'.format(self.entity_field)) + self.logger.debug(f'relation_field: {self.relation_field}') + self.logger.debug(f'entity_field: {self.entity_field}') def _data_processing(self): self._set_field2ent_level() @@ -116,11 +116,13 @@ def _load_data(self, token, dataset_path): self.item2entity, self.entity2item = self._load_link(self.dataset_name, self.dataset_path) def __str__(self): - info = [super().__str__(), - 'The number of entities: {}'.format(self.entity_num), - 'The number of relations: {}'.format(self.relation_num), - 'The number of triples: {}'.format(len(self.kg_feat)), - 'The number of items that have been linked to KG: {}'.format(len(self.item2entity))] + info = [ + super().__str__(), + f'The number of entities: {self.entity_num}', + f'The number of relations: {self.relation_num}', + f'The number of triples: {len(self.kg_feat)}', + f'The number of items that have been linked to KG: {len(self.item2entity)}' + ] # yapf: disable return '\n'.join(info) def _build_feat_name_list(self): @@ -136,10 +138,10 @@ def save(self, filepath): raise NotImplementedError() def _load_kg(self, token, dataset_path): - self.logger.debug('loading kg from [{}]'.format(dataset_path)) - kg_path = os.path.join(dataset_path, '{}.{}'.format(token, 'kg')) + self.logger.debug(f'Loading kg from [{dataset_path}].') + kg_path = os.path.join(dataset_path, f'{token}.kg') if not os.path.isfile(kg_path): - raise ValueError('[{}.{}] not found in [{}]'.format(token, 'kg', dataset_path)) + raise ValueError(f'[{token}.kg] not found in [{dataset_path}].') df = self._load_feat(kg_path, FeatureSource.KG) self._check_kg(df) return df @@ -151,10 +153,10 @@ def _check_kg(self, kg): assert self.relation_field in kg, kg_warn_message.format(self.relation_field) def _load_link(self, token, dataset_path): - self.logger.debug('loading link from [{}]'.format(dataset_path)) - link_path = os.path.join(dataset_path, '{}.{}'.format(token, 'link')) + self.logger.debug(f'Loading link from [{dataset_path}].') + link_path = os.path.join(dataset_path, f'{token}.link') if not os.path.isfile(link_path): - raise ValueError('[{}.{}] not found in [{}]'.format(token, 'link', dataset_path)) + raise ValueError(f'[{token}.link] not found in [{dataset_path}].') df = self._load_feat(link_path, 'link') self._check_link(df) @@ -179,9 +181,7 @@ def _get_fields_in_same_space(self): - ``head_entity_id`` and ``target_entity_id`` should be remapped with ``item_id``. 
""" fields_in_same_space = super()._get_fields_in_same_space() - fields_in_same_space = [ - _ for _ in fields_in_same_space if not self._contain_ent_field(_) - ] + fields_in_same_space = [_ for _ in fields_in_same_space if not self._contain_ent_field(_)] ent_fields = self._get_ent_fields_in_same_space() for field_set in fields_in_same_space: if self.iid_field in field_set: @@ -207,7 +207,7 @@ def _get_ent_fields_in_same_space(self): if self._contain_ent_field(field_set): field_set = self._remove_ent_field(field_set) ent_fields.update(field_set) - self.logger.debug('ent_fields: {}'.format(fields_in_same_space)) + self.logger.debug(f'ent_fields: {fields_in_same_space}') return ent_fields def _remove_ent_field(self, field_set): @@ -268,7 +268,7 @@ def _remap_entities_by_link(self): source = self.field2source[ent_field] if not isinstance(source, str): source = source.value - feat = getattr(self, '{}_feat'.format(source)) + feat = getattr(self, f'{source}_feat') entity_list = feat[ent_field].values for i, entity_id in enumerate(entity_list): if entity_id in self.entity2item: @@ -309,7 +309,7 @@ def _reset_ent_remapID(self, field, new_id_token): if self.item_feat is not None: feats.append(self.item_feat) else: - feats = [getattr(self, '{}_feat'.format(source))] + feats = [getattr(self, f'{source}_feat')] for feat in feats: old_idx = feat[field].values new_idx = np.array([idmap[_] for _ in old_idx]) @@ -473,7 +473,7 @@ def _create_ckg_sparse_matrix(self, form='coo', show_relation=False): elif form == 'csr': return mat.tocsr() else: - raise NotImplementedError('sparse matrix format [{}] has not been implemented.'.format(form)) + raise NotImplementedError(f'Sparse matrix format [{form}] has not been implemented.') def _create_ckg_graph(self, form='dgl', show_relation=False): user_num = self.user_num @@ -510,7 +510,7 @@ def _create_ckg_graph(self, form='dgl', show_relation=False): graph = Data(edge_index=torch.stack([src, tgt]), edge_attr=edge_attr) return graph else: - raise NotImplementedError('graph format [{}] has not been implemented.'.format(form)) + raise NotImplementedError(f'Graph format [{form}] has not been implemented.') @dlapi.set() def ckg_graph(self, form='coo', value_field=None): @@ -542,9 +542,7 @@ def ckg_graph(self, form='coo', value_field=None): https://github.com/rusty1s/pytorch_geometric """ if value_field is not None and value_field != self.relation_field: - raise ValueError('value_field [{}] can only be [{}] in ckg_graph.'.format( - value_field, self.relation_field - )) + raise ValueError(f'Value_field [{value_field}] can only be [{self.relation_field}] in ckg_graph.') show_relation = value_field is not None if form in ['coo', 'csr']: diff --git a/recbole/data/dataset/sequential_dataset.py b/recbole/data/dataset/sequential_dataset.py index 96272d742..1720d2865 100644 --- a/recbole/data/dataset/sequential_dataset.py +++ b/recbole/data/dataset/sequential_dataset.py @@ -55,11 +55,6 @@ def prepare_data_augmentation(self): ``u1, <i1, i2, i3> | i4`` - Returns: - Tuple of ``self.uid_list``, ``self.item_list_index``, - ``self.target_index``, ``self.item_list_length``. - See :class:`SequentialDataset`'s attributes for details. - Note: Actually, we do not really generate these new item sequences. One user's item sequence is stored only once in memory. @@ -67,8 +62,6 @@ def prepare_data_augmentation(self): which saves memory and accelerates a lot. 
""" self.logger.debug('prepare_data_augmentation') - if hasattr(self, 'uid_list'): - return self.uid_list, self.item_list_index, self.target_index, self.item_list_length self._check_field('uid_field', 'time_field') max_item_list_len = self.config['MAX_ITEM_LIST_LENGTH'] @@ -91,13 +84,14 @@ def prepare_data_augmentation(self): self.uid_list = np.array(uid_list) self.item_list_index = np.array(item_list_index) self.target_index = np.array(target_index) - self.item_list_length = np.array(item_list_length) - return self.uid_list, self.item_list_index, self.target_index, self.item_list_length + self.item_list_length = np.array(item_list_length, dtype=np.int64) def leave_one_out(self, group_by, leave_one_num=1): - self.logger.debug('leave one out, group_by=[{}], leave_one_num=[{}]'.format(group_by, leave_one_num)) + self.logger.debug(f'Leave one out, group_by=[{group_by}], leave_one_num=[{leave_one_num}].') if group_by is None: - raise ValueError('leave one out strategy require a group field') + raise ValueError('Leave one out strategy require a group field.') + if group_by != self.uid_field: + raise ValueError('Sequential models require group by user.') self.prepare_data_augmentation() grouped_index = self._grouped_index(self.uid_list) @@ -111,3 +105,43 @@ def leave_one_out(self, group_by, leave_one_num=1): setattr(ds, field, np.array(getattr(ds, field)[index])) next_ds.append(ds) return next_ds + + def inter_matrix(self, form='coo', value_field=None): + """Get sparse matrix that describe interactions between user_id and item_id. + Sparse matrix has shape (user_num, item_num). + For a row of <src, tgt>, ``matrix[src, tgt] = 1`` if ``value_field`` is ``None``, + else ``matrix[src, tgt] = self.inter_feat[src, tgt]``. + + Args: + form (str, optional): Sparse matrix format. Defaults to ``coo``. + value_field (str, optional): Data of sparse matrix, which should exist in ``df_feat``. + Defaults to ``None``. + + Returns: + scipy.sparse: Sparse matrix in form ``coo`` or ``csr``. 
+ """ + if not self.uid_field or not self.iid_field: + raise ValueError('dataset does not exist uid/iid, thus can not converted to sparse matrix.') + + self.logger.warning('Load interaction matrix may lead to label leakage from testing phase, this implementation ' + 'only provides the interactions corresponding to specific phase') + local_inter_feat = self.inter_feat[self.uid_list] + return self._create_sparse_matrix(local_inter_feat, self.uid_field, self.iid_field, form, value_field) + + def build(self, eval_setting): + ordering_args = eval_setting.ordering_args + if ordering_args['strategy'] == 'shuffle': + raise ValueError('Ordering strategy `shuffle` is not supported in sequential models.') + elif ordering_args['strategy'] == 'by': + if ordering_args['field'] != self.time_field: + raise ValueError('Sequential models require `TO` (time ordering) strategy.') + if ordering_args['ascending'] is not True: + raise ValueError('Sequential models require `time_field` to sort in ascending order.') + + group_field = eval_setting.group_field + + split_args = eval_setting.split_args + if split_args['strategy'] == 'loo': + return self.leave_one_out(group_by=group_field, leave_one_num=split_args['leave_one_num']) + else: + ValueError('Sequential models require `loo` (leave one out) split strategy.') diff --git a/recbole/data/dataset/social_dataset.py b/recbole/data/dataset/social_dataset.py index ebeb8ebcb..f53016ccf 100644 --- a/recbole/data/dataset/social_dataset.py +++ b/recbole/data/dataset/social_dataset.py @@ -45,8 +45,8 @@ def _get_field_from_config(self): self.target_field = self.config['TARGET_ID_FIELD'] self._check_field('source_field', 'target_field') - self.logger.debug('source_id_field: {}'.format(self.source_field)) - self.logger.debug('target_id_field: {}'.format(self.target_field)) + self.logger.debug(f'source_id_field: {self.source_field}') + self.logger.debug(f'target_id_field: {self.target_field}') def _load_data(self, token, dataset_path): """Load ``.net`` additionally. @@ -61,14 +61,14 @@ def _build_feat_name_list(self): return feat_name_list def _load_net(self, dataset_name, dataset_path): - net_file_path = os.path.join(dataset_path, '{}.{}'.format(dataset_name, 'net')) + net_file_path = os.path.join(dataset_path, f'{dataset_name}.net') if os.path.isfile(net_file_path): net_feat = self._load_feat(net_file_path, FeatureSource.NET) if net_feat is None: raise ValueError('.net file exist, but net_feat is None, please check your load_col') return net_feat else: - raise ValueError('File {} not exist'.format(net_file_path)) + raise ValueError(f'File {net_file_path} not exist.') def _get_fields_in_same_space(self): """Parsing ``config['fields_in_same_space']``. See :doc:`../user_guide/data/data_args` for detail arg setting. @@ -80,8 +80,9 @@ def _get_fields_in_same_space(self): - ``source_id`` and ``target_id`` should be remapped with ``user_id``. 
""" fields_in_same_space = super()._get_fields_in_same_space() - fields_in_same_space = [_ for _ in fields_in_same_space if (self.source_field not in _) and - (self.target_field not in _)] + fields_in_same_space = [ + _ for _ in fields_in_same_space if (self.source_field not in _) and (self.target_field not in _) + ] for field_set in fields_in_same_space: if self.uid_field in field_set: field_set.update({self.source_field, self.target_field}) @@ -122,6 +123,5 @@ def net_graph(self, form='coo', value_field=None): raise NotImplementedError('net graph format [{}] has not been implemented.') def __str__(self): - info = [super().__str__(), - 'The number of connections of social network: {}'.format(len(self.net_feat))] + info = [super().__str__(), f'The number of connections of social network: {len(self.net_feat)}'] return '\n'.join(info) diff --git a/recbole/data/interaction.py b/recbole/data/interaction.py index 3e5360d6b..218b8b035 100644 --- a/recbole/data/interaction.py +++ b/recbole/data/interaction.py @@ -86,7 +86,7 @@ def __init__(self, interaction, pos_len_list=None, user_len_list=None): self.set_additional_info(pos_len_list, user_len_list) for k in self.interaction: if not isinstance(self.interaction[k], torch.Tensor): - raise ValueError('interaction [{}] should only contains torch.Tensor'.format(interaction)) + raise ValueError(f'Interaction [{interaction}] should only contains torch.Tensor') self.length = -1 for k in self.interaction: self.length = max(self.length, self.interaction[k].shape[0]) @@ -116,10 +116,10 @@ def __len__(self): return self.length def __str__(self): - info = ['The batch_size of interaction: {}'.format(self.length)] + info = [f'The batch_size of interaction: {self.length}'] for k in self.interaction: inter = self.interaction[k] - temp_str = " {}, {}, {}, {}".format(k, inter.shape, inter.device.type, inter.dtype) + temp_str = f" {k}, {inter.shape}, {inter.device.type}, {inter.dtype}" info.append(temp_str) info.append('\n') return '\n'.join(info) @@ -253,7 +253,7 @@ def drop(self, column): column (str): the column to be dropped. 
""" if column not in self.interaction: - raise ValueError('column [{}] is not in [{}]'.format(column, self)) + raise ValueError(f'Column [{column}] is not in [{self}].') del self.interaction[column] def _reindex(self, index): @@ -285,29 +285,29 @@ def sort(self, by, ascending=True): """ if isinstance(by, str): if by not in self.interaction: - raise ValueError('[{}] is not exist in interaction [{}]'.format(by, self)) + raise ValueError(f'[{by}] is not exist in interaction [{self}].') by = [by] elif isinstance(by, (list, tuple)): for b in by: if b not in self.interaction: - raise ValueError('[{}] is not exist in interaction [{}]'.format(b, self)) + raise ValueError(f'[{b}] is not exist in interaction [{self}].') else: - raise TypeError('wrong type of by [{}]'.format(by)) + raise TypeError(f'Wrong type of by [{by}].') if isinstance(ascending, bool): ascending = [ascending] elif isinstance(ascending, (list, tuple)): for a in ascending: if not isinstance(a, bool): - raise TypeError('wrong type of ascending [{}]'.format(ascending)) + raise TypeError(f'Wrong type of ascending [{ascending}].') else: - raise TypeError('wrong type of ascending [{}]'.format(ascending)) + raise TypeError(f'Wrong type of ascending [{ascending}].') if len(by) != len(ascending): if len(ascending) == 1: ascending = ascending * len(by) else: - raise ValueError('by [{}] and ascending [{}] should have same length'.format(by, ascending)) + raise ValueError(f'by [{by}] and ascending [{ascending}] should have same length.') for b, a in zip(by[::-1], ascending[::-1]): index = np.argsort(self.interaction[b], kind='stable') @@ -334,15 +334,14 @@ def cat_interactions(interactions): :class:`Interaction`: Concatenated interaction. """ if not isinstance(interactions, (list, tuple)): - raise TypeError('interactions [{}] should be list or tuple'.format(interactions)) + raise TypeError(f'Interactions [{interactions}] should be list or tuple.') if len(interactions) == 0: - raise ValueError('interactions [{}] should have some interactions'.format(interactions)) + raise ValueError(f'Interactions [{interactions}] should have some interactions.') columns_set = set(interactions[0].columns) for inter in interactions: if columns_set != set(inter.columns): - raise ValueError('interactions [{}] should have some interactions'.format(interactions)) + raise ValueError(f'Interactions [{interactions}] should have some interactions.') - new_inter = {col: torch.cat([inter[col] for inter in interactions]) - for col in columns_set} + new_inter = {col: torch.cat([inter[col] for inter in interactions]) for col in columns_set} return Interaction(new_inter) diff --git a/recbole/data/utils.py b/recbole/data/utils.py index fa8c4abf3..88b5180a2 100644 --- a/recbole/data/utils.py +++ b/recbole/data/utils.py @@ -19,7 +19,7 @@ from recbole.config import EvalSetting from recbole.data.dataloader import * from recbole.sampler import KGSampler, Sampler, RepeatableSampler -from recbole.utils import ModelType +from recbole.utils import ModelType, ensure_dir def create_dataset(config): @@ -45,9 +45,9 @@ def create_dataset(config): elif model_type == ModelType.SOCIAL: from .dataset import SocialDataset return SocialDataset(config) - elif model_type == ModelType.XGBOOST: - from .dataset import XgboostDataset - return XgboostDataset(config) + elif model_type == ModelType.DECISIONTREE: + from .dataset import DecisionTreeDataset + return DecisionTreeDataset(config) else: from .dataset import Dataset return Dataset(config) @@ -73,21 +73,7 @@ def data_preparation(config, dataset, 
save=False): es_str = [_.strip() for _ in config['eval_setting'].split(',')] es = EvalSetting(config) - - kwargs = {} - if 'RS' in es_str[0]: - kwargs['ratios'] = config['split_ratio'] - if kwargs['ratios'] is None: - raise ValueError('`ratios` should be set if `RS` is set') - if 'LS' in es_str[0]: - kwargs['leave_one_num'] = config['leave_one_num'] - if kwargs['leave_one_num'] is None: - raise ValueError('`leave_one_num` should be set if `LS` is set') - kwargs['group_by_user'] = config['group_by_user'] - getattr(es, es_str[0])(**kwargs) - - if es.split_args['strategy'] != 'loo' and model_type == ModelType.SEQUENTIAL: - raise ValueError('Sequential models require "loo" split strategy.') + es.set_ordering_and_splitting(es_str[0]) built_datasets = dataset.build(es) train_dataset, valid_dataset, test_dataset = built_datasets @@ -100,8 +86,10 @@ def data_preparation(config, dataset, save=False): kwargs = {} if config['training_neg_sample_num']: if dataset.label_field in dataset.inter_feat: - raise ValueError(f'`training_neg_sample_num` should be 0 ' - f'if inter_feat have label_field [{dataset.label_field}].') + raise ValueError( + f'`training_neg_sample_num` should be 0 ' + f'if inter_feat have label_field [{dataset.label_field}].' + ) train_distribution = config['training_neg_sample_distribution'] or 'uniform' es.neg_sample_by(by=config['training_neg_sample_num'], distribution=train_distribution) if model_type != ModelType.SEQUENTIAL: @@ -127,8 +115,10 @@ def data_preparation(config, dataset, save=False): kwargs = {} if len(es_str) > 1 and getattr(es, es_str[1], None): if dataset.label_field in dataset.inter_feat: - raise ValueError(f'It can not validate with `{es_str[1]}` ' - f'when inter_feat have label_field [{dataset.label_field}].') + raise ValueError( + f'It can not validate with `{es_str[1]}` ' + f'when inter_feat have label_field [{dataset.label_field}].' + ) getattr(es, es_str[1])() if sampler is None: if model_type != ModelType.SEQUENTIAL: @@ -150,9 +140,9 @@ def data_preparation(config, dataset, save=False): return train_data, valid_data, test_data -def dataloader_construct(name, config, eval_setting, dataset, - dl_format=InputType.POINTWISE, - batch_size=1, shuffle=False, **kwargs): +def dataloader_construct( + name, config, eval_setting, dataset, dl_format=InputType.POINTWISE, batch_size=1, shuffle=False, **kwargs +): """Get a correct dataloader class by calling :func:`get_data_loader` to construct dataloader. 
Args: @@ -177,7 +167,7 @@ def dataloader_construct(name, config, eval_setting, dataset, batch_size = [batch_size] * len(dataset) if len(dataset) != len(batch_size): - raise ValueError('dataset {} and batch_size {} should have the same length'.format(dataset, batch_size)) + raise ValueError(f'Dataset {dataset} and batch_size {batch_size} should have the same length.') kwargs_list = [{} for _ in range(len(dataset))] for key, value in kwargs.items(): @@ -185,28 +175,22 @@ def dataloader_construct(name, config, eval_setting, dataset, if not isinstance(value, list): value = [value] * len(dataset) if len(dataset) != len(value): - raise ValueError('dataset {} and {} {} should have the same length'.format(dataset, key, value)) + raise ValueError(f'Dataset {dataset} and {key} {value} should have the same length.') for kw, k, w in zip(kwargs_list, key, value): kw[k] = w model_type = config['MODEL_TYPE'] logger = getLogger() - logger.info('Build [{}] DataLoader for [{}] with format [{}]'.format(model_type, name, dl_format)) + logger.info(f'Build [{model_type}] DataLoader for [{name}] with format [{dl_format}]') logger.info(eval_setting) - logger.info('batch_size = [{}], shuffle = [{}]\n'.format(batch_size, shuffle)) + logger.info(f'batch_size = [{batch_size}], shuffle = [{shuffle}]\n') dataloader = get_data_loader(name, config, eval_setting) try: ret = [ - dataloader( - config=config, - dataset=ds, - batch_size=bs, - dl_format=dl_format, - shuffle=shuffle, - **kw - ) for ds, bs, kw in zip(dataset, batch_size, kwargs_list) + dataloader(config=config, dataset=ds, batch_size=bs, dl_format=dl_format, shuffle=shuffle, **kw) + for ds, bs, kw in zip(dataset, batch_size, kwargs_list) ] except TypeError: raise ValueError('training_neg_sample_num should be 0') @@ -229,11 +213,10 @@ def save_datasets(save_path, name, dataset): name = [name] dataset = [dataset] if len(name) != len(dataset): - raise ValueError('len of name {} should equal to len of dataset'.format(name, dataset)) + raise ValueError(f'Length of name {name} should equal to length of dataset {dataset}.') for i, d in enumerate(dataset): cur_path = os.path.join(save_path, name[i]) - if not os.path.isdir(cur_path): - os.makedirs(cur_path) + ensure_dir(cur_path) d.save(cur_path) @@ -283,13 +266,13 @@ def get_data_loader(name, config, eval_setting): return SequentialNegSampleDataLoader elif neg_sample_strategy == 'full': return SequentialFullDataLoader - elif model_type == ModelType.XGBOOST: + elif model_type == ModelType.DECISIONTREE: if neg_sample_strategy == 'none': - return XgboostDataLoader + return DecisionTreeDataLoader elif neg_sample_strategy == 'by': - return XgboostNegSampleDataLoader + return DecisionTreeNegSampleDataLoader elif neg_sample_strategy == 'full': - return XgboostFullDataLoader + return DecisionTreeFullDataLoader elif model_type == ModelType.KNOWLEDGE: if neg_sample_strategy == 'by': if name == 'train': @@ -301,10 +284,11 @@ def get_data_loader(name, config, eval_setting): elif neg_sample_strategy == 'none': # return GeneralDataLoader # TODO 训练也可以为none? 
看general的逻辑似乎是都可以为None - raise NotImplementedError('The use of external negative sampling for knowledge model ' - 'has not been implemented') + raise NotImplementedError( + 'The use of external negative sampling for knowledge model has not been implemented' + ) else: - raise NotImplementedError('model_type [{}] has not been implemented'.format(model_type)) + raise NotImplementedError(f'Model_type [{model_type}] has not been implemented.') def _get_DIN_data_loader(name, config, eval_setting): diff --git a/recbole/evaluator/evaluators.py b/recbole/evaluator/evaluators.py index 8b6568eef..8b0d97238 100644 --- a/recbole/evaluator/evaluators.py +++ b/recbole/evaluator/evaluators.py @@ -13,7 +13,6 @@ ##################################### """ - from collections import ChainMap import numpy as np @@ -64,11 +63,10 @@ def collect(self, interaction, scores_tensor): """ user_len_list = interaction.user_len_list - + scores_matrix = self.get_score_matrix(scores_tensor, user_len_list) scores_matrix = torch.flip(scores_matrix, dims=[-1]) - shape_matrix = torch.full((len(user_len_list), 1), scores_matrix.shape[1], - device=scores_matrix.device) + shape_matrix = torch.full((len(user_len_list), 1), scores_matrix.shape[1], device=scores_matrix.device) # get topk _, topk_idx = torch.topk(scores_matrix, max(self.topk), dim=-1) # n_users x k @@ -114,8 +112,10 @@ def _check_args(self): self.topk = [self.topk] for topk in self.topk: if topk <= 0: - raise ValueError('topk must be a positive integer or a list of positive integers, ' - 'but get `{}`'.format(topk)) + raise ValueError( + 'topk must be a positive integer or a list of positive integers, ' + 'but get `{}`'.format(topk) + ) else: raise TypeError('The topk must be a integer, list') @@ -189,7 +189,7 @@ def average_rank(self, scores): torch.Tensor: average_rank Example: - >>> average_rank(tensor([[1,2,2,2,3,3,6],[2,2,2,2,4,4,5]])) + >>> average_rank(tensor([[1,2,2,2,3,3,6],[2,2,2,2,4,5,5]])) tensor([[1.0000, 3.0000, 3.0000, 3.0000, 5.5000, 5.5000, 7.0000], [2.5000, 2.5000, 2.5000, 2.5000, 5.0000, 6.5000, 6.5000]]) @@ -243,7 +243,7 @@ def evaluate(self, batch_matrix_list, eval_data): eval_data (Dataset): the class of test data Returns: - dict: such as ``{'GAUC:0.9286}`` + dict: such as ``{'GAUC': 0.9286}`` """ pos_len_list = eval_data.get_pos_len_list() @@ -282,8 +282,6 @@ def __str__(self): msg = 'The Rank Evaluator Info:\n' + \ '\tMetrics:[' + \ ', '.join([rank_metrics[metric.lower()] for metric in self.metrics]) + \ - '], TopK:[' + \ - ', '.join(map(str, self.topk)) + \ ']' return msg @@ -369,8 +367,4 @@ def __str__(self): return msg -metric_eval_bind = [ - (topk_metrics, TopKEvaluator), - (loss_metrics, LossEvaluator), - (rank_metrics, RankEvaluator) -] +metric_eval_bind = [(topk_metrics, TopKEvaluator), (loss_metrics, LossEvaluator), (rank_metrics, RankEvaluator)] diff --git a/recbole/evaluator/metrics.py b/recbole/evaluator/metrics.py index b788fa0eb..9c470e12b 100644 --- a/recbole/evaluator/metrics.py +++ b/recbole/evaluator/metrics.py @@ -21,9 +21,9 @@ from recbole.evaluator.utils import _binary_clf_curve - # TopK Metrics # + def hit_(pos_index, pos_len): r"""Hit_ (also known as hit ratio at :math:`N`) is a way of calculating how many 'hits' you have in an n-sized list of ranked items. 
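The top-k gathering in the evaluator above boils down to a single `torch.topk` call over the user-item score matrix; a toy illustration:

    import torch

    scores_matrix = torch.tensor([[0.1, 0.9, 0.4], [0.8, 0.2, 0.7]])   # users x candidate items
    _, topk_idx = torch.topk(scores_matrix, 2, dim=-1)                 # -> tensor([[1, 2], [0, 2]])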
@@ -120,13 +120,13 @@ def ndcg_(pos_index, pos_len): \mathrm {DCG@K}=\sum_{i=1}^{K} \frac{2^{rel_i}-1}{\log_{2}{(i+1)}}\\ \mathrm {IDCG@K}=\sum_{i=1}^{K}\frac{1}{\log_{2}{(i+1)}}\\ \mathrm {NDCG_u@K}=\frac{DCG_u@K}{IDCG_u@K}\\ - \mathrm {NDCG@K}=\frac{\sum \nolimits_{u \in u^{te}NDCG_u@K}}{|u^{te}|} + \mathrm {NDCG@K}=\frac{\sum \nolimits_{u \in U^{te}NDCG_u@K}}{|U^{te}|} \end{gather} :math:`K` stands for recommending :math:`K` items. And the :math:`rel_i` is the relevance of the item in position :math:`i` in the recommendation list. - :math:`2^{rel_i}` equals to 1 if the item hits otherwise 0. - :math:`U^{te}` is for all users in the test set. + :math:`{rel_i}` equals to 1 if the item is ground truth otherwise 0. + :math:`U^{te}` stands for all users in the test set. """ len_rank = np.full_like(pos_len, pos_index.shape[1]) @@ -194,15 +194,19 @@ def gauc_(user_len_list, pos_len_list, pos_rank_sum): non_zero_idx = np.full(len(user_len_list), True, dtype=np.bool) if any_without_pos: logger = getLogger() - logger.warning("No positive samples in some users, " - "true positive value should be meaningless, " - "these users have been removed from GAUC calculation") + logger.warning( + "No positive samples in some users, " + "true positive value should be meaningless, " + "these users have been removed from GAUC calculation" + ) non_zero_idx *= (pos_len_list != 0) if any_without_neg: logger = getLogger() - logger.warning("No negative samples in some users, " - "false positive value should be meaningless, " - "these users have been removed from GAUC calculation") + logger.warning( + "No negative samples in some users, " + "false positive value should be meaningless, " + "these users have been removed from GAUC calculation" + ) non_zero_idx *= (neg_len_list != 0) if any_without_pos or any_without_neg: item_list = user_len_list, neg_len_list, pos_len_list, pos_rank_sum @@ -249,16 +253,14 @@ def auc_(trues, preds): if fps[-1] <= 0: logger = getLogger() - logger.warning("No negative samples in y_true, " - "false positive value should be meaningless") + logger.warning("No negative samples in y_true, " "false positive value should be meaningless") fpr = np.repeat(np.nan, fps.shape) else: fpr = fps / fps[-1] if tps[-1] <= 0: logger = getLogger() - logger.warning("No positive samples in y_true, " - "true positive value should be meaningless") + logger.warning("No positive samples in y_true, " "true positive value should be meaningless") tpr = np.repeat(np.nan, tps.shape) else: tpr = tps / tps[-1] @@ -268,6 +270,7 @@ def auc_(trues, preds): # Loss based Metrics # + def mae_(trues, preds): r"""`Mean absolute error regression loss`__ @@ -313,7 +316,7 @@ def log_loss_(trues, preds): eps = 1e-15 preds = np.float64(preds) preds = np.clip(preds, eps, 1 - eps) - loss = np.sum(- trues * np.log(preds) - (1 - trues) * np.log(1 - preds)) + loss = np.sum(-trues * np.log(preds) - (1 - trues) * np.log(1 - preds)) return loss / len(preds) @@ -324,19 +327,15 @@ def log_loss_(trues, preds): # def coverage_(): # raise NotImplementedError - # def gini_index_(): # raise NotImplementedError - # def shannon_entropy_(): # raise NotImplementedError - # def diversity_(): # raise NotImplementedError - """Function name and function mapper. 
Useful when we have to serialize evaluation metric names and call the functions based on deserialized names diff --git a/recbole/evaluator/proxy_evaluator.py b/recbole/evaluator/proxy_evaluator.py index 237e8c29e..a0df5246e 100644 --- a/recbole/evaluator/proxy_evaluator.py +++ b/recbole/evaluator/proxy_evaluator.py @@ -8,6 +8,11 @@ # @Author : Zhichao Feng # @email : fzcbupt@gmail.com +""" +recbole.evaluator.proxy_evaluator +##################################### +""" + from collections import ChainMap from recbole.evaluator.evaluators import metric_eval_bind, group_metrics, individual_metrics diff --git a/recbole/model/abstract_recommender.py b/recbole/model/abstract_recommender.py index 93c6dfb57..d5d7f8dc0 100644 --- a/recbole/model/abstract_recommender.py +++ b/recbole/model/abstract_recommender.py @@ -7,7 +7,6 @@ # @Author : Shanlei Mu, Yupeng Hou # @Email : slmu@ruc.edu.cn, houyupeng@ruc.edu.cn - """ recbole.model.abstract_recommender ################################## @@ -92,7 +91,6 @@ def __init__(self, config, dataset): self.n_items = dataset.num(self.ITEM_ID) # load parameters info - self.batch_size = config['train_batch_size'] self.device = config['device'] @@ -146,7 +144,6 @@ def __init__(self, config, dataset): self.n_relations = dataset.num(self.RELATION_ID) # load parameters info - self.batch_size = config['train_batch_size'] self.device = config['device'] @@ -221,11 +218,13 @@ def __init__(self, config, dataset): self.num_feature_field += 1 if len(self.token_field_dims) > 0: self.token_field_offsets = np.array((0, *np.cumsum(self.token_field_dims)[:-1]), dtype=np.long) - self.token_embedding_table = FMEmbedding(self.token_field_dims, self.token_field_offsets, - self.embedding_size) + self.token_embedding_table = FMEmbedding( + self.token_field_dims, self.token_field_offsets, self.embedding_size + ) if len(self.float_field_dims) > 0: - self.float_embedding_table = nn.Embedding(np.sum(self.float_field_dims, dtype=np.int32), - self.embedding_size) + self.float_embedding_table = nn.Embedding( + np.sum(self.float_field_dims, dtype=np.int32), self.embedding_size + ) if len(self.token_seq_field_dims) > 0: self.token_seq_embedding_table = nn.ModuleList() for token_seq_field_dim in self.token_seq_field_dims: @@ -336,8 +335,10 @@ def double_tower_embed_input_fields(self, interaction): first_dense_embedding, second_dense_embedding = None, None if sparse_embedding is not None: - sizes = [self.user_token_seq_field_num, self.item_token_seq_field_num, - self.user_token_field_num, self.item_token_field_num] + sizes = [ + self.user_token_seq_field_num, self.item_token_seq_field_num, self.user_token_field_num, + self.item_token_field_num + ] first_token_seq_embedding, second_token_seq_embedding, first_token_embedding, second_token_embedding = \ torch.split(sparse_embedding, sizes, dim=1) first_sparse_embedding = torch.cat([first_token_seq_embedding, first_token_embedding], dim=1) @@ -368,8 +369,10 @@ def embed_input_fields(self, interaction): """ float_fields = [] for field_name in self.float_field_names: - float_fields.append(interaction[field_name] - if len(interaction[field_name].shape) == 2 else interaction[field_name].unsqueeze(1)) + if len(interaction[field_name].shape) == 2: + float_fields.append(interaction[field_name]) + else: + float_fields.append(interaction[field_name].unsqueeze(1)) if len(float_fields) > 0: float_fields = torch.cat(float_fields, dim=1) # [batch_size, num_float_field] else: diff --git a/recbole/model/context_aware_recommender/dcn.py 
b/recbole/model/context_aware_recommender/dcn.py index 396130803..0edfc6ad8 100644 --- a/recbole/model/context_aware_recommender/dcn.py +++ b/recbole/model/context_aware_recommender/dcn.py @@ -44,10 +44,14 @@ def __init__(self, config, dataset): # define layers and loss # init weight and bias of each cross layer - self.cross_layer_w = nn.ParameterList(nn.Parameter(torch.randn(self.num_feature_field * self.embedding_size) - .to(self.device)) for _ in range(self.cross_layer_num)) - self.cross_layer_b = nn.ParameterList(nn.Parameter(torch.zeros(self.num_feature_field * self.embedding_size) - .to(self.device)) for _ in range(self.cross_layer_num)) + self.cross_layer_w = nn.ParameterList( + nn.Parameter(torch.randn(self.num_feature_field * self.embedding_size).to(self.device)) + for _ in range(self.cross_layer_num) + ) + self.cross_layer_b = nn.ParameterList( + nn.Parameter(torch.zeros(self.num_feature_field * self.embedding_size).to(self.device)) + for _ in range(self.cross_layer_num) + ) # size of mlp hidden layer size_list = [self.embedding_size * self.num_feature_field] + self.mlp_hidden_size diff --git a/recbole/model/context_aware_recommender/dssm.py b/recbole/model/context_aware_recommender/dssm.py index 346c0f23a..59d951af0 100644 --- a/recbole/model/context_aware_recommender/dssm.py +++ b/recbole/model/context_aware_recommender/dssm.py @@ -4,7 +4,6 @@ # @Email : gmqszyq@qq.com # @File : dssm.py - """ DSSM ################################################ diff --git a/recbole/model/context_aware_recommender/ffm.py b/recbole/model/context_aware_recommender/ffm.py index 3863d1bfe..6879adaef 100644 --- a/recbole/model/context_aware_recommender/ffm.py +++ b/recbole/model/context_aware_recommender/ffm.py @@ -49,8 +49,10 @@ def __init__(self, config, dataset): self._get_feature2field() self.num_fields = len(set(self.feature2field.values())) # the number of fields - self.ffm = FieldAwareFactorizationMachine(self.feature_names, self.feature_dims, self.feature2id, - self.feature2field, self.num_fields, self.embedding_size, self.device) + self.ffm = FieldAwareFactorizationMachine( + self.feature_names, self.feature_dims, self.feature2id, self.feature2field, self.num_fields, + self.embedding_size, self.device + ) self.loss = nn.BCELoss() # parameters initialization @@ -217,24 +219,17 @@ def forward(self, input_x): def _get_input_x_emb(self, token_input_x_emb, float_input_x_emb, token_seq_input_x_emb): # merge different types of field-aware embeddings input_x_emb = [] # [num_fields: [batch_size, num_fields, emb_dim]] - if len(self.token_feature_names) > 0 and len(self.float_feature_names) > 0 and len(self.token_seq_feature_names) > 0: - for i in range(self.num_fields): - input_x_emb.append(torch.cat([token_input_x_emb[i], float_input_x_emb[i], token_seq_input_x_emb[i]], dim=1)) - elif len(self.token_feature_names) > 0 and len(self.float_feature_names) > 0: - for i in range(self.num_fields): - input_x_emb.append(torch.cat([token_input_x_emb[i], float_input_x_emb[i]], dim=1)) - elif len(self.float_feature_names) > 0 and len(self.token_seq_feature_names) > 0: - for i in range(self.num_fields): - input_x_emb.append(torch.cat([float_input_x_emb[i], token_seq_input_x_emb[i]], dim=1)) - elif len(self.token_feature_names) > 0 and len(self.token_seq_feature_names) > 0: - for i in range(self.num_fields): - input_x_emb.append(torch.cat([token_input_x_emb[i], token_seq_input_x_emb[i]], dim=1)) - elif len(self.token_feature_names) > 0: - input_x_emb = token_input_x_emb - elif len(self.float_feature_names) > 0: - 
input_x_emb = float_input_x_emb - elif len(self.token_seq_feature_names) > 0: - input_x_emb = token_seq_input_x_emb + + zip_args = [] + if len(self.token_feature_names) > 0: + zip_args.append(token_input_x_emb) + if len(self.float_feature_names) > 0: + zip_args.append(float_input_x_emb) + if len(self.token_seq_feature_names) > 0: + zip_args.append(token_seq_input_x_emb) + + for tensors in zip(*zip_args): + input_x_emb.append(torch.cat(tensors, dim=1)) return input_x_emb @@ -243,8 +238,9 @@ def _emb_token_ffm_input(self, token_ffm_input): token_input_x_emb = [] if len(self.token_feature_names) > 0: token_input_x = token_ffm_input + token_ffm_input.new_tensor(self.token_offsets).unsqueeze(0) - token_input_x_emb = [self.token_embeddings[i](token_input_x) - for i in range(self.num_fields)] # [num_fields: [batch_size, num_token_features, emb_dim]] + token_input_x_emb = [ + self.token_embeddings[i](token_input_x) for i in range(self.num_fields) + ] # [num_fields: [batch_size, num_token_features, emb_dim]] return token_input_x_emb @@ -252,9 +248,13 @@ def _emb_float_ffm_input(self, float_ffm_input): # get float field-aware embeddings float_input_x_emb = [] if len(self.float_feature_names) > 0: - index = torch.arange(0, self.num_float_features).unsqueeze(0).expand_as(float_ffm_input).long().to(self.device) # [batch_size, num_float_features] - float_input_x_emb = [torch.mul(self.float_embeddings[i](index), float_ffm_input.unsqueeze(2)) - for i in range(self.num_fields)] # [num_fields: [batch_size, num_float_features, emb_dim]] + index = torch.arange(0, self.num_float_features).unsqueeze(0).expand_as(float_ffm_input).long().to( + self.device + ) # [batch_size, num_float_features] + float_input_x_emb = [ + torch.mul(self.float_embeddings[i](index), float_ffm_input.unsqueeze(2)) + for i in range(self.num_fields) + ] # [num_fields: [batch_size, num_float_features, emb_dim]] return float_input_x_emb @@ -280,6 +280,8 @@ def _emb_token_seq_ffm_input(self, token_seq_ffm_input): result = result.unsqueeze(1) # [batch_size, 1, embed_dim] token_seq_result.append(result) - token_seq_input_x_emb.append(torch.cat(token_seq_result, dim=1)) # [num_fields: batch_size, num_token_seq_features, embed_dim] + token_seq_input_x_emb.append( + torch.cat(token_seq_result, dim=1) + ) # [num_fields: batch_size, num_token_seq_features, embed_dim] return token_seq_input_x_emb diff --git a/recbole/model/context_aware_recommender/pnn.py b/recbole/model/context_aware_recommender/pnn.py index 2c0b4946b..bac4941cf 100644 --- a/recbole/model/context_aware_recommender/pnn.py +++ b/recbole/model/context_aware_recommender/pnn.py @@ -50,8 +50,7 @@ def __init__(self, config, dataset): if self.use_outer: product_out_dim += self.num_pair - self.outer_product = OuterProductLayer( - self.num_feature_field, self.embedding_size, device=self.device) + self.outer_product = OuterProductLayer(self.num_feature_field, self.embedding_size, device=self.device) size_list = [product_out_dim] + self.mlp_hidden_size self.mlp_layers = MLPLayers(size_list, self.dropout_prob, bn=False) self.predict_layer = nn.Linear(self.mlp_hidden_size[-1], 1) diff --git a/recbole/model/context_aware_recommender/xdeepfm.py b/recbole/model/context_aware_recommender/xdeepfm.py index b0c2f1e1a..0af6ef770 100644 --- a/recbole/model/context_aware_recommender/xdeepfm.py +++ b/recbole/model/context_aware_recommender/xdeepfm.py @@ -48,8 +48,10 @@ def __init__(self, config, dataset): if not self.direct: self.cin_layer_size = list(map(lambda x: int(x // 2 * 2), temp_cin_size)) if 
self.cin_layer_size[:-1] != temp_cin_size[:-1]: - self.logger.warning('Layer size of CIN should be even except for the last layer when direct is True.' - 'It is changed to {}'.format(self.cin_layer_size)) + self.logger.warning( + 'Layer size of CIN should be even except for the last layer when direct is True.' + 'It is changed to {}'.format(self.cin_layer_size) + ) # Create a convolutional layer for each CIN layer self.conv1d_list = [] @@ -63,8 +65,7 @@ def __init__(self, config, dataset): self.field_nums.append(layer_size // 2) # Create MLP layer - size_list = [self.embedding_size * self.num_feature_field - ] + self.mlp_hidden_size + [1] + size_list = [self.embedding_size * self.num_feature_field] + self.mlp_hidden_size + [1] self.mlp_layers = MLPLayers(size_list, dropout=self.dropout_prob) # Get the output size of CIN @@ -156,8 +157,7 @@ def compressed_interaction_network(self, input_features, activation='identity'): next_hidden = output else: if i != len(self.cin_layer_size) - 1: - next_hidden, direct_connect = torch.split( - output, 2 * [layer_size // 2], 1) + next_hidden, direct_connect = torch.split(output, 2 * [layer_size // 2], 1) else: direct_connect = output next_hidden = 0 diff --git a/recbole/model/exlib_recommender/lightgbm.py b/recbole/model/exlib_recommender/lightgbm.py new file mode 100644 index 000000000..28ef5d9a2 --- /dev/null +++ b/recbole/model/exlib_recommender/lightgbm.py @@ -0,0 +1,26 @@ +# -*- coding: utf-8 -*- +# @Time : 2020/1/17 +# @Author : Chen Yang +# @Email : 254170321@qq.com + +r""" +recbole.model.exlib_recommender.lightgbm +############################# +""" + +import lightgbm as lgb +from recbole.utils import ModelType, InputType + + +class lightgbm(lgb.Booster): + r"""lightgbm is inherited from lgb.Booster + + """ + type = ModelType.DECISIONTREE + input_type = InputType.POINTWISE + + def __init__(self, config, dataset): + super(lgb.Booster, self).__init__() + + def to(self, device): + return self diff --git a/recbole/model/exlib_recommender/xgboost.py b/recbole/model/exlib_recommender/xgboost.py index a09da2cdb..44d3fabac 100644 --- a/recbole/model/exlib_recommender/xgboost.py +++ b/recbole/model/exlib_recommender/xgboost.py @@ -5,7 +5,7 @@ r""" recbole.model.exlib_recommender.xgboost -############################# +######################################## """ import xgboost as xgb @@ -16,7 +16,7 @@ class xgboost(xgb.Booster): r"""xgboost is inherited from xgb.Booster """ - type = ModelType.XGBOOST + type = ModelType.DECISIONTREE input_type = InputType.POINTWISE def __init__(self, config, dataset): diff --git a/recbole/model/general_recommender/__init__.py b/recbole/model/general_recommender/__init__.py index 7f6de17cb..c9b96bb82 100644 --- a/recbole/model/general_recommender/__init__.py +++ b/recbole/model/general_recommender/__init__.py @@ -1,4 +1,5 @@ from recbole.model.general_recommender.bpr import BPR +from recbole.model.general_recommender.cdae import CDAE from recbole.model.general_recommender.convncf import ConvNCF from recbole.model.general_recommender.dgcf import DGCF from recbole.model.general_recommender.dmf import DMF @@ -16,4 +17,5 @@ from recbole.model.general_recommender.ngcf import NGCF from recbole.model.general_recommender.pop import Pop from recbole.model.general_recommender.spectralcf import SpectralCF -from recbole.model.general_recommender.cdae import CDAE \ No newline at end of file +from recbole.model.general_recommender.ease import EASE +from recbole.model.general_recommender.nncf import NNCF diff --git 
a/recbole/model/general_recommender/bpr.py b/recbole/model/general_recommender/bpr.py index 684929251..8886ab25d 100644 --- a/recbole/model/general_recommender/bpr.py +++ b/recbole/model/general_recommender/bpr.py @@ -8,7 +8,6 @@ # @Author : Shanlei Mu # @Email : slmu@ruc.edu.cn - r""" BPR ################################################ diff --git a/recbole/model/general_recommender/cdae.py b/recbole/model/general_recommender/cdae.py index 11eef117d..627b01abd 100644 --- a/recbole/model/general_recommender/cdae.py +++ b/recbole/model/general_recommender/cdae.py @@ -55,7 +55,7 @@ def __init__(self, config, dataset): if self.out_activation == 'sigmoid': self.o_act = nn.Sigmoid() elif self.out_activation == 'relu': - self.o_act = nn.Sigmoid() + self.o_act = nn.ReLU() else: raise ValueError('Invalid output layer activation function') diff --git a/recbole/model/general_recommender/dgcf.py b/recbole/model/general_recommender/dgcf.py index 28109b4c8..3560dd2b9 100644 --- a/recbole/model/general_recommender/dgcf.py +++ b/recbole/model/general_recommender/dgcf.py @@ -74,7 +74,7 @@ def __init__(self, config, dataset): self.n_layers = config['n_layers'] self.reg_weight = config['reg_weight'] self.cor_weight = config['cor_weight'] - n_batch = dataset.dataset.inter_num // self.batch_size + 1 + n_batch = dataset.dataset.inter_num // config['train_batch_size'] + 1 self.cor_batch_size = int(max(self.n_users / n_batch, self.n_items / n_batch)) # ensure embedding can be divided into <n_factors> intent assert self.embedding_size % self.n_factors == 0 @@ -207,8 +207,9 @@ def forward(self): # get the attentive weights # .... A_factor_values is a dense tensor with the size of [num_edge, 1] - A_factor_values = torch.sum(head_factor_embeddings * torch.tanh(tail_factor_embeddings), - dim=1, keepdim=True) + A_factor_values = torch.sum( + head_factor_embeddings * torch.tanh(tail_factor_embeddings), dim=1, keepdim=True + ) # update the attentive weights A_iter_values.append(A_factor_values) diff --git a/recbole/model/general_recommender/dmf.py b/recbole/model/general_recommender/dmf.py index 07b5d88e1..7c0fbe57d 100644 --- a/recbole/model/general_recommender/dmf.py +++ b/recbole/model/general_recommender/dmf.py @@ -150,8 +150,8 @@ def get_user_embedding(self, user): """ # Following lines construct tensor of shape [B,n_items] using the tensor of shape [B,H] col_indices = self.history_item_id[user].flatten() - row_indices = torch.arange(user.shape[0]).to(self.device).repeat_interleave(self.history_item_id.shape[1], - dim=0) + row_indices = torch.arange(user.shape[0]).to(self.device) + row_indices = row_indices.repeat_interleave(self.history_item_id.shape[1], dim=0) matrix_01 = torch.zeros(1).to(self.device).repeat(user.shape[0], self.n_items) matrix_01.index_put_((row_indices, col_indices), self.history_item_value[user].flatten()) user = self.user_linear(matrix_01) diff --git a/recbole/model/general_recommender/ease.py b/recbole/model/general_recommender/ease.py new file mode 100644 index 000000000..8429eb91d --- /dev/null +++ b/recbole/model/general_recommender/ease.py @@ -0,0 +1,78 @@ +r""" +EASE +################################################ +Reference: + Steck. "Embarrassingly Shallow Autoencoders for Sparse Data" in WWW 2019. 
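+
+The model has a closed-form solution: with the binary user-item interaction matrix :math:`X`
+and :math:`\lambda` = ``reg_weight``, compute the regularized Gram matrix :math:`G = X^{\top}X + \lambda I`
+and its inverse :math:`P = G^{-1}`; the item-item weight matrix is then
+:math:`B_{ij} = -P_{ij} / P_{jj}` with :math:`\mathrm{diag}(B) = 0`, and scores are :math:`S = XB`.
+The implementation below computes :math:`B` once in ``__init__`` and evaluates the scores on demand.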
+""" + + +from recbole.utils.enum_type import ModelType +import numpy as np +import scipy.sparse as sp +import torch + +from recbole.utils import InputType +from recbole.model.abstract_recommender import GeneralRecommender + + +class EASE(GeneralRecommender): + input_type = InputType.POINTWISE + type = ModelType.TRADITIONAL + + def __init__(self, config, dataset): + super().__init__(config, dataset) + + # need at least one param + self.dummy_param = torch.nn.Parameter(torch.zeros(1)) + + X = dataset.inter_matrix( + form='csr').astype(np.float32) + + reg_weight = config['reg_weight'] + + # just directly calculate the entire score matrix in init + # (can't be done incrementally) + + # gram matrix + G = X.T @ X + + # add reg to diagonal + G += reg_weight * sp.identity(G.shape[0]) + + # convert to dense because inverse will be dense + G = G.todense() + + # invert. this takes most of the time + P = np.linalg.inv(G) + B = P / (-np.diag(P)) + # zero out diag + np.fill_diagonal(B, 0.) + + # instead of computing and storing the entire score matrix, just store B and compute the scores on demand + # more memory efficient for a larger number of users + # but if there's a large number of items not much one can do: + # still have to compute B all at once + # S = X @ B + # self.score_matrix = torch.from_numpy(S).to(self.device) + + # torch doesn't support sparse tensor slicing, so will do everything with np/scipy + self.item_similarity = B + self.interaction_matrix = X + + def forward(self): + pass + + def calculate_loss(self, interaction): + return torch.nn.Parameter(torch.zeros(1)) + + def predict(self, interaction): + user = interaction[self.USER_ID].cpu().numpy() + item = interaction[self.ITEM_ID].cpu().numpy() + + return torch.from_numpy((self.interaction_matrix[user, :].multiply(self.item_similarity[:, item].T)).sum(axis=1).getA1()) + + def full_sort_predict(self, interaction): + user = interaction[self.USER_ID].cpu().numpy() + + r = self.interaction_matrix[user, :] @ self.item_similarity + return torch.from_numpy(r.flatten()) diff --git a/recbole/model/general_recommender/fism.py b/recbole/model/general_recommender/fism.py index 4f3457ea3..fdfecc216 100644 --- a/recbole/model/general_recommender/fism.py +++ b/recbole/model/general_recommender/fism.py @@ -3,7 +3,6 @@ # @Author : Kaiyuan Li # @email : tsotfsk@outlook.com - """ FISM ####################################### @@ -49,6 +48,10 @@ def __init__(self, config, dataset): # split the too large dataset into the specified pieces if self.split_to > 0: self.group = torch.chunk(torch.arange(self.n_items).to(self.device), self.split_to) + else: + self.logger.warning('Pay Attetion!! the `split_to` is set to 0. If you catch a OMM error in this case, ' + \ + 'you need to increase it \n\t\t\tuntil the error disappears. 
For example, ' + \ + 'you can append it in the command line such as `--split_to=5`') # define layers and loss # construct source and destination item embedding matrix @@ -172,8 +175,9 @@ def full_sort_predict(self, interaction): else: output = [] for mask in self.group: - tmp_output = self.user_forward(user_input[:item_num], item_num, user_bias, - repeats=len(mask), pred_slc=mask) + tmp_output = self.user_forward( + user_input[:item_num], item_num, user_bias, repeats=len(mask), pred_slc=mask + ) output.append(tmp_output) output = torch.cat(output, dim=0) scores.append(output) diff --git a/recbole/model/general_recommender/gcmc.py b/recbole/model/general_recommender/gcmc.py index 133dbe53d..e3715493d 100644 --- a/recbole/model/general_recommender/gcmc.py +++ b/recbole/model/general_recommender/gcmc.py @@ -55,8 +55,7 @@ def __init__(self, config, dataset): # load dataset info self.num_all = self.n_users + self.n_items - self.interaction_matrix = dataset.inter_matrix( - form='coo').astype(np.float32) # csr + self.interaction_matrix = dataset.inter_matrix(form='coo').astype(np.float32) # csr # load parameters info self.dropout_prob = config['dropout_prob'] @@ -71,12 +70,14 @@ def __init__(self, config, dataset): features = self.get_sparse_eye_mat(self.num_all) i = features._indices() v = features._values() - self.user_features = torch.sparse.FloatTensor(i[:, :self.n_users], v[:self.n_users], - torch.Size([self.n_users, self.num_all])).to(self.device) + self.user_features = torch.sparse.FloatTensor( + i[:, :self.n_users], v[:self.n_users], torch.Size([self.n_users, self.num_all]) + ).to(self.device) item_i = i[:, self.n_users:] item_i[0, :] = item_i[0, :] - self.n_users - self.item_features = torch.sparse.FloatTensor(item_i, v[self.n_users:], - torch.Size([self.n_items, self.num_all])).to(self.device) + self.item_features = torch.sparse.FloatTensor( + item_i, v[self.n_users:], torch.Size([self.n_items, self.num_all]) + ).to(self.device) else: features = torch.eye(self.num_all).to(self.device) self.user_features, self.item_features = torch.split(features, [self.n_users, self.n_items]) @@ -91,26 +92,32 @@ def __init__(self, config, dataset): if self.accum == 'stack': div = self.gcn_output_dim // len(self.support) if self.gcn_output_dim % len(self.support) != 0: - self.logger.warning("HIDDEN[0] (=%d) of stack layer is adjusted to %d (in %d splits)." - % (self.gcn_output_dim, len(self.support) * div, len(self.support))) + self.logger.warning( + "HIDDEN[0] (=%d) of stack layer is adjusted to %d (in %d splits)." 
% + (self.gcn_output_dim, len(self.support) * div, len(self.support)) + ) self.gcn_output_dim = len(self.support) * div # define layers and loss - self.GcEncoder = GcEncoder(accum=self.accum, - num_user=self.n_users, - num_item=self.n_items, - support=self.support, - input_dim=self.input_dim, - gcn_output_dim=self.gcn_output_dim, - dense_output_dim=self.dense_output_dim, - drop_prob=self.dropout_prob, - device=self.device, - sparse_feature=self.sparse_feature).to(self.device) - self.BiDecoder = BiDecoder(input_dim=self.dense_output_dim, - output_dim=self.n_class, - drop_prob=0., - device=self.device, - num_weights=self.num_basis_functions).to(self.device) + self.GcEncoder = GcEncoder( + accum=self.accum, + num_user=self.n_users, + num_item=self.n_items, + support=self.support, + input_dim=self.input_dim, + gcn_output_dim=self.gcn_output_dim, + dense_output_dim=self.dense_output_dim, + drop_prob=self.dropout_prob, + device=self.device, + sparse_feature=self.sparse_feature + ).to(self.device) + self.BiDecoder = BiDecoder( + input_dim=self.dense_output_dim, + output_dim=self.n_class, + drop_prob=0., + device=self.device, + num_weights=self.num_basis_functions + ).to(self.device) self.loss_function = nn.CrossEntropyLoss() def get_sparse_eye_mat(self, num): @@ -141,8 +148,7 @@ def get_norm_adj_mat(self): Sparse tensor of the normalized interaction matrix. """ # build adj matrix - A = sp.dok_matrix((self.n_users + self.n_items, - self.n_users + self.n_items), dtype=np.float32) + A = sp.dok_matrix((self.n_users + self.n_items, self.n_users + self.n_items), dtype=np.float32) inter_M = self.interaction_matrix inter_M_t = self.interaction_matrix.transpose() data_dict = dict(zip(zip(inter_M.row, inter_M.col + self.n_users), [1] * inter_M.nnz)) @@ -167,8 +173,7 @@ def get_norm_adj_mat(self): def forward(self, user_X, item_X, user, item): # Graph autoencoders are comprised of a graph encoder model and a pairwise decoder model. user_embedding, item_embedding = self.GcEncoder(user_X, item_X) - predict_score = self.BiDecoder( - user_embedding, item_embedding, user, item) + predict_score = self.BiDecoder(user_embedding, item_embedding, user, item) return predict_score def calculate_loss(self, interaction): @@ -215,9 +220,22 @@ class GcEncoder(nn.Module): and :math:`E` the embedding size. 
""" - def __init__(self, accum, num_user, num_item, support, - input_dim, gcn_output_dim, dense_output_dim, drop_prob, device, - sparse_feature=True, act_dense=lambda x: x, share_user_item_weights=True, bias=False): + def __init__( + self, + accum, + num_user, + num_item, + support, + input_dim, + gcn_output_dim, + dense_output_dim, + drop_prob, + device, + sparse_feature=True, + act_dense=lambda x: x, + share_user_item_weights=True, + bias=False + ): super(GcEncoder, self).__init__() self.num_users = num_user self.num_items = num_item @@ -248,8 +266,7 @@ def __init__(self, accum, num_user, num_item, support, self.weights_u = nn.ParameterList([ nn.Parameter( torch.FloatTensor(self.input_dim, self.gcn_output_dim).to(self.device), requires_grad=True - ) - for _ in range(self.num_support) + ) for _ in range(self.num_support) ]) if share_user_item_weights: self.weights_v = self.weights_u @@ -257,8 +274,7 @@ def __init__(self, accum, num_user, num_item, support, self.weights_v = nn.ParameterList([ nn.Parameter( torch.FloatTensor(self.input_dim, self.gcn_output_dim).to(self.device), requires_grad=True - ) - for _ in range(self.num_support) + ) for _ in range(self.num_support) ]) else: assert self.gcn_output_dim % self.num_support == 0, 'output_dim must be multiple of num_support for stackGC' @@ -267,8 +283,7 @@ def __init__(self, accum, num_user, num_item, support, self.weights_u = nn.ParameterList([ nn.Parameter( torch.FloatTensor(self.input_dim, self.sub_hidden_dim).to(self.device), requires_grad=True - ) - for _ in range(self.num_support) + ) for _ in range(self.num_support) ]) if share_user_item_weights: self.weights_v = self.weights_u @@ -276,8 +291,7 @@ def __init__(self, accum, num_user, num_item, support, self.weights_v = nn.ParameterList([ nn.Parameter( torch.FloatTensor(self.input_dim, self.sub_hidden_dim).to(self.device), requires_grad=True - ) - for _ in range(self.num_support) + ) for _ in range(self.num_support) ]) # dense layer @@ -356,8 +370,7 @@ def forward(self, user_X, item_X): embeddings = torch.cat(embeddings, dim=1) - users, items = torch.split( - embeddings, [self.num_users, self.num_items]) + users, items = torch.split(embeddings, [self.num_users, self.num_items]) u_hidden = self.activate(users) v_hidden = self.activate(items) @@ -381,8 +394,7 @@ class BiDecoder(nn.Module): BiDecoder takes pairs of node embeddings and predicts respective entries in the adjacency matrix. 
""" - def __init__(self, input_dim, output_dim, drop_prob, device, - num_weights=3, act=lambda x: x): + def __init__(self, input_dim, output_dim, drop_prob, device, num_weights=3, act=lambda x: x): super(BiDecoder, self).__init__() self.input_dim = input_dim self.output_dim = output_dim @@ -394,8 +406,7 @@ def __init__(self, input_dim, output_dim, drop_prob, device, self.dropout = nn.Dropout(p=self.dropout_prob) self.weights = nn.ParameterList([ - nn.Parameter(orthogonal([self.input_dim, self.input_dim]).to(self.device)) - for _ in range(self.num_weights) + nn.Parameter(orthogonal([self.input_dim, self.input_dim]).to(self.device)) for _ in range(self.num_weights) ]) self.dense_layer = nn.Linear(self.num_weights, self.output_dim, bias=False) self._init_weights() diff --git a/recbole/model/general_recommender/itemknn.py b/recbole/model/general_recommender/itemknn.py index 6a0c202df..37ace63c1 100644 --- a/recbole/model/general_recommender/itemknn.py +++ b/recbole/model/general_recommender/itemknn.py @@ -123,9 +123,7 @@ def compute_similarity(self, block_size=100): # End while on columns - W_sparse = sp.csr_matrix((values, (rows, cols)), - shape=(self.n_columns, self.n_columns), - dtype=np.float32) + W_sparse = sp.csr_matrix((values, (rows, cols)), shape=(self.n_columns, self.n_columns), dtype=np.float32) return W_sparse.tocsc() diff --git a/recbole/model/general_recommender/lightgcn.py b/recbole/model/general_recommender/lightgcn.py index 17affbb27..b57c9b1d2 100644 --- a/recbole/model/general_recommender/lightgcn.py +++ b/recbole/model/general_recommender/lightgcn.py @@ -81,8 +81,7 @@ def get_norm_adj_mat(self): Sparse tensor of the normalized interaction matrix. """ # build adj matrix - A = sp.dok_matrix((self.n_users + self.n_items, - self.n_users + self.n_items), dtype=np.float32) + A = sp.dok_matrix((self.n_users + self.n_items, self.n_users + self.n_items), dtype=np.float32) inter_M = self.interaction_matrix inter_M_t = self.interaction_matrix.transpose() data_dict = dict(zip(zip(inter_M.row, inter_M.col + self.n_users), [1] * inter_M.nnz)) diff --git a/recbole/model/general_recommender/line.py b/recbole/model/general_recommender/line.py index 66ee1a919..7c183c2f4 100644 --- a/recbole/model/general_recommender/line.py +++ b/recbole/model/general_recommender/line.py @@ -25,6 +25,7 @@ class NegSamplingLoss(nn.Module): + def __init__(self): super(NegSamplingLoss, self).__init__() @@ -68,8 +69,7 @@ def __init__(self, config, dataset): def get_used_ids(self): cur = np.array([set() for _ in range(self.n_items)]) - for iid, uid in zip(self.interaction_feat[self.USER_ID].numpy(), - self.interaction_feat[self.ITEM_ID].numpy()): + for iid, uid in zip(self.interaction_feat[self.USER_ID].numpy(), self.interaction_feat[self.ITEM_ID].numpy()): cur[iid].add(uid) return cur @@ -83,9 +83,10 @@ def sampler(self, key_ids): key_ids = np.tile(key_ids, 1) while len(check_list) > 0: value_ids[check_list] = self.random_num(len(check_list)) - check_list = np.array([i for i, used, v in - zip(check_list, self.used_ids[key_ids[check_list]], value_ids[check_list]) - if v in used]) + check_list = np.array([ + i for i, used, v in zip(check_list, self.used_ids[key_ids[check_list]], value_ids[check_list]) + if v in used + ]) return torch.tensor(value_ids, device=self.device) @@ -94,7 +95,7 @@ def random_num(self, num): self.random_pr %= self.random_list_length while True: if self.random_pr + num <= self.random_list_length: - value_id.append(self.random_list[self.random_pr: self.random_pr + num]) + 
value_id.append(self.random_list[self.random_pr:self.random_pr + num]) self.random_pr += num break else: diff --git a/recbole/model/general_recommender/macridvae.py b/recbole/model/general_recommender/macridvae.py index a761648ab..50ae69eb5 100644 --- a/recbole/model/general_recommender/macridvae.py +++ b/recbole/model/general_recommender/macridvae.py @@ -111,8 +111,7 @@ def forward(self, rating_matrix): cates_dist = OneHotCategorical(logits=cates_logits) cates_sample = cates_dist.sample() cates_mode = torch.softmax(cates_logits, dim=1) - cates = (self.training * cates_sample + - (1 - self.training) * cates_mode) + cates = (self.training * cates_sample + (1 - self.training) * cates_mode) probs = None mulist = [] diff --git a/recbole/model/general_recommender/nais.py b/recbole/model/general_recommender/nais.py index 1eba6a972..c59cf594b 100644 --- a/recbole/model/general_recommender/nais.py +++ b/recbole/model/general_recommender/nais.py @@ -63,6 +63,10 @@ def __init__(self, config, dataset): if self.split_to > 0: self.logger.info('split the n_items to {} pieces'.format(self.split_to)) self.group = torch.chunk(torch.arange(self.n_items).to(self.device), self.split_to) + else: + self.logger.warning('Pay Attetion!! the `split_to` is set to 0. If you catch a OMM error in this case, ' + \ + 'you need to increase it \n\t\t\tuntil the error disappears. For example, ' + \ + 'you can append it in the command line such as `--split_to=5`') # define layers and loss # construct source and destination item embedding matrix @@ -161,7 +165,8 @@ def attention_mlp(self, inter, target): if self.algorithm == 'prod': mlp_input = inter * target.unsqueeze(1) # batch_size x max_len x embedding_size else: - mlp_input = torch.cat([inter, target.unsqueeze(1).expand_as(inter)], dim=2) # batch_size x max_len x embedding_size*2 + mlp_input = torch.cat([inter, target.unsqueeze(1).expand_as(inter)], + dim=2) # batch_size x max_len x embedding_size*2 mlp_output = self.mlp_layers(mlp_input) # batch_size x max_len x weight_size logits = torch.matmul(mlp_output, self.weight_layer).squeeze(2) # batch_size x max_len diff --git a/recbole/model/general_recommender/neumf.py b/recbole/model/general_recommender/neumf.py index 8784c4cd7..33d11d46f 100644 --- a/recbole/model/general_recommender/neumf.py +++ b/recbole/model/general_recommender/neumf.py @@ -90,8 +90,7 @@ def load_pretrain(self): m1.weight.data.copy_(m2.weight) m1.bias.data.copy_(m2.bias) - predict_weight = torch.cat([mf.predict_layer.weight, - mlp.predict_layer.weight], dim=1) + predict_weight = torch.cat([mf.predict_layer.weight, mlp.predict_layer.weight], dim=1) predict_bias = mf.predict_layer.bias + mlp.predict_layer.bias self.predict_layer.weight.data.copy_(0.5 * predict_weight) diff --git a/recbole/model/general_recommender/ngcf.py b/recbole/model/general_recommender/ngcf.py index d02ccd499..8352aec53 100644 --- a/recbole/model/general_recommender/ngcf.py +++ b/recbole/model/general_recommender/ngcf.py @@ -89,10 +89,8 @@ def get_norm_adj_mat(self): A = sp.dok_matrix((self.n_users + self.n_items, self.n_users + self.n_items), dtype=np.float32) inter_M = self.interaction_matrix inter_M_t = self.interaction_matrix.transpose() - data_dict = dict(zip(zip(inter_M.row, inter_M.col + self.n_users), - [1] * inter_M.nnz)) - data_dict.update(dict(zip(zip(inter_M_t.row + self.n_users, inter_M_t.col), - [1] * inter_M_t.nnz))) + data_dict = dict(zip(zip(inter_M.row, inter_M.col + self.n_users), [1] * inter_M.nnz)) + data_dict.update(dict(zip(zip(inter_M_t.row + self.n_users, 
inter_M_t.col), [1] * inter_M_t.nnz))) A._update(data_dict) # norm adj matrix sumArr = (A > 0).sum(axis=1) diff --git a/recbole/model/general_recommender/nncf.py b/recbole/model/general_recommender/nncf.py new file mode 100644 index 000000000..79692eada --- /dev/null +++ b/recbole/model/general_recommender/nncf.py @@ -0,0 +1,398 @@ +# -*- coding: utf-8 -*- +# @Time : 2021/1/14 +# @Author : Chengyuan Li +# @Email : 2017202049@ruc.edu.cn + +r""" +NNCF +################################################ +Reference: + Ting Bai et al. "A Neural Collaborative Filtering Model with Interaction-based Neighborhood." in CIKM 2017. + +Reference code: + https://github.com/Tbbaby/NNCF-Pytorch + +""" + +import torch +import torch.nn as nn +from torch.nn.init import normal_ + +from recbole.model.abstract_recommender import GeneralRecommender +from recbole.model.layers import MLPLayers +from recbole.utils import InputType + +import numpy as np +from sklearn.metrics import jaccard_score + + +class NNCF(GeneralRecommender): + r"""NNCF is an neural network enhanced matrix factorization model which also captures neighborhood information. + We implement the NNCF model with three ways to process neighborhood information. + """ + input_type = InputType.POINTWISE + + def __init__(self, config, dataset): + super(NNCF, self).__init__(config, dataset) + + # load dataset info + self.LABEL = config['LABEL_FIELD'] + self.interaction_matrix = dataset.inter_matrix(form='coo').astype( + np.float32) + + # load parameters info + self.ui_embedding_size = config['ui_embedding_size'] + self.neigh_embedding_size = config['neigh_embedding_size'] + self.num_conv_kernel = config['num_conv_kernel'] + self.conv_kernel_size = config['conv_kernel_size'] + self.pool_kernel_size = config['pool_kernel_size'] + self.mlp_hidden_size = config['mlp_hidden_size'] + self.neigh_num = config['neigh_num'] + self.neigh_info_method = config['neigh_info_method'] + self.resolution = config['resolution'] + + # define layers and loss + self.user_embedding = nn.Embedding(self.n_users, self.ui_embedding_size) + self.item_embedding = nn.Embedding(self.n_items, self.ui_embedding_size) + self.user_neigh_embedding = nn.Embedding(self.n_items, self.neigh_embedding_size) + self.item_neigh_embedding = nn.Embedding(self.n_users, self.neigh_embedding_size) + self.user_conv = nn.Sequential( + nn.Conv1d(self.neigh_embedding_size, self.num_conv_kernel, self.conv_kernel_size), + nn.MaxPool1d(self.pool_kernel_size), + nn.ReLU()) + self.item_conv = nn.Sequential( + nn.Conv1d(self.neigh_embedding_size, self.num_conv_kernel, self.conv_kernel_size), + nn.MaxPool1d(self.pool_kernel_size), + nn.ReLU()) + conved_size = self.neigh_num - (self.conv_kernel_size - 1) + pooled_size = (conved_size - (self.pool_kernel_size - 1) - 1) // self.pool_kernel_size + 1 + self.mlp_layers = MLPLayers([2 * pooled_size * self.num_conv_kernel + self.ui_embedding_size] + self.mlp_hidden_size, config['dropout']) + self.out_layer = nn.Sequential(nn.Linear(self.mlp_hidden_size[-1], 1), + nn.Sigmoid()) + self.dropout_layer = torch.nn.Dropout(p=config['dropout']) + self.loss = nn.BCELoss() + + # choose the method to use neighborhood information + if self.neigh_info_method == "random": + self.u_neigh, self.i_neigh = self.get_neigh_random() + elif self.neigh_info_method == "knn": + self.u_neigh, self.i_neigh = self.get_neigh_knn() + elif self.neigh_info_method == "louvain": + self.u_neigh, self.i_neigh = self.get_neigh_louvain() + else: + raise RuntimeError('You need to choose the right algorithm 
of processing neighborhood information. \ + The parameter neigh_info_method can be set to random, knn or louvain.') + + # parameters initialization + self.apply(self._init_weights) + + def _init_weights(self, module): + if isinstance(module, nn.Embedding): + normal_(module.weight.data, mean=0.0, std=0.01) + + # Unify embedding length + def Max_ner(self, lst, max_ner): + r"""Unify embedding length of neighborhood information for efficiency consideration. + Truncate the list if the length is larger than max_ner. + Otherwise, pad it with 0. + + Args: + lst (list): The input list contains node's neighbors. + max_ner (int): The number of neighbors we choose for each node. + + Returns: + list: The list of a node's community neighbors. + + + """ + for i in range(len(lst)): + if len(lst[i]) >= max_ner: + lst[i] = lst[i][:max_ner] + else: + length = len(lst[i]) + for _ in range(max_ner - length): + lst[i].append(0) + return lst + + # Find other nodes in the same community + def get_community_member(self, partition, community_dict, node, kind): + r"""Find other nodes in the same community. + e.g. If the node starts with letter "i", + the other nodes start with letter "i" in the same community dict group are its community neighbors. + + Args: + partition (dict): The input dict that contains the community each node belongs. + community_dict (dict): The input dict that shows the nodes each community contains. + node (int): The id of the input node. + kind (char): The type of the input node. + + Returns: + list: The list of a node's community neighbors. + + """ + comm = community_dict[partition[node]] + return [x for x in comm if x.startswith(kind)] + + # Prepare neiborhood embeddings, i.e. I(u) and U(i) + def prepare_vector_element(self, partition, relation, community_dict): + r"""Find the community neighbors of each node, i.e. I(u) and U(i). + Then reset the id of nodes. + + Args: + partition (dict): The input dict that contains the community each node belongs. + relation (list): The input list that contains the relationships of users and items. + community_dict (dict): The input dict that shows the nodes each community contains. + + Returns: + list: The list of nodes' community neighbors. + + """ + item2user_neighbor_lst = [[] for _ in range(self.n_items)] + user2item_neighbor_lst = [[] for _ in range(self.n_users)] + + for r in range(len(relation)): + user, item = relation[r][0], relation[r][1] + item2user_neighbor = self.get_community_member(partition, community_dict, user, 'u') + np.random.shuffle(item2user_neighbor) + user2item_neighbor = self.get_community_member(partition, community_dict, item, 'i') + np.random.shuffle(user2item_neighbor) + _, user = user.split('_', 1) + user = int(user) + _, item = item.split('_', 1) + item = int(item) + for i in range(len(item2user_neighbor)): + name, index = item2user_neighbor[i].split('_', 1) + item2user_neighbor[i] = int(index) + for i in range(len(user2item_neighbor)): + name, index = user2item_neighbor[i].split('_', 1) + user2item_neighbor[i] = int(index) + + item2user_neighbor_lst[item] = item2user_neighbor + user2item_neighbor_lst[user] = user2item_neighbor + + return user2item_neighbor_lst, item2user_neighbor_lst + + # Get neighborhood embeddings using louvain method + def get_neigh_louvain(self): + r"""Get neighborhood information using louvain algorithm. + First, change the id of node, + for example, the id of user node "1" will be set to "u_1" in order to use louvain algorithm. 
+ Second, use louvain algorithm to seperate nodes into different communities. + Finally, find the community neighbors of each node with the same type and reset the id of the nodes. + + Returns: + torch.IntTensor: The neighborhood nodes of a batch of user or item, shape: [batch_size, neigh_num] + """ + inter_M = self.interaction_matrix + pairs = list(zip(inter_M.row, inter_M.col)) + + tmp_relation = [] + for i in range(len(pairs)): + tmp_relation.append(['user_' + str(pairs[i][0]), 'item_' + str(pairs[i][1])]) + + import networkx as nx + G = nx.Graph() + G.add_edges_from(tmp_relation) + resolution = self.resolution + import community + partition = community.best_partition(G, resolution=resolution) + + community_dict = {} + community_dict.setdefault(0, []) + for i in range(len(partition.values())): + community_dict[i] = [] + for node, part in partition.items(): + community_dict[part] = community_dict[part] + [node] + + tmp_user2item, tmp_item2user = self.prepare_vector_element(partition, tmp_relation, community_dict) + u_neigh = self.Max_ner(tmp_user2item, self.neigh_num) + i_neigh = self.Max_ner(tmp_item2user, self.neigh_num) + + u_neigh = torch.tensor(u_neigh, device=self.device) + i_neigh = torch.tensor(i_neigh, device=self.device) + return u_neigh, i_neigh + + # Count the similarity of node and direct neighbors using jaccard method + def count_jaccard(self, inters, node, neigh_list, kind): + r""" Count the similarity of the node and its direct neighbors using jaccard similarity. + + Args: + inters (list): The input list that contains the relationships of users and items. + node (int): The id of the input node. + neigh_list (list): The input list that contains the neighbors of the input node. + kind (char): The type of the input node. + + Returns: + list: The list of jaccard similarity score between the node and its neighbors. + + """ + if kind == 'u': + if node in neigh_list: + return 0 + vec_node = inters[:, node] + score = 0 + for neigh in neigh_list: + vec_neigh = inters[:, neigh] + tmp = jaccard_score(vec_node, vec_neigh) + score += tmp + return score + else: + if node in neigh_list: + return 0 + vec_node = inters[node] + score = 0 + for neigh in neigh_list: + vec_neigh = inters[neigh] + tmp = jaccard_score(vec_node, vec_neigh) + score += tmp + return score + + # Get neighborhood embeddings using knn method + def get_neigh_knn(self): + r"""Get neighborhood information using knn algorithm. + Find direct neighbors of each node, if the number of direct neighbors is less than neigh_num, + add other similar neighbors using jaccard similarity. + Otherwise, select random top k direct neighbors, k equals to the number of neighbors. 
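+        A candidate's similarity to the existing direct neighbors is the summed Jaccard score computed
+        by ``count_jaccard`` over the corresponding interaction vectors; candidates are ranked by this
+        score, appended after the direct neighbors, and the list is truncated to ``neigh_num``.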
+ + Returns: + torch.IntTensor: The neighborhood nodes of a batch of user or item, shape: [batch_size, neigh_num] + """ + inter_M = self.interaction_matrix + pairs = list(zip(inter_M.row, inter_M.col)) + ui_inters = np.zeros((self.n_users, self.n_items), dtype=np.int8) + + for i in range(len(pairs)): + ui_inters[pairs[i][0], pairs[i][1]] = 1 + + u_neigh, i_neigh = [], [] + + for u in range(self.n_users): + neigh_list = ui_inters[u].nonzero()[0] + direct_neigh_num = len(neigh_list) + if len(neigh_list) == 0: + u_neigh.append(self.neigh_num * [0]) + elif direct_neigh_num < self.neigh_num: + knn_neigh_dict = {} + for i in range(self.n_items): + score = self.count_jaccard(ui_inters, i, neigh_list, 'u') + knn_neigh_dict[i] = score + knn_neigh_dict_sorted = dict(sorted(knn_neigh_dict.items(), key=lambda item:item[1], reverse=True)) + knn_neigh_list = knn_neigh_dict_sorted.keys() + neigh_list = list(neigh_list) + list(knn_neigh_list) + u_neigh.append(neigh_list[:self.neigh_num]) + else: + mask = np.random.randint(0, len(neigh_list), size=self.neigh_num) + u_neigh.append(neigh_list[mask]) + + for i in range(self.n_items): + neigh_list = ui_inters[:, i].nonzero()[0] + direct_neigh_num = len(neigh_list) + if len(neigh_list) == 0: + i_neigh.append(self.neigh_num * [0]) + elif direct_neigh_num < self.neigh_num: + knn_neigh_dict = {} + for i in range(self.n_users): + score = self.count_jaccard(ui_inters, i, neigh_list, 'i') + knn_neigh_dict[i] = score + knn_neigh_dict_sorted = dict(sorted(knn_neigh_dict.items(), key=lambda item:item[1], reverse=True)) + knn_neigh_list = knn_neigh_dict_sorted.keys() + neigh_list = list(neigh_list) + list(knn_neigh_list) + i_neigh.append(neigh_list[:self.neigh_num]) + else: + mask = np.random.randint(0, len(neigh_list), size=self.neigh_num) + i_neigh.append(neigh_list[mask]) + + u_neigh = torch.tensor(u_neigh, device=self.device) + i_neigh = torch.tensor(i_neigh, device=self.device) + return u_neigh, i_neigh + + # Get neighborhood embeddings using random method + def get_neigh_random(self): + r"""Get neighborhood information using random algorithm. + Select random top k direct neighbors, k equals to the number of neighbors. + + Returns: + torch.IntTensor: The neighborhood nodes of a batch of user or item, shape: [batch_size, neigh_num] + """ + inter_M = self.interaction_matrix + pairs = list(zip(inter_M.row, inter_M.col)) + ui_inters = np.zeros((self.n_users, self.n_items), dtype=np.int8) + + for i in range(len(pairs)): + ui_inters[pairs[i][0], pairs[i][1]] = 1 + + u_neigh, i_neigh = [], [] + + for u in range(self.n_users): + neigh_list = ui_inters[u].nonzero()[0] + if len(neigh_list) == 0: + u_neigh.append(self.neigh_num * [0]) + else: + mask = np.random.randint(0, len(neigh_list), size=self.neigh_num) + u_neigh.append(neigh_list[mask]) + + for i in range(self.n_items): + neigh_list = ui_inters[:, i].nonzero()[0] + if len(neigh_list) == 0: + i_neigh.append(self.neigh_num * [0]) + else: + mask = np.random.randint(0, len(neigh_list), size=self.neigh_num) + i_neigh.append(neigh_list[mask]) + + u_neigh = torch.tensor(u_neigh, device=self.device) + i_neigh = torch.tensor(i_neigh, device=self.device) + return u_neigh, i_neigh + + # Get neighborhood embeddings + def get_neigh_info(self, user, item): + r"""Get a batch of neighborhood embedding tensor according to input id. 
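+        Note that the returned tensors hold neighbor ids of shape [batch_size, neigh_num]; they are
+        mapped to ``neigh_embedding_size``-dimensional embeddings by ``user_neigh_embedding`` and
+        ``item_neigh_embedding`` in ``forward``.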
+ + Args: + user (torch.LongTensor): The input tensor that contains user's id, shape: [batch_size, ] + item (torch.LongTensor): The input tensor that contains item's id, shape: [batch_size, ] + + Returns: + torch.FloatTensor: The neighborhood embedding tensor of a batch of user, shape: [batch_size, neigh_embedding_size] + torch.FloatTensor: The neighborhood embedding tensor of a batch of item, shape: [batch_size, neigh_embedding_size] + + """ + batch_u_neigh = self.u_neigh[user] + batch_i_neigh = self.i_neigh[item] + return batch_u_neigh, batch_i_neigh + + def forward(self, user, item): + user_embedding = self.user_embedding(user) + item_embedding = self.item_embedding(item) + + user_neigh_input, item_neigh_input = self.get_neigh_info(user, item) + user_neigh_embedding = self.user_neigh_embedding(user_neigh_input) + item_neigh_embedding = self.item_neigh_embedding(item_neigh_input) + user_neigh_embedding = user_neigh_embedding.permute(0, 2, 1) + user_neigh_conv_embedding = self.user_conv(user_neigh_embedding) + # batch_size * out_channel * pool_size + batch_size = user_neigh_conv_embedding.size(0) + user_neigh_conv_embedding = user_neigh_conv_embedding.view(batch_size, -1) + item_neigh_embedding = item_neigh_embedding.permute(0, 2, 1) + item_neigh_conv_embedding = self.item_conv(item_neigh_embedding) + # batch_size * out_channel * pool_size + item_neigh_conv_embedding = item_neigh_conv_embedding.view(batch_size, -1) + mf_vec = torch.mul(user_embedding, item_embedding) + last = torch.cat((mf_vec, user_neigh_conv_embedding, item_neigh_conv_embedding), dim=-1) + + output = self.mlp_layers(last) + out = self.out_layer(output) + out = out.squeeze(-1) + return out + + def calculate_loss(self, interaction): + user = interaction[self.USER_ID] + item = interaction[self.ITEM_ID] + label = interaction[self.LABEL] + + output = self.forward(user, item) + return self.loss(output, label) + + def predict(self, interaction): + user = interaction[self.USER_ID] + item = interaction[self.ITEM_ID] + return self.forward(user, item) diff --git a/recbole/model/general_recommender/pop.py b/recbole/model/general_recommender/pop.py index 9a7e75c0a..6a0cc261e 100644 --- a/recbole/model/general_recommender/pop.py +++ b/recbole/model/general_recommender/pop.py @@ -6,6 +6,7 @@ # @Time : 2020/11/9 # @Author : Zihan Lin # @Email : zhlin@ruc.edu.cn + r""" Pop ################################################ diff --git a/recbole/model/general_recommender/spectralcf.py b/recbole/model/general_recommender/spectralcf.py index 537b452b5..83e17e076 100644 --- a/recbole/model/general_recommender/spectralcf.py +++ b/recbole/model/general_recommender/spectralcf.py @@ -58,22 +58,21 @@ def __init__(self, config, dataset): # generate intermediate data # "A_hat = I + L" is equivalent to "A_hat = U U^T + U \Lambda U^T" - self.interaction_matrix = dataset.inter_matrix( - form='coo').astype(np.float32) + self.interaction_matrix = dataset.inter_matrix(form='coo').astype(np.float32) I = self.get_eye_mat(self.n_items + self.n_users) L = self.get_laplacian_matrix() A_hat = I + L self.A_hat = A_hat.to(self.device) # define layers and loss - self.user_embedding = torch.nn.Embedding( - num_embeddings=self.n_users, embedding_dim=self.emb_dim) - self.item_embedding = torch.nn.Embedding( - num_embeddings=self.n_items, embedding_dim=self.emb_dim) - self.filters = torch.nn.ParameterList( - [torch.nn.Parameter(torch.normal(mean=0.01, std=0.02, size=(self.emb_dim, self.emb_dim)).to(self.device), - requires_grad=True) - for _ in range(self.n_layers)]) + 
self.user_embedding = torch.nn.Embedding(num_embeddings=self.n_users, embedding_dim=self.emb_dim) + self.item_embedding = torch.nn.Embedding(num_embeddings=self.n_items, embedding_dim=self.emb_dim) + self.filters = torch.nn.ParameterList([ + torch.nn.Parameter( + torch.normal(mean=0.01, std=0.02, size=(self.emb_dim, self.emb_dim)).to(self.device), + requires_grad=True + ) for _ in range(self.n_layers) + ]) self.sigmoid = torch.nn.Sigmoid() self.mf_loss = BPRLoss() @@ -94,8 +93,7 @@ def get_laplacian_matrix(self): Sparse tensor of the laplacian matrix. """ # build adj matrix - A = sp.dok_matrix((self.n_users + self.n_items, - self.n_users + self.n_items), dtype=np.float32) + A = sp.dok_matrix((self.n_users + self.n_items, self.n_users + self.n_items), dtype=np.float32) inter_M = self.interaction_matrix inter_M_t = self.interaction_matrix.transpose() data_dict = dict(zip(zip(inter_M.row, inter_M.col + self.n_users), [1] * inter_M.nnz)) @@ -151,13 +149,11 @@ def forward(self): for k in range(self.n_layers): all_embeddings = torch.sparse.mm(self.A_hat, all_embeddings) - all_embeddings = self.sigmoid( - torch.mm(all_embeddings, self.filters[k])) + all_embeddings = self.sigmoid(torch.mm(all_embeddings, self.filters[k])) embeddings_list.append(all_embeddings) new_embeddings = torch.cat(embeddings_list, dim=1) - user_all_embeddings, item_all_embeddings = torch.split( - new_embeddings, [self.n_users, self.n_items]) + user_all_embeddings, item_all_embeddings = torch.split(new_embeddings, [self.n_users, self.n_items]) return user_all_embeddings, item_all_embeddings def calculate_loss(self, interaction): @@ -198,6 +194,5 @@ def full_sort_predict(self, interaction): self.restore_user_e, self.restore_item_e = self.forward() u_embeddings = self.restore_user_e[user] - scores = torch.matmul( - u_embeddings, self.restore_item_e.transpose(0, 1)) + scores = torch.matmul(u_embeddings, self.restore_item_e.transpose(0, 1)) return scores.view(-1) diff --git a/recbole/model/knowledge_aware_recommender/cfkg.py b/recbole/model/knowledge_aware_recommender/cfkg.py index 01261f09a..dc21aa935 100644 --- a/recbole/model/knowledge_aware_recommender/cfkg.py +++ b/recbole/model/knowledge_aware_recommender/cfkg.py @@ -83,7 +83,7 @@ def _get_kg_embedding(self, head, pos_tail, neg_tail, relation): def _get_score(self, h_e, t_e, r_e): if self.loss_function == 'transe': - return - torch.norm(h_e + r_e - t_e, p=2, dim=1) + return -torch.norm(h_e + r_e - t_e, p=2, dim=1) else: return torch.mul(h_e + r_e, t_e).sum(dim=1) @@ -124,4 +124,4 @@ def __init__(self): def forward(self, anchor, positive, negative): pos_score = torch.mul(anchor, positive).sum(dim=1) neg_score = torch.mul(anchor, negative).sum(dim=1) - return (F.softplus(- pos_score) + F.softplus(neg_score)).mean() + return (F.softplus(-pos_score) + F.softplus(neg_score)).mean() diff --git a/recbole/model/knowledge_aware_recommender/kgcn.py b/recbole/model/knowledge_aware_recommender/kgcn.py index 5b2d7b655..b8e76b025 100644 --- a/recbole/model/knowledge_aware_recommender/kgcn.py +++ b/recbole/model/knowledge_aware_recommender/kgcn.py @@ -46,24 +46,24 @@ def __init__(self, config, dataset): # define embedding self.user_embedding = nn.Embedding(self.n_users, self.embedding_size) - self.entity_embedding = nn.Embedding( - self.n_entities, self.embedding_size) - self.relation_embedding = nn.Embedding( - self.n_relations + 1, self.embedding_size) + self.entity_embedding = nn.Embedding(self.n_entities, self.embedding_size) + self.relation_embedding = nn.Embedding(self.n_relations + 
1, self.embedding_size) # sample neighbors kg_graph = dataset.kg_graph(form='coo', value_field='relation_id') adj_entity, adj_relation = self.construct_adj(kg_graph) - self.adj_entity, self.adj_relation = adj_entity.to( - self.device), adj_relation.to(self.device) + self.adj_entity, self.adj_relation = adj_entity.to(self.device), adj_relation.to(self.device) # define function self.softmax = nn.Softmax(dim=-1) self.linear_layers = torch.nn.ModuleList() for i in range(self.n_iter): - self.linear_layers.append(nn.Linear( - self.embedding_size if not self.aggregator_class == 'concat' else self.embedding_size * 2, - self.embedding_size)) + self.linear_layers.append( + nn.Linear( + self.embedding_size if not self.aggregator_class == 'concat' else self.embedding_size * 2, + self.embedding_size + ) + ) self.ReLU = nn.ReLU() self.Tanh = nn.Tanh() @@ -104,30 +104,26 @@ def construct_adj(self, kg_graph): # each line of adj_entity stores the sampled neighbor entities for a given entity # each line of adj_relation stores the corresponding sampled neighbor relations entity_num = kg_graph.shape[0] - adj_entity = np.zeros( - [entity_num, self.neighbor_sample_size], dtype=np.int64) - adj_relation = np.zeros( - [entity_num, self.neighbor_sample_size], dtype=np.int64) + adj_entity = np.zeros([entity_num, self.neighbor_sample_size], dtype=np.int64) + adj_relation = np.zeros([entity_num, self.neighbor_sample_size], dtype=np.int64) for entity in range(entity_num): if entity not in kg_dict.keys(): - adj_entity[entity] = np.array( - [entity] * self.neighbor_sample_size) - adj_relation[entity] = np.array( - [0] * self.neighbor_sample_size) + adj_entity[entity] = np.array([entity] * self.neighbor_sample_size) + adj_relation[entity] = np.array([0] * self.neighbor_sample_size) continue neighbors = kg_dict[entity] n_neighbors = len(neighbors) if n_neighbors >= self.neighbor_sample_size: - sampled_indices = np.random.choice(list(range(n_neighbors)), size=self.neighbor_sample_size, - replace=False) + sampled_indices = np.random.choice( + list(range(n_neighbors)), size=self.neighbor_sample_size, replace=False + ) else: - sampled_indices = np.random.choice(list(range(n_neighbors)), size=self.neighbor_sample_size, - replace=True) - adj_entity[entity] = np.array( - [neighbors[i][0] for i in sampled_indices]) - adj_relation[entity] = np.array( - [neighbors[i][1] for i in sampled_indices]) + sampled_indices = np.random.choice( + list(range(n_neighbors)), size=self.neighbor_sample_size, replace=True + ) + adj_entity[entity] = np.array([neighbors[i][0] for i in sampled_indices]) + adj_relation[entity] = np.array([neighbors[i][1] for i in sampled_indices]) return torch.from_numpy(adj_entity), torch.from_numpy(adj_relation) @@ -176,16 +172,20 @@ def mix_neighbor_vectors(self, neighbor_vectors, neighbor_relations, user_embedd """ avg = False if not avg: - user_embeddings = user_embeddings.reshape(self.batch_size, 1, 1, - self.embedding_size) # [batch_size, 1, 1, dim] - user_relation_scores = torch.mean(user_embeddings * neighbor_relations, - dim=-1) # [batch_size, -1, n_neighbor] + user_embeddings = user_embeddings.reshape( + self.batch_size, 1, 1, self.embedding_size + ) # [batch_size, 1, 1, dim] + user_relation_scores = torch.mean( + user_embeddings * neighbor_relations, dim=-1 + ) # [batch_size, -1, n_neighbor] user_relation_scores_normalized = self.softmax(user_relation_scores) # [batch_size, -1, n_neighbor] - user_relation_scores_normalized = torch.unsqueeze(user_relation_scores_normalized, - dim=-1) # [batch_size, -1, 
n_neighbor, 1] - neighbors_aggregated = torch.mean(user_relation_scores_normalized * neighbor_vectors, - dim=2) # [batch_size, -1, dim] + user_relation_scores_normalized = torch.unsqueeze( + user_relation_scores_normalized, dim=-1 + ) # [batch_size, -1, n_neighbor, 1] + neighbors_aggregated = torch.mean( + user_relation_scores_normalized * neighbor_vectors, dim=2 + ) # [batch_size, -1, dim] else: neighbors_aggregated = torch.mean(neighbor_vectors, dim=2) # [batch_size, -1, dim] return neighbors_aggregated @@ -214,14 +214,14 @@ def aggregate(self, user_embeddings, entities, relations): for i in range(self.n_iter): entity_vectors_next_iter = [] for hop in range(self.n_iter - i): - shape = (self.batch_size, -1, - self.neighbor_sample_size, self.embedding_size) + shape = (self.batch_size, -1, self.neighbor_sample_size, self.embedding_size) self_vectors = entity_vectors[hop] neighbor_vectors = entity_vectors[hop + 1].reshape(shape) neighbor_relations = relation_vectors[hop].reshape(shape) - neighbors_agg = self.mix_neighbor_vectors(neighbor_vectors, neighbor_relations, - user_embeddings) # [batch_size, -1, dim] + neighbors_agg = self.mix_neighbor_vectors( + neighbor_vectors, neighbor_relations, user_embeddings + ) # [batch_size, -1, dim] if self.aggregator_class == 'sum': output = (self_vectors + neighbors_agg).reshape(-1, self.embedding_size) # [-1, dim] @@ -232,8 +232,7 @@ def aggregate(self, user_embeddings, entities, relations): output = torch.cat([self_vectors, neighbors_agg], dim=-1) output = output.reshape(-1, self.embedding_size * 2) # [-1, dim * 2] else: - raise Exception("Unknown aggregator: " + - self.aggregator_class) + raise Exception("Unknown aggregator: " + self.aggregator_class) output = self.linear_layers[i](output) # [batch_size, -1, dim] @@ -275,8 +274,7 @@ def calculate_loss(self, interaction): neg_item_score = torch.mul(user_e, neg_item_e).sum(dim=1) predict = torch.cat((pos_item_score, neg_item_score)) - target = torch.zeros( - len(user) * 2, dtype=torch.float32).to(self.device) + target = torch.zeros(len(user) * 2, dtype=torch.float32).to(self.device) target[:len(user)] = 1 rec_loss = self.bce_loss(predict, target) @@ -295,11 +293,9 @@ def full_sort_predict(self, interaction): user_index = interaction[self.USER_ID] item_index = torch.tensor(range(self.n_items)).to(self.device) - user = torch.unsqueeze(user_index, dim=1).repeat( - 1, item_index.shape[0]) + user = torch.unsqueeze(user_index, dim=1).repeat(1, item_index.shape[0]) user = torch.flatten(user) - item = torch.unsqueeze(item_index, dim=0).repeat( - user_index.shape[0], 1) + item = torch.unsqueeze(item_index, dim=0).repeat(user_index.shape[0], 1) item = torch.flatten(item) user_e, item_e = self.forward(user, item) diff --git a/recbole/model/knowledge_aware_recommender/kgnnls.py b/recbole/model/knowledge_aware_recommender/kgnnls.py index 5749be261..0e262a8cf 100644 --- a/recbole/model/knowledge_aware_recommender/kgnnls.py +++ b/recbole/model/knowledge_aware_recommender/kgnnls.py @@ -52,16 +52,13 @@ def __init__(self, config, dataset): # define embedding self.user_embedding = nn.Embedding(self.n_users, self.embedding_size) - self.entity_embedding = nn.Embedding( - self.n_entities, self.embedding_size) - self.relation_embedding = nn.Embedding( - self.n_relations + 1, self.embedding_size) + self.entity_embedding = nn.Embedding(self.n_entities, self.embedding_size) + self.relation_embedding = nn.Embedding(self.n_relations + 1, self.embedding_size) # sample neighbors and construct interaction table kg_graph = 
dataset.kg_graph(form='coo', value_field='relation_id') adj_entity, adj_relation = self.construct_adj(kg_graph) - self.adj_entity, self.adj_relation = adj_entity.to( - self.device), adj_relation.to(self.device) + self.adj_entity, self.adj_relation = adj_entity.to(self.device), adj_relation.to(self.device) inter_feat = dataset.dataset.inter_feat pos_users = inter_feat[dataset.dataset.uid_field] @@ -74,9 +71,12 @@ def __init__(self, config, dataset): self.softmax = nn.Softmax(dim=-1) self.linear_layers = torch.nn.ModuleList() for i in range(self.n_iter): - self.linear_layers.append(nn.Linear( - self.embedding_size if not self.aggregator_class == 'concat' else self.embedding_size * 2, - self.embedding_size)) + self.linear_layers.append( + nn.Linear( + self.embedding_size if not self.aggregator_class == 'concat' else self.embedding_size * 2, + self.embedding_size + ) + ) self.ReLU = nn.ReLU() self.Tanh = nn.Tanh() @@ -173,11 +173,13 @@ def construct_adj(self, kg_graph): neighbors = kg_dict[entity] n_neighbors = len(neighbors) if n_neighbors >= self.neighbor_sample_size: - sampled_indices = np.random.choice(list(range(n_neighbors)), size=self.neighbor_sample_size, - replace=False) + sampled_indices = np.random.choice( + list(range(n_neighbors)), size=self.neighbor_sample_size, replace=False + ) else: - sampled_indices = np.random.choice(list(range(n_neighbors)), size=self.neighbor_sample_size, - replace=True) + sampled_indices = np.random.choice( + list(range(n_neighbors)), size=self.neighbor_sample_size, replace=True + ) adj_entity[entity] = np.array([neighbors[i][0] for i in sampled_indices]) adj_relation[entity] = np.array([neighbors[i][1] for i in sampled_indices]) @@ -241,13 +243,18 @@ def aggregate(self, user_embeddings, entities, relations): neighbor_relations = relation_vectors[hop].reshape(shape) # mix_neighbor_vectors - user_embeddings = user_embeddings.reshape(self.batch_size, 1, 1, self.embedding_size) # [batch_size, 1, 1, dim] - user_relation_scores = torch.mean(user_embeddings * neighbor_relations, - dim=-1) # [batch_size, -1, n_neighbor] - user_relation_scores_normalized = torch.unsqueeze(self.softmax(user_relation_scores), - dim=-1) # [batch_size, -1, n_neighbor, 1] - neighbors_agg = torch.mean(user_relation_scores_normalized * neighbor_vectors, - dim=2) # [batch_size, -1, dim] + user_embeddings = user_embeddings.reshape( + self.batch_size, 1, 1, self.embedding_size + ) # [batch_size, 1, 1, dim] + user_relation_scores = torch.mean( + user_embeddings * neighbor_relations, dim=-1 + ) # [batch_size, -1, n_neighbor] + user_relation_scores_normalized = torch.unsqueeze( + self.softmax(user_relation_scores), dim=-1 + ) # [batch_size, -1, n_neighbor, 1] + neighbors_agg = torch.mean( + user_relation_scores_normalized * neighbor_vectors, dim=2 + ) # [batch_size, -1, dim] if self.aggregator_class == 'sum': output = (self_vectors + neighbors_agg).reshape(-1, self.embedding_size) # [-1, dim] @@ -337,18 +344,22 @@ def lookup_interaction_table(x, _): masks = reset_masks[hop] self_labels = entity_labels[hop] neighbor_labels = entity_labels[hop + 1].reshape(self.batch_size, -1, self.neighbor_sample_size) - neighbor_relations = relation_vectors[hop].reshape(self.batch_size, -1, self.neighbor_sample_size, - self.embedding_size) + neighbor_relations = relation_vectors[hop].reshape( + self.batch_size, -1, self.neighbor_sample_size, self.embedding_size + ) # mix_neighbor_labels - user_embeddings = user_embeddings.reshape(self.batch_size, 1, 1, - self.embedding_size) # [batch_size, 1, 1, dim] - 
user_relation_scores = torch.mean(user_embeddings * neighbor_relations, - dim=-1) # [batch_size, -1, n_neighbor] + user_embeddings = user_embeddings.reshape( + self.batch_size, 1, 1, self.embedding_size + ) # [batch_size, 1, 1, dim] + user_relation_scores = torch.mean( + user_embeddings * neighbor_relations, dim=-1 + ) # [batch_size, -1, n_neighbor] user_relation_scores_normalized = self.softmax(user_relation_scores) # [batch_size, -1, n_neighbor] - neighbors_aggregated_label = torch.mean(user_relation_scores_normalized * neighbor_labels, - dim=2) # [batch_size, -1, dim] # [batch_size, -1] + neighbors_aggregated_label = torch.mean( + user_relation_scores_normalized * neighbor_labels, dim=2 + ) # [batch_size, -1, dim] # [batch_size, -1] output = masks.float() * self_labels + \ torch.logical_not(masks).float() * neighbors_aggregated_label @@ -384,8 +395,7 @@ def calculate_ls_loss(self, user, item, target): user_e = self.user_embedding(user) entities, relations = self.get_neighbors(item) - predicted_labels = self.label_smoothness_predict( - user_e, user, entities, relations) + predicted_labels = self.label_smoothness_predict(user_e, user, entities, relations) ls_loss = self.bce_loss(predicted_labels, target) return ls_loss @@ -393,8 +403,7 @@ def calculate_loss(self, interaction): user = interaction[self.USER_ID] pos_item = interaction[self.ITEM_ID] neg_item = interaction[self.NEG_ITEM_ID] - target = torch.zeros( - len(user) * 2, dtype=torch.float32).to(self.device) + target = torch.zeros(len(user) * 2, dtype=torch.float32).to(self.device) target[:len(user)] = 1 users = torch.cat((user, user)) @@ -420,11 +429,9 @@ def full_sort_predict(self, interaction): user_index = interaction[self.USER_ID] item_index = torch.tensor(range(self.n_items)).to(self.device) - user = torch.unsqueeze(user_index, dim=1).repeat( - 1, item_index.shape[0]) + user = torch.unsqueeze(user_index, dim=1).repeat(1, item_index.shape[0]) user = torch.flatten(user) - item = torch.unsqueeze(item_index, dim=0).repeat( - user_index.shape[0], 1) + item = torch.unsqueeze(item_index, dim=0).repeat(user_index.shape[0], 1) item = torch.flatten(item) user_e, item_e = self.forward(user, item) diff --git a/recbole/model/knowledge_aware_recommender/ktup.py b/recbole/model/knowledge_aware_recommender/ktup.py index c3277efc6..adafa68e6 100644 --- a/recbole/model/knowledge_aware_recommender/ktup.py +++ b/recbole/model/knowledge_aware_recommender/ktup.py @@ -120,15 +120,14 @@ def st_gumbel_softmax(self, logits, temperature=1.0): y = logits + gumbel_noise y = self._masked_softmax(logits=y / temperature) y_argmax = y.max(len(y.shape) - 1)[1] - y_hard = self.convert_to_one_hot( - indices=y_argmax, - num_classes=y.size(len(y.shape) - 1)).float() + y_hard = self.convert_to_one_hot(indices=y_argmax, num_classes=y.size(len(y.shape) - 1)).float() y = (y_hard - y).detach() + y return y def _get_preferences(self, user_e, item_e, use_st_gumbel=False): - pref_probs = torch.matmul(user_e + item_e, - torch.t(self.pref_embedding.weight + self.relation_embedding.weight)) / 2 + pref_probs = torch.matmul( + user_e + item_e, torch.t(self.pref_embedding.weight + self.relation_embedding.weight) + ) / 2 if use_st_gumbel: # todo: different torch versions may cause the st_gumbel_softmax to report errors, wait to be test pref_probs = self.st_gumbel_softmax(pref_probs) @@ -142,9 +141,9 @@ def _transH_projection(original, norm): def _get_score(self, h_e, r_e, t_e): if self.L1_flag: - score = - torch.sum(torch.abs(h_e + r_e - t_e), 1) + score = -torch.sum(torch.abs(h_e 
+ r_e - t_e), 1) else: - score = - torch.sum((h_e + r_e - t_e) ** 2, 1) + score = -torch.sum((h_e + r_e - t_e) ** 2, 1) return score def forward(self, user, item): @@ -210,8 +209,9 @@ def calculate_kg_loss(self, interaction): loss = self.kg_weight * (kg_loss + orthogonal_loss + reg_loss) entity = torch.cat([h, pos_t, neg_t]) entity = entity[entity < self.n_items] - align_loss = self.align_weight * alignLoss(self.item_embedding(entity), self.entity_embedding(entity), - self.L1_flag) + align_loss = self.align_weight * alignLoss( + self.item_embedding(entity), self.entity_embedding(entity), self.L1_flag + ) return loss, align_loss @@ -223,8 +223,10 @@ def predict(self, interaction): def orthogonalLoss(rel_embeddings, norm_embeddings): - return torch.sum(torch.sum(norm_embeddings * rel_embeddings, dim=1, keepdim=True) ** 2 / - torch.sum(rel_embeddings ** 2, dim=1, keepdim=True)) + return torch.sum( + torch.sum(norm_embeddings * rel_embeddings, dim=1, keepdim=True) ** 2 / + torch.sum(rel_embeddings ** 2, dim=1, keepdim=True) + ) def alignLoss(emb1, emb2, L1_flag=False): diff --git a/recbole/model/knowledge_aware_recommender/mkr.py b/recbole/model/knowledge_aware_recommender/mkr.py index 3b8503f80..7250dcc81 100644 --- a/recbole/model/knowledge_aware_recommender/mkr.py +++ b/recbole/model/knowledge_aware_recommender/mkr.py @@ -75,12 +75,14 @@ def __init__(self, config, dataset): # parameters initialization self.apply(xavier_normal_initialization) - def forward(self, user_indices=None, item_indices=None, head_indices=None, - relation_indices=None, tail_indices=None): + def forward( + self, user_indices=None, item_indices=None, head_indices=None, relation_indices=None, tail_indices=None + ): self.item_embeddings = self.item_embeddings_lookup(item_indices) self.head_embeddings = self.entity_embeddings_lookup(head_indices) self.item_embeddings, self.head_embeddings = self.cc_unit( - [self.item_embeddings, self.head_embeddings]) # calculate feature interactions between items and entities + [self.item_embeddings, self.head_embeddings] + ) # calculate feature interactions between items and entities if user_indices is not None: # RS @@ -112,8 +114,8 @@ def forward(self, user_indices=None, item_indices=None, head_indices=None, self.tail_pred = torch.sigmoid(self.tail_pred) self.scores_kge = torch.sigmoid(torch.sum(self.tail_embeddings * self.tail_pred, 1)) self.rmse = torch.mean( - torch.sqrt(torch.sum(torch.pow(self.tail_embeddings - - self.tail_pred, 2), 1) / self.embedding_size)) + torch.sqrt(torch.sum(torch.pow(self.tail_embeddings - self.tail_pred, 2), 1) / self.embedding_size) + ) outputs = [self.head_embeddings, self.tail_embeddings, self.scores_kge, self.rmse] return outputs diff --git a/recbole/model/knowledge_aware_recommender/ripplenet.py b/recbole/model/knowledge_aware_recommender/ripplenet.py index aa87b264d..899d74bbc 100644 --- a/recbole/model/knowledge_aware_recommender/ripplenet.py +++ b/recbole/model/knowledge_aware_recommender/ripplenet.py @@ -3,7 +3,6 @@ # @Author : gaole he # @Email : hegaole@ruc.edu.cn - r""" RippleNet ##################################################### diff --git a/recbole/model/layers.py b/recbole/model/layers.py index 5e2a9107f..3edb31568 100644 --- a/recbole/model/layers.py +++ b/recbole/model/layers.py @@ -260,8 +260,9 @@ class SequenceAttLayer(nn.Module): torch.Tensor: result """ - def __init__(self, mask_mat, att_hidden_size=(80, 40), activation='sigmoid', softmax_stag=False, - return_seq_weight=True): + def __init__( + self, mask_mat, att_hidden_size=(80, 
40), activation='sigmoid', softmax_stag=False, return_seq_weight=True + ): super(SequenceAttLayer, self).__init__() self.att_hidden_size = att_hidden_size self.activation = activation @@ -323,11 +324,7 @@ class VanillaAttention(nn.Module): def __init__(self, hidden_dim, attn_dim): super().__init__() - self.projection = nn.Sequential( - nn.Linear(hidden_dim, attn_dim), - nn.ReLU(True), - nn.Linear(attn_dim, 1) - ) + self.projection = nn.Sequential(nn.Linear(hidden_dim, attn_dim), nn.ReLU(True), nn.Linear(attn_dim, 1)) def forward(self, input_tensor): # (B, Len, num, H) -> (B, Len, num, 1) @@ -356,7 +353,8 @@ def __init__(self, n_heads, hidden_size, hidden_dropout_prob, attn_dropout_prob, if hidden_size % n_heads != 0: raise ValueError( "The hidden size (%d) is not a multiple of the number of attention " - "heads (%d)" % (hidden_size, n_heads)) + "heads (%d)" % (hidden_size, n_heads) + ) self.num_attention_heads = n_heads self.attention_head_size = int(hidden_size / n_heads) @@ -482,13 +480,15 @@ class TransformerLayer(nn.Module): """ - def __init__(self, n_heads, hidden_size, intermediate_size, - hidden_dropout_prob, attn_dropout_prob, hidden_act, layer_norm_eps): + def __init__( + self, n_heads, hidden_size, intermediate_size, hidden_dropout_prob, attn_dropout_prob, hidden_act, + layer_norm_eps + ): super(TransformerLayer, self).__init__() - self.multi_head_attention = MultiHeadAttention(n_heads, hidden_size, - hidden_dropout_prob, attn_dropout_prob, layer_norm_eps) - self.feed_forward = FeedForward(hidden_size, intermediate_size, - hidden_dropout_prob, hidden_act, layer_norm_eps) + self.multi_head_attention = MultiHeadAttention( + n_heads, hidden_size, hidden_dropout_prob, attn_dropout_prob, layer_norm_eps + ) + self.feed_forward = FeedForward(hidden_size, intermediate_size, hidden_dropout_prob, hidden_act, layer_norm_eps) def forward(self, hidden_states, attention_mask): attention_output = self.multi_head_attention(hidden_states, attention_mask) @@ -511,21 +511,23 @@ class TransformerEncoder(nn.Module): """ - def __init__(self, - n_layers=2, - n_heads=2, - hidden_size=64, - inner_size=256, - hidden_dropout_prob=0.5, - attn_dropout_prob=0.5, - hidden_act='gelu', - layer_norm_eps=1e-12): + def __init__( + self, + n_layers=2, + n_heads=2, + hidden_size=64, + inner_size=256, + hidden_dropout_prob=0.5, + attn_dropout_prob=0.5, + hidden_act='gelu', + layer_norm_eps=1e-12 + ): super(TransformerEncoder, self).__init__() - layer = TransformerLayer(n_heads, hidden_size, inner_size, - hidden_dropout_prob, attn_dropout_prob, hidden_act, layer_norm_eps) - self.layer = nn.ModuleList([copy.deepcopy(layer) - for _ in range(n_layers)]) + layer = TransformerLayer( + n_heads, hidden_size, inner_size, hidden_dropout_prob, attn_dropout_prob, hidden_act, layer_norm_eps + ) + self.layer = nn.ModuleList([copy.deepcopy(layer) for _ in range(n_layers)]) def forward(self, hidden_states, attention_mask, output_all_encoded_layers=True): """ @@ -593,17 +595,19 @@ def get_embedding(self): self.token_field_offsets[type] = np.array((0, *np.cumsum(self.token_field_dims[type])[:-1]), dtype=np.long) - self.token_embedding_table[type] = FMEmbedding(self.token_field_dims[type], - self.token_field_offsets[type], - self.embedding_size).to(self.device) + self.token_embedding_table[type] = FMEmbedding( + self.token_field_dims[type], self.token_field_offsets[type], self.embedding_size + ).to(self.device) if len(self.float_field_dims[type]) > 0: - self.float_embedding_table[type] = nn.Embedding(np.sum(self.float_field_dims[type], 
dtype=np.int32), - self.embedding_size).to(self.device) + self.float_embedding_table[type] = nn.Embedding( + np.sum(self.float_field_dims[type], dtype=np.int32), self.embedding_size + ).to(self.device) if len(self.token_seq_field_dims) > 0: self.token_seq_embedding_table[type] = nn.ModuleList() for token_seq_field_dim in self.token_seq_field_dims[type]: self.token_seq_embedding_table[type].append( - nn.Embedding(token_seq_field_dim, self.embedding_size).to(self.device)) + nn.Embedding(token_seq_field_dim, self.embedding_size).to(self.device) + ) def embed_float_fields(self, float_fields, type, embed=True): """Get the embedding of float fields. @@ -674,18 +678,18 @@ def embed_token_seq_fields(self, token_seq_fields, type): mask = mask.float() value_cnt = torch.sum(mask, dim=-1, keepdim=True) # [batch_size, max_item_length, 1] token_seq_embedding = embedding_table(token_seq_field) # [batch_size, max_item_length, seq_len, embed_dim] - mask = mask.unsqueeze(-1).expand_as( - token_seq_embedding) # [batch_size, max_item_length, seq_len, embed_dim] + mask = mask.unsqueeze(-1).expand_as(token_seq_embedding) if self.pooling_mode == 'max': - masked_token_seq_embedding = token_seq_embedding - ( - 1 - mask) * 1e9 # [batch_size, max_item_length, seq_len, embed_dim] - result = torch.max(masked_token_seq_embedding, dim=-2, - keepdim=True) # [batch_size, max_item_length, 1, embed_dim] + masked_token_seq_embedding = token_seq_embedding - (1 - mask) * 1e9 + result = torch.max( + masked_token_seq_embedding, dim=-2, keepdim=True + ) # [batch_size, max_item_length, 1, embed_dim] result = result.values elif self.pooling_mode == 'sum': masked_token_seq_embedding = token_seq_embedding * mask.float() - result = torch.sum(masked_token_seq_embedding, dim=-2, - keepdim=True) # [batch_size, max_item_length, 1, embed_dim] + result = torch.sum( + masked_token_seq_embedding, dim=-2, keepdim=True + ) # [batch_size, max_item_length, 1, embed_dim] else: masked_token_seq_embedding = token_seq_embedding * mask.float() result = torch.sum(masked_token_seq_embedding, dim=-2) # [batch_size, max_item_length, embed_dim] @@ -722,9 +726,7 @@ def embed_input_fields(self, user_idx, item_idx): float_fields = [] for field_name in self.float_field_names[type]: feature = user_item_feat[type][field_name][user_item_idx[type]] - float_fields.append(feature - if len(feature.shape) == (2 + (type == 'item')) - else feature.unsqueeze(-1)) + float_fields.append(feature if len(feature.shape) == (2 + (type == 'item')) else feature.unsqueeze(-1)) if len(float_fields) > 0: float_fields = torch.cat(float_fields, dim=1) # [batch_size, max_item_length, num_float_field] else: @@ -757,8 +759,8 @@ def embed_input_fields(self, user_idx, item_idx): if token_seq_fields_embedding[type] is None: sparse_embedding[type] = token_fields_embedding[type] else: - sparse_embedding[type] = torch.cat([token_fields_embedding[type], - token_seq_fields_embedding[type]], dim=-2) + sparse_embedding[type] = torch.cat([token_fields_embedding[type], token_seq_fields_embedding[type]], + dim=-2) dense_embedding[type] = float_fields_embedding[type] # sparse_embedding[type] @@ -783,8 +785,10 @@ def __init__(self, dataset, embedding_size, pooling_mode, device): self.user_feat = self.dataset.get_user_feature().to(self.device) self.item_feat = self.dataset.get_item_feature().to(self.device) - self.field_names = {'user': list(self.user_feat.interaction.keys()), - 'item': list(self.item_feat.interaction.keys())} + self.field_names = { + 'user': list(self.user_feat.interaction.keys()), + 
'item': list(self.item_feat.interaction.keys()) + } self.types = ['user', 'item'] self.pooling_mode = pooling_mode @@ -868,7 +872,8 @@ def __init__(self, channels, kernels, strides, activation='relu', init_method=No for i in range(self.num_of_nets): cnn_modules.append( - nn.Conv2d(self.channels[i], self.channels[i + 1], self.kernels[i], stride=self.strides[i])) + nn.Conv2d(self.channels[i], self.channels[i + 1], self.kernels[i], stride=self.strides[i]) + ) if self.activation.lower() == 'sigmoid': cnn_modules.append(nn.Sigmoid()) elif self.activation.lower() == 'tanh': @@ -1017,8 +1022,10 @@ def forward(self, interaction): total_fields_embedding = [] float_fields = [] for field_name in self.float_field_names: - float_fields.append(interaction[field_name] - if len(interaction[field_name].shape) == 2 else interaction[field_name].unsqueeze(1)) + if len(interaction[field_name].shape) == 2: + float_fields.append(interaction[field_name]) + else: + float_fields.append(interaction[field_name].unsqueeze(1)) if len(float_fields) > 0: float_fields = torch.cat(float_fields, dim=1) # [batch_size, num_float_field] @@ -1069,8 +1076,7 @@ def forward(self, x): if not self.training: return x - mask = ((torch.rand(x._values().size()) + - self.kprob).floor()).type(torch.bool) + mask = ((torch.rand(x._values().size()) + self.kprob).floor()).type(torch.bool) rc = x._indices()[:, mask] val = x._values()[mask] * (1.0 / self.kprob) return torch.sparse.FloatTensor(rc, val, x.shape) diff --git a/recbole/model/loss.py b/recbole/model/loss.py index 9727261f3..78c9cf8b2 100644 --- a/recbole/model/loss.py +++ b/recbole/model/loss.py @@ -7,7 +7,6 @@ # @Author : Shanlei Mu # @Email : slmu@ruc.edu.cn - """ recbole.model.loss ####################### @@ -43,7 +42,7 @@ def __init__(self, gamma=1e-10): self.gamma = gamma def forward(self, pos_score, neg_score): - loss = - torch.log(self.gamma + torch.sigmoid(pos_score - neg_score)).mean() + loss = -torch.log(self.gamma + torch.sigmoid(pos_score - neg_score)).mean() return loss diff --git a/recbole/model/sequential_recommender/bert4rec.py b/recbole/model/sequential_recommender/bert4rec.py index f808e1ff9..1a923b7db 100644 --- a/recbole/model/sequential_recommender/bert4rec.py +++ b/recbole/model/sequential_recommender/bert4rec.py @@ -52,11 +52,16 @@ def __init__(self, config, dataset): # define layers and loss self.item_embedding = nn.Embedding(self.n_items + 1, self.hidden_size, padding_idx=0) # mask token add 1 self.position_embedding = nn.Embedding(self.max_seq_length + 1, self.hidden_size) # add mask_token at the last - self.trm_encoder = TransformerEncoder(n_layers=self.n_layers, n_heads=self.n_heads, - hidden_size=self.hidden_size, inner_size=self.inner_size, - hidden_dropout_prob=self.hidden_dropout_prob, - attn_dropout_prob=self.attn_dropout_prob, - hidden_act=self.hidden_act, layer_norm_eps=self.layer_norm_eps) + self.trm_encoder = TransformerEncoder( + n_layers=self.n_layers, + n_heads=self.n_heads, + hidden_size=self.hidden_size, + inner_size=self.inner_size, + hidden_dropout_prob=self.hidden_dropout_prob, + attn_dropout_prob=self.attn_dropout_prob, + hidden_act=self.hidden_act, + layer_norm_eps=self.layer_norm_eps + ) self.LayerNorm = nn.LayerNorm(self.hidden_size, eps=self.layer_norm_eps) self.dropout = nn.Dropout(self.hidden_dropout_prob) @@ -169,9 +174,7 @@ def forward(self, item_seq): input_emb = self.LayerNorm(input_emb) input_emb = self.dropout(input_emb) extended_attention_mask = self.get_attention_mask(item_seq) - trm_output = self.trm_encoder(input_emb, - 
extended_attention_mask, - output_all_encoded_layers=True) + trm_output = self.trm_encoder(input_emb, extended_attention_mask, output_all_encoded_layers=True) output = trm_output[-1] return output # [B L H] diff --git a/recbole/model/sequential_recommender/caser.py b/recbole/model/sequential_recommender/caser.py index 7203685e3..229816deb 100644 --- a/recbole/model/sequential_recommender/caser.py +++ b/recbole/model/sequential_recommender/caser.py @@ -36,6 +36,7 @@ class Caser(SequentialRecommender): We did not use the sliding window to generate training instances as in the paper, in order that the generation method we used is common to other sequential models. For comparison with other models, we set the parameter T in the paper as 1. + In addition, to prevent an excessive number of CNN layers (ValueError: Training loss is nan), please make sure the parameter MAX_ITEM_LIST_LENGTH is small, such as 10. """ def __init__(self, config, dataset): @@ -62,8 +63,7 @@ def __init__(self, config, dataset): # horizontal conv layer lengths = [i + 1 for i in range(self.max_seq_length)] self.conv_h = nn.ModuleList([ - nn.Conv2d(in_channels=1, out_channels=self.n_h, kernel_size=(i, self.embedding_size)) - for i in lengths + nn.Conv2d(in_channels=1, out_channels=self.n_h, kernel_size=(i, self.embedding_size)) for i in lengths ]) # fully-connected layer @@ -157,8 +157,9 @@ def calculate_loss(self, interaction): logits = torch.matmul(seq_output, test_item_emb.transpose(0, 1)) loss = self.loss_fct(logits, pos_items) - reg_loss = self.reg_loss([self.user_embedding.weight, self.item_embedding.weight, - self.conv_v.weight, self.fc1.weight, self.fc2.weight]) + reg_loss = self.reg_loss([ + self.user_embedding.weight, self.item_embedding.weight, self.conv_v.weight, self.fc1.weight, self.fc2.weight + ]) loss = loss + self.reg_weight * reg_loss + self.reg_loss_conv_h() return loss diff --git a/recbole/model/sequential_recommender/din.py b/recbole/model/sequential_recommender/din.py index f8f1e6b51..dcb7ce395 100644 --- a/recbole/model/sequential_recommender/din.py +++ b/recbole/model/sequential_recommender/din.py @@ -65,15 +65,10 @@ def __init__(self, config, dataset): self.att_list = [4 * num_item_feature * self.embedding_size] + self.mlp_hidden_size mask_mat = torch.arange(self.max_seq_length).to(self.device).view(1, -1) # init mask - self.attention = SequenceAttLayer(mask_mat, - self.att_list, - activation='Sigmoid', - softmax_stag=False, - return_seq_weight=False) - self.dnn_mlp_layers = MLPLayers(self.dnn_list, - activation='Dice', - dropout=self.dropout_prob, - bn=True) + self.attention = SequenceAttLayer( + mask_mat, self.att_list, activation='Sigmoid', softmax_stag=False, return_seq_weight=False + ) + self.dnn_mlp_layers = MLPLayers(self.dnn_list, activation='Dice', dropout=self.dropout_prob, bn=True) self.embedding_layer = ContextSeqEmbLayer(dataset, self.embedding_size, self.pooling_mode, self.device) self.dnn_predict_layers = nn.Linear(self.mlp_hidden_size[-1], 1) @@ -123,8 +118,7 @@ def forward(self, user, item_seq, item_seq_len, next_items): user_emb = user_emb.squeeze() # input the DNN to get the prediction score - din_in = torch.cat([user_emb, target_item_feat_emb, - user_emb * target_item_feat_emb], dim=-1) + din_in = torch.cat([user_emb, target_item_feat_emb, user_emb * target_item_feat_emb], dim=-1) din_out = self.dnn_mlp_layers(din_in) preds = self.dnn_predict_layers(din_out) preds = self.sigmoid(preds) diff --git a/recbole/model/sequential_recommender/fdsa.py b/recbole/model/sequential_recommender/fdsa.py index 
bae7192ab..0fba79689 100644 --- a/recbole/model/sequential_recommender/fdsa.py +++ b/recbole/model/sequential_recommender/fdsa.py @@ -53,22 +53,33 @@ def __init__(self, config, dataset): self.item_embedding = nn.Embedding(self.n_items, self.hidden_size, padding_idx=0) self.position_embedding = nn.Embedding(self.max_seq_length, self.hidden_size) - self.feature_embed_layer = FeatureSeqEmbLayer(dataset, self.hidden_size, self.selected_features, - self.pooling_mode, self.device) - - self.item_trm_encoder = TransformerEncoder(n_layers=self.n_layers, n_heads=self.n_heads, - hidden_size=self.hidden_size, inner_size=self.inner_size, - hidden_dropout_prob=self.hidden_dropout_prob, - attn_dropout_prob=self.attn_dropout_prob, - hidden_act=self.hidden_act, layer_norm_eps=self.layer_norm_eps) + self.feature_embed_layer = FeatureSeqEmbLayer( + dataset, self.hidden_size, self.selected_features, self.pooling_mode, self.device + ) + + self.item_trm_encoder = TransformerEncoder( + n_layers=self.n_layers, + n_heads=self.n_heads, + hidden_size=self.hidden_size, + inner_size=self.inner_size, + hidden_dropout_prob=self.hidden_dropout_prob, + attn_dropout_prob=self.attn_dropout_prob, + hidden_act=self.hidden_act, + layer_norm_eps=self.layer_norm_eps + ) self.feature_att_layer = VanillaAttention(self.hidden_size, self.hidden_size) # For simplicity, we use same architecture for item_trm and feature_trm - self.feature_trm_encoder = TransformerEncoder(n_layers=self.n_layers, n_heads=self.n_heads, - hidden_size=self.hidden_size, inner_size=self.inner_size, - hidden_dropout_prob=self.hidden_dropout_prob, - attn_dropout_prob=self.attn_dropout_prob, - hidden_act=self.hidden_act, layer_norm_eps=self.layer_norm_eps) + self.feature_trm_encoder = TransformerEncoder( + n_layers=self.n_layers, + n_heads=self.n_heads, + hidden_size=self.hidden_size, + inner_size=self.inner_size, + hidden_dropout_prob=self.hidden_dropout_prob, + attn_dropout_prob=self.attn_dropout_prob, + hidden_act=self.hidden_act, + layer_norm_eps=self.layer_norm_eps + ) self.LayerNorm = nn.LayerNorm(self.hidden_size, eps=self.layer_norm_eps) self.dropout = nn.Dropout(self.hidden_dropout_prob) @@ -149,14 +160,12 @@ def forward(self, item_seq, item_seq_len): extended_attention_mask = self.get_attention_mask(item_seq) - item_trm_output = self.item_trm_encoder(item_trm_input, - extended_attention_mask, - output_all_encoded_layers=True) + item_trm_output = self.item_trm_encoder(item_trm_input, extended_attention_mask, output_all_encoded_layers=True) item_output = item_trm_output[-1] - feature_trm_output = self.feature_trm_encoder(feature_trm_input, - extended_attention_mask, - output_all_encoded_layers=True) # [B Len H] + feature_trm_output = self.feature_trm_encoder( + feature_trm_input, extended_attention_mask, output_all_encoded_layers=True + ) # [B Len H] feature_output = feature_trm_output[-1] item_output = self.gather_indexes(item_output, item_seq_len - 1) # [B H] diff --git a/recbole/model/sequential_recommender/fossil.py b/recbole/model/sequential_recommender/fossil.py index 30f95b29e..0432174bf 100644 --- a/recbole/model/sequential_recommender/fossil.py +++ b/recbole/model/sequential_recommender/fossil.py @@ -61,12 +61,15 @@ def __init__(self, config, dataset): def inverse_seq_item_embedding(self, seq_item_embedding, seq_item_len): """ - inverse seq_item_embedding like this: - simple to 2-dim + inverse seq_item_embedding like this (simple to 2-dim): + [1,2,3,0,0,0] -- ??? 
-- >> [0,0,0,1,2,3] + first: [0,0,0,0,0,0] concat [1,2,3,0,0,0] + using gather_indexes: to get one by one - first get 3,then 2,last 1 + + first get 3,then 2,last 1 """ zeros = torch.zeros_like(seq_item_embedding, dtype=torch.float).to(self.device) # batch_size * seq_len * embedding_size @@ -74,8 +77,9 @@ def inverse_seq_item_embedding(self, seq_item_embedding, seq_item_len): # batch_size * 2_mul_seq_len * embedding_size embedding_list = list() for i in range(self.order_len): - embedding = self.gather_indexes(item_embedding_zeros, - self.max_seq_length + seq_item_len - self.order_len + i) + embedding = self.gather_indexes( + item_embedding_zeros, self.max_seq_length + seq_item_len - self.order_len + i + ) embedding_list.append(embedding.unsqueeze(1)) short_item_embedding = torch.cat(embedding_list, dim=1) # batch_size * short_len * embedding_size diff --git a/recbole/model/sequential_recommender/gcsan.py b/recbole/model/sequential_recommender/gcsan.py index 597ba2361..e64381473 100644 --- a/recbole/model/sequential_recommender/gcsan.py +++ b/recbole/model/sequential_recommender/gcsan.py @@ -3,7 +3,6 @@ # @Author : Yujie Lu # @Email : yujielu1998@gmail.com - r""" GCSAN ################################################ @@ -68,7 +67,7 @@ def GNNCell(self, A, hidden): """ input_in = torch.matmul(A[:, :, :A.size(1)], self.linear_edge_in(hidden)) - input_out = torch.matmul(A[:, :, A.size(1): 2 * A.size(1)], self.linear_edge_out(hidden)) + input_out = torch.matmul(A[:, :, A.size(1):2 * A.size(1)], self.linear_edge_out(hidden)) # [batch_size, max_session_len, embedding_size * 2] inputs = torch.cat([input_in, input_out], 2) @@ -124,11 +123,16 @@ def __init__(self, config, dataset): # define layers and loss self.item_embedding = nn.Embedding(self.n_items, self.hidden_size, padding_idx=0) self.gnn = GNN(self.hidden_size, self.step) - self.self_attention = TransformerEncoder(n_layers=self.n_layers, n_heads=self.n_heads, - hidden_size=self.hidden_size, inner_size=self.inner_size, - hidden_dropout_prob=self.hidden_dropout_prob, - attn_dropout_prob=self.attn_dropout_prob, - hidden_act=self.hidden_act, layer_norm_eps=self.layer_norm_eps) + self.self_attention = TransformerEncoder( + n_layers=self.n_layers, + n_heads=self.n_heads, + hidden_size=self.hidden_size, + inner_size=self.inner_size, + hidden_dropout_prob=self.hidden_dropout_prob, + attn_dropout_prob=self.attn_dropout_prob, + hidden_act=self.hidden_act, + layer_norm_eps=self.layer_norm_eps + ) self.reg_loss = EmbLoss() if self.loss_type == 'BPR': self.loss_fct = BPRLoss() diff --git a/recbole/model/sequential_recommender/gru4rec.py b/recbole/model/sequential_recommender/gru4rec.py index 94d542174..0ea93e331 100644 --- a/recbole/model/sequential_recommender/gru4rec.py +++ b/recbole/model/sequential_recommender/gru4rec.py @@ -70,8 +70,8 @@ def _init_weights(self, module): if isinstance(module, nn.Embedding): xavier_normal_(module.weight) elif isinstance(module, nn.GRU): - xavier_uniform_(self.gru_layers.weight_hh_l0) - xavier_uniform_(self.gru_layers.weight_ih_l0) + xavier_uniform_(module.weight_hh_l0) + xavier_uniform_(module.weight_ih_l0) def forward(self, item_seq, item_seq_len): item_seq_emb = self.item_embedding(item_seq) diff --git a/recbole/model/sequential_recommender/gru4recf.py b/recbole/model/sequential_recommender/gru4recf.py index 608c6448f..55f8d2360 100644 --- a/recbole/model/sequential_recommender/gru4recf.py +++ b/recbole/model/sequential_recommender/gru4recf.py @@ -56,8 +56,9 @@ def __init__(self, config, dataset): # define layers and 
loss self.item_embedding = nn.Embedding(self.n_items, self.embedding_size, padding_idx=0) - self.feature_embed_layer = FeatureSeqEmbLayer(dataset, self.embedding_size, self.selected_features, - self.pooling_mode, self.device) + self.feature_embed_layer = FeatureSeqEmbLayer( + dataset, self.embedding_size, self.selected_features, self.pooling_mode, self.device + ) self.item_gru_layers = nn.GRU( input_size=self.embedding_size, hidden_size=self.hidden_size, diff --git a/recbole/model/sequential_recommender/gru4reckg.py b/recbole/model/sequential_recommender/gru4reckg.py index a275e7d19..e8e68d5d8 100644 --- a/recbole/model/sequential_recommender/gru4reckg.py +++ b/recbole/model/sequential_recommender/gru4reckg.py @@ -7,7 +7,6 @@ # @Author : Yupeng Hou # @Email : houyupeng@ruc.edu.cn - r""" GRU4RecKG ################################################ diff --git a/recbole/model/sequential_recommender/hgn.py b/recbole/model/sequential_recommender/hgn.py index 009b455ab..dc5f6ec49 100644 --- a/recbole/model/sequential_recommender/hgn.py +++ b/recbole/model/sequential_recommender/hgn.py @@ -75,10 +75,16 @@ def __init__(self, config, dataset): def reg_loss(self, user_embedding, item_embedding, seq_item_embedding): reg_1, reg_2 = self.reg_weight - loss_1 = reg_1 * torch.norm(self.w1.weight, p=2) + reg_1 * torch.norm(self.w2.weight, p=2) + reg_1 * torch.norm( - self.w3.weight, p=2) + reg_1 * torch.norm(self.w4.weight, p=2) - loss_2 = reg_2 * torch.norm(user_embedding, p=2) + reg_2 * torch.norm(item_embedding, p=2) + reg_2 * torch.norm( - seq_item_embedding, p=2) + loss_1_part_1 = reg_1 * torch.norm(self.w1.weight, p=2) + loss_1_part_2 = reg_1 * torch.norm(self.w2.weight, p=2) + loss_1_part_3 = reg_1 * torch.norm(self.w3.weight, p=2) + loss_1_part_4 = reg_1 * torch.norm(self.w4.weight, p=2) + loss_1 = loss_1_part_1 + loss_1_part_2 + loss_1_part_3 + loss_1_part_4 + + loss_2_part_1 = reg_2 * torch.norm(user_embedding, p=2) + loss_2_part_2 = reg_2 * torch.norm(item_embedding, p=2) + loss_2_part_3 = reg_2 * torch.norm(seq_item_embedding, p=2) + loss_2 = loss_2_part_1 + loss_2_part_2 + loss_2_part_3 return loss_1 + loss_2 diff --git a/recbole/model/sequential_recommender/hrm.py b/recbole/model/sequential_recommender/hrm.py index 0eb2908d4..421835266 100644 --- a/recbole/model/sequential_recommender/hrm.py +++ b/recbole/model/sequential_recommender/hrm.py @@ -112,7 +112,9 @@ def forward(self, seq_item, user, seq_item_len): high_order_item_embedding = torch.div(high_order_item_embedding, seq_item_len.unsqueeze(1).float()) # batch_size * embedding_size hybrid_user_embedding = self.dropout( - torch.cat([user_embedding.unsqueeze(dim=1), high_order_item_embedding.unsqueeze(dim=1)], dim=1)) + torch.cat([user_embedding.unsqueeze(dim=1), + high_order_item_embedding.unsqueeze(dim=1)], dim=1) + ) # batch_size * 2_mul_embedding_size # layer 2 diff --git a/recbole/model/sequential_recommender/ksr.py b/recbole/model/sequential_recommender/ksr.py index 6c884b049..bf48c6719 100644 --- a/recbole/model/sequential_recommender/ksr.py +++ b/recbole/model/sequential_recommender/ksr.py @@ -3,7 +3,6 @@ # @Author : Jin Huang and Shanlei Mu # @Email : Betsyj.huang@gmail.com and slmu@ruc.edu.cn - r""" KSR ################################################ @@ -77,16 +76,16 @@ def __init__(self, config, dataset): # parameters initialization self.apply(self._init_weights) self.entity_embedding.weight.data.copy_(torch.from_numpy(self.entity_embedding_matrix[:self.n_items])) - self.relation_Matrix = 
torch.from_numpy(self.relation_embedding_matrix[:self.n_relations]).to( - self.device) # [R H] + self.relation_Matrix = torch.from_numpy(self.relation_embedding_matrix[:self.n_relations] + ).to(self.device) # [R H] def _init_weights(self, module): """ Initialize the weights """ if isinstance(module, nn.Embedding): xavier_normal_(module.weight) elif isinstance(module, nn.GRU): - xavier_uniform_(self.gru_layers.weight_hh_l0) - xavier_uniform_(self.gru_layers.weight_ih_l0) + xavier_uniform_(module.weight_hh_l0) + xavier_uniform_(module.weight_ih_l0) def _get_kg_embedding(self, head): """Difference: @@ -100,8 +99,8 @@ def _get_kg_embedding(self, head): return head_e, tail_Matrix def _memory_update_cell(self, user_memory, update_memory): - z = torch.sigmoid(torch.mul(user_memory, update_memory).sum(-1).float()).unsqueeze( - -1) # [B R 1], the gate vector + z = torch.sigmoid(torch.mul(user_memory, + update_memory).sum(-1).float()).unsqueeze(-1) # [B R 1], the gate vector updated_user_memory = (1.0 - z) * user_memory + z * update_memory return updated_user_memory @@ -110,8 +109,8 @@ def memory_update(self, item_seq, item_seq_len): step_length = item_seq.size()[1] last_item = item_seq_len - 1 # init user memory with 0s - user_memory = torch.zeros(item_seq.size()[0], self.n_relations, self.embedding_size).float().to( - self.device) # [B R H] + user_memory = torch.zeros(item_seq.size()[0], self.n_relations, + self.embedding_size).float().to(self.device) # [B R H] last_user_memory = torch.zeros_like(user_memory) for i in range(step_length): # [len] _, update_memory = self._get_kg_embedding(item_seq[:, i]) # [B R H] @@ -166,7 +165,8 @@ def calculate_loss(self, interaction): return loss else: # self.loss_type = 'CE' test_items_emb = self.dense_layer_i( - torch.cat((self.item_embedding.weight, self.entity_embedding.weight), -1)) # [n_items H] + torch.cat((self.item_embedding.weight, self.entity_embedding.weight), -1) + ) # [n_items H] logits = torch.matmul(seq_output, test_items_emb.transpose(0, 1)) loss = self.loss_fct(logits, pos_items) return loss @@ -185,6 +185,7 @@ def full_sort_predict(self, interaction): item_seq_len = interaction[self.ITEM_SEQ_LEN] seq_output = self.forward(item_seq, item_seq_len) test_items_emb = self.dense_layer_i( - torch.cat((self.item_embedding.weight, self.entity_embedding.weight), -1)) # [n_items H] + torch.cat((self.item_embedding.weight, self.entity_embedding.weight), -1) + ) # [n_items H] scores = torch.matmul(seq_output, test_items_emb.transpose(0, 1)) # [B, n_items] return scores diff --git a/recbole/model/sequential_recommender/nextitnet.py b/recbole/model/sequential_recommender/nextitnet.py index 9b2aba98e..e0ad3def4 100644 --- a/recbole/model/sequential_recommender/nextitnet.py +++ b/recbole/model/sequential_recommender/nextitnet.py @@ -3,7 +3,6 @@ # @Author : Jingsen Zhang # @Email : zhangjingsen@ruc.edu.cn - r""" NextItNet ################################################ @@ -36,6 +35,7 @@ class NextItNet(SequentialRecommender): and then stop the generating process. Although the number of parameters in residual block (a) is less than it in residual block (b), the performance of b is better than a. So in our model, we use residual block (b). + In addition, when dilations is not equal to 1, the training may be slow. To speed up the efficiency, please set the parameters "reproducibility" False. 
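        For example, this can be set through the usual config mechanism (an illustrative sketch; the option is assumed to be passed via the running YAML config or ``config_dict``):

            reproducibility: False

        or, equivalently, ``run_recbole(model='NextItNet', config_dict={'reproducibility': False})``.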
""" def __init__(self, config, dataset): @@ -54,9 +54,11 @@ def __init__(self, config, dataset): self.item_embedding = nn.Embedding(self.n_items, self.embedding_size, padding_idx=0) # residual blocks dilations in blocks:[1,2,4,8,1,2,4,8,...] - rb = [ResidualBlock_b(self.residual_channels, self.residual_channels, - kernel_size=self.kernel_size, dilation=dilation) - for dilation in self.dilations] + rb = [ + ResidualBlock_b( + self.residual_channels, self.residual_channels, kernel_size=self.kernel_size, dilation=dilation + ) for dilation in self.dilations + ] self.residual_blocks = nn.Sequential(*rb) # fully-connected layer @@ -181,8 +183,8 @@ def conv_pad(self, x, dilation): # x: [batch_size, seq_len, embed_size] """ inputs_pad = x.permute(0, 2, 1) # [batch_size, embed_size, seq_len] inputs_pad = inputs_pad.unsqueeze(2) # [batch_size, embed_size, 1, seq_len] - pad = nn.ZeroPad2d( - ((self.kernel_size - 1) * dilation, 0, 0, 0)) # padding operation args:(left,right,top,bottom) + pad = nn.ZeroPad2d(((self.kernel_size - 1) * dilation, 0, 0, 0)) + # padding operation args:(left,right,top,bottom) inputs_pad = pad(inputs_pad) # [batch_size, embed_size, 1, seq_len+(self.kernel_size-1)*dilations] return inputs_pad diff --git a/recbole/model/sequential_recommender/repeatnet.py b/recbole/model/sequential_recommender/repeatnet.py index 75a44dc68..98cc28583 100644 --- a/recbole/model/sequential_recommender/repeatnet.py +++ b/recbole/model/sequential_recommender/repeatnet.py @@ -52,20 +52,23 @@ def __init__(self, config, dataset): # define the layers and loss function self.item_matrix = nn.Embedding(self.n_items, self.embedding_size, padding_idx=0) self.gru = nn.GRU(self.embedding_size, self.hidden_size, batch_first=True) - self.repeat_explore_mechanism = Repeat_Explore_Mechanism(self.device, - hidden_size=self.hidden_size, - seq_len=self.max_seq_length, - dropout_prob=self.dropout_prob) - self.repeat_recommendation_decoder = Repeat_Recommendation_Decoder(self.device, - hidden_size=self.hidden_size, - seq_len=self.max_seq_length, - num_item=self.n_items, - dropout_prob=self.dropout_prob) - self.explore_recommendation_decoder = Explore_Recommendation_Decoder(hidden_size=self.hidden_size, - seq_len=self.max_seq_length, - num_item=self.n_items, - device=self.device, - dropout_prob=self.dropout_prob) + self.repeat_explore_mechanism = Repeat_Explore_Mechanism( + self.device, hidden_size=self.hidden_size, seq_len=self.max_seq_length, dropout_prob=self.dropout_prob + ) + self.repeat_recommendation_decoder = Repeat_Recommendation_Decoder( + self.device, + hidden_size=self.hidden_size, + seq_len=self.max_seq_length, + num_item=self.n_items, + dropout_prob=self.dropout_prob + ) + self.explore_recommendation_decoder = Explore_Recommendation_Decoder( + hidden_size=self.hidden_size, + seq_len=self.max_seq_length, + num_item=self.n_items, + device=self.device, + dropout_prob=self.dropout_prob + ) self.loss_fct = F.nll_loss @@ -92,18 +95,15 @@ def forward(self, item_seq, item_seq_len): # last_memory: batch_size * hidden_size timeline_mask = (item_seq == 0) - self.repeat_explore = self.repeat_explore_mechanism.forward(all_memory=all_memory, - last_memory=last_memory) + self.repeat_explore = self.repeat_explore_mechanism.forward(all_memory=all_memory, last_memory=last_memory) # batch_size * 2 - repeat_recommendation_decoder = self.repeat_recommendation_decoder.forward(all_memory=all_memory, - last_memory=last_memory, - item_seq=item_seq, - mask=timeline_mask) + repeat_recommendation_decoder = 
self.repeat_recommendation_decoder.forward( + all_memory=all_memory, last_memory=last_memory, item_seq=item_seq, mask=timeline_mask + ) # batch_size * num_item - explore_recommendation_decoder = self.explore_recommendation_decoder.forward(all_memory=all_memory, - last_memory=last_memory, - item_seq=item_seq, - mask=timeline_mask) + explore_recommendation_decoder = self.explore_recommendation_decoder.forward( + all_memory=all_memory, last_memory=last_memory, item_seq=item_seq, mask=timeline_mask + ) # batch_size * num_item prediction = repeat_recommendation_decoder * self.repeat_explore[:, 0].unsqueeze(1) \ + explore_recommendation_decoder * self.repeat_explore[:, 1].unsqueeze(1) @@ -293,20 +293,29 @@ def forward(self, all_memory, last_memory, item_seq, mask=None): def build_map(b_map, device, max_index=None): """ - project the b_map to the place where it in should be - like this: + project the b_map to the place where it in should be like this: item_seq A: [3,4,5] n_items: 6 + after map: A + [0,0,1,0,0,0] + [0,0,0,1,0,0] + [0,0,0,0,1,0] - batch_size * seq_len ==>> batch_size * seq_len * n_item + batch_size * seq_len ==>> batch_size * seq_len * n_item use in RepeatNet: + [3,4,5] matmul [0,0,1,0,0,0] - [0,0,0,1,0,0] ==>>> [0,0,3,4,5,0] it works in the RepeatNet when project the seq item into all items + + [0,0,0,1,0,0] + [0,0,0,0,1,0] + + ==>>> [0,0,3,4,5,0] it works in the RepeatNet when project the seq item into all items + batch_size * 1 * seq_len matmul batch_size * seq_len * n_item ==>> batch_size * 1 * n_item """ batch_size, b_len = b_map.size() diff --git a/recbole/model/sequential_recommender/s3rec.py b/recbole/model/sequential_recommender/s3rec.py index 1fcdbf413..1aa86184f 100644 --- a/recbole/model/sequential_recommender/s3rec.py +++ b/recbole/model/sequential_recommender/s3rec.py @@ -75,11 +75,16 @@ def __init__(self, config, dataset): self.position_embedding = nn.Embedding(self.max_seq_length, self.hidden_size) self.feature_embedding = nn.Embedding(self.n_features, self.hidden_size, padding_idx=0) - self.trm_encoder = TransformerEncoder(n_layers=self.n_layers, n_heads=self.n_heads, - hidden_size=self.hidden_size, inner_size=self.inner_size, - hidden_dropout_prob=self.hidden_dropout_prob, - attn_dropout_prob=self.attn_dropout_prob, - hidden_act=self.hidden_act, layer_norm_eps=self.layer_norm_eps) + self.trm_encoder = TransformerEncoder( + n_layers=self.n_layers, + n_heads=self.n_heads, + hidden_size=self.hidden_size, + inner_size=self.inner_size, + hidden_dropout_prob=self.hidden_dropout_prob, + attn_dropout_prob=self.attn_dropout_prob, + hidden_act=self.hidden_act, + layer_norm_eps=self.layer_norm_eps + ) self.LayerNorm = nn.LayerNorm(self.hidden_size, eps=self.layer_norm_eps) self.dropout = nn.Dropout(self.hidden_dropout_prob) @@ -176,14 +181,13 @@ def forward(self, item_seq, bidirectional=True): input_emb = self.LayerNorm(input_emb) input_emb = self.dropout(input_emb) attention_mask = self.get_attention_mask(item_seq, bidirectional=bidirectional) - trm_output = self.trm_encoder(input_emb, - attention_mask, - output_all_encoded_layers=True) + trm_output = self.trm_encoder(input_emb, attention_mask, output_all_encoded_layers=True) seq_output = trm_output[-1] # [B L H] return seq_output - def pretrain(self, features, masked_item_sequence, pos_items, neg_items, - masked_segment_sequence, pos_segment, neg_segment): + def pretrain( + self, features, masked_item_sequence, pos_items, neg_items, masked_segment_sequence, pos_segment, neg_segment + ): """Pretrain out model using four 
pre-training tasks: 1. Associated Attribute Prediction @@ -317,14 +321,14 @@ def reconstruct_pretrain_data(self, item_seq, item_seq_len): sample_length = random.randint(1, len(instance) // 2) start_id = random.randint(0, len(instance) - sample_length) neg_start_id = random.randint(0, len(long_sequence) - sample_length) - pos_segment = instance[start_id: start_id + sample_length] + pos_segment = instance[start_id:start_id + sample_length] neg_segment = long_sequence[neg_start_id:neg_start_id + sample_length] masked_segment = instance[:start_id] + [self.mask_token] * sample_length \ + instance[start_id + sample_length:] - pos_segment = [self.mask_token] * start_id + pos_segment + [self.mask_token] * ( - len(instance) - (start_id + sample_length)) - neg_segment = [self.mask_token] * start_id + neg_segment + [self.mask_token] * ( - len(instance) - (start_id + sample_length)) + pos_segment = [self.mask_token] * start_id + pos_segment + \ + [self.mask_token] * (len(instance) - (start_id + sample_length)) + neg_segment = [self.mask_token] * start_id + neg_segment + \ + [self.mask_token] * (len(instance) - (start_id + sample_length)) masked_segment_list.append(self._padding_zero_at_left(masked_segment)) pos_segment_list.append(self._padding_zero_at_left(pos_segment)) neg_segment_list.append(self._padding_zero_at_left(neg_segment)) @@ -351,8 +355,9 @@ def calculate_loss(self, interaction): masked_segment_sequence, pos_segment, neg_segment \ = self.reconstruct_pretrain_data(item_seq, item_seq_len) - loss = self.pretrain(features, masked_item_sequence, pos_items, neg_items, - masked_segment_sequence, pos_segment, neg_segment) + loss = self.pretrain( + features, masked_item_sequence, pos_items, neg_items, masked_segment_sequence, pos_segment, neg_segment + ) # finetune else: pos_items = interaction[self.POS_ITEM_ID] diff --git a/recbole/model/sequential_recommender/sasrec.py b/recbole/model/sequential_recommender/sasrec.py index 274f097d7..ea58a8fdf 100644 --- a/recbole/model/sequential_recommender/sasrec.py +++ b/recbole/model/sequential_recommender/sasrec.py @@ -52,11 +52,16 @@ def __init__(self, config, dataset): # define layers and loss self.item_embedding = nn.Embedding(self.n_items, self.hidden_size, padding_idx=0) self.position_embedding = nn.Embedding(self.max_seq_length, self.hidden_size) - self.trm_encoder = TransformerEncoder(n_layers=self.n_layers, n_heads=self.n_heads, - hidden_size=self.hidden_size, inner_size=self.inner_size, - hidden_dropout_prob=self.hidden_dropout_prob, - attn_dropout_prob=self.attn_dropout_prob, - hidden_act=self.hidden_act, layer_norm_eps=self.layer_norm_eps) + self.trm_encoder = TransformerEncoder( + n_layers=self.n_layers, + n_heads=self.n_heads, + hidden_size=self.hidden_size, + inner_size=self.inner_size, + hidden_dropout_prob=self.hidden_dropout_prob, + attn_dropout_prob=self.attn_dropout_prob, + hidden_act=self.hidden_act, + layer_norm_eps=self.layer_norm_eps + ) self.LayerNorm = nn.LayerNorm(self.hidden_size, eps=self.layer_norm_eps) self.dropout = nn.Dropout(self.hidden_dropout_prob) @@ -111,9 +116,7 @@ def forward(self, item_seq, item_seq_len): extended_attention_mask = self.get_attention_mask(item_seq) - trm_output = self.trm_encoder(input_emb, - extended_attention_mask, - output_all_encoded_layers=True) + trm_output = self.trm_encoder(input_emb, extended_attention_mask, output_all_encoded_layers=True) output = trm_output[-1] output = self.gather_indexes(output, item_seq_len - 1) return output # [B H] diff --git 
a/recbole/model/sequential_recommender/sasrecf.py b/recbole/model/sequential_recommender/sasrecf.py index 71242d026..b6c4feb4f 100644 --- a/recbole/model/sequential_recommender/sasrecf.py +++ b/recbole/model/sequential_recommender/sasrecf.py @@ -45,14 +45,20 @@ def __init__(self, config, dataset): # define layers and loss self.item_embedding = nn.Embedding(self.n_items, self.hidden_size, padding_idx=0) self.position_embedding = nn.Embedding(self.max_seq_length, self.hidden_size) - self.feature_embed_layer = FeatureSeqEmbLayer(dataset, self.hidden_size, self.selected_features, - self.pooling_mode, self.device) - - self.trm_encoder = TransformerEncoder(n_layers=self.n_layers, n_heads=self.n_heads, - hidden_size=self.hidden_size, inner_size=self.inner_size, - hidden_dropout_prob=self.hidden_dropout_prob, - attn_dropout_prob=self.attn_dropout_prob, - hidden_act=self.hidden_act, layer_norm_eps=self.layer_norm_eps) + self.feature_embed_layer = FeatureSeqEmbLayer( + dataset, self.hidden_size, self.selected_features, self.pooling_mode, self.device + ) + + self.trm_encoder = TransformerEncoder( + n_layers=self.n_layers, + n_heads=self.n_heads, + hidden_size=self.hidden_size, + inner_size=self.inner_size, + hidden_dropout_prob=self.hidden_dropout_prob, + attn_dropout_prob=self.attn_dropout_prob, + hidden_act=self.hidden_act, + layer_norm_eps=self.layer_norm_eps + ) self.concat_layer = nn.Linear(self.hidden_size * (1 + self.num_feature_field), self.hidden_size) @@ -126,8 +132,7 @@ def forward(self, item_seq, item_seq_len): input_emb = self.dropout(input_emb) extended_attention_mask = self.get_attention_mask(item_seq) - trm_output = self.trm_encoder(input_emb, extended_attention_mask, - output_all_encoded_layers=True) + trm_output = self.trm_encoder(input_emb, extended_attention_mask, output_all_encoded_layers=True) output = trm_output[-1] seq_output = self.gather_indexes(output, item_seq_len - 1) return seq_output # [B H] diff --git a/recbole/model/sequential_recommender/shan.py b/recbole/model/sequential_recommender/shan.py index b9caf8707..19d7d435d 100644 --- a/recbole/model/sequential_recommender/shan.py +++ b/recbole/model/sequential_recommender/shan.py @@ -49,12 +49,22 @@ def __init__(self, config, dataset): self.long_w = nn.Linear(self.embedding_size, self.embedding_size) self.long_b = nn.Parameter( - uniform_(tensor=torch.zeros(self.embedding_size), a=-np.sqrt(3 / self.embedding_size), - b=np.sqrt(3 / self.embedding_size)), requires_grad=True).to(self.device) + uniform_( + tensor=torch.zeros(self.embedding_size), + a=-np.sqrt(3 / self.embedding_size), + b=np.sqrt(3 / self.embedding_size) + ), + requires_grad=True + ).to(self.device) self.long_short_w = nn.Linear(self.embedding_size, self.embedding_size) self.long_short_b = nn.Parameter( - uniform_(tensor=torch.zeros(self.embedding_size), a=-np.sqrt(3 / self.embedding_size), - b=np.sqrt(3 / self.embedding_size)), requires_grad=True).to(self.device) + uniform_( + tensor=torch.zeros(self.embedding_size), + a=-np.sqrt(3 / self.embedding_size), + b=np.sqrt(3 / self.embedding_size) + ), + requires_grad=True + ).to(self.device) self.relu = nn.ReLU() @@ -112,8 +122,9 @@ def forward(self, seq_item, user, seq_item_len): # get the mask mask = seq_item.data.eq(0) - long_term_attention_based_pooling_layer = self.long_term_attention_based_pooling_layer(seq_item_embedding, - user_embedding, mask) + long_term_attention_based_pooling_layer = self.long_term_attention_based_pooling_layer( + seq_item_embedding, user_embedding, mask + ) # batch_size * 1 * 
embedding_size short_item_embedding = seq_item_embedding[:, -self.short_item_length:, :] @@ -125,9 +136,9 @@ def forward(self, seq_item, user, seq_item_len): long_short_item_embedding = torch.cat([long_term_attention_based_pooling_layer, short_item_embedding], dim=1) # batch_size * 1_plus_short_item_length * embedding_size - long_short_item_embedding = self.long_and_short_term_attention_based_pooling_layer(long_short_item_embedding, - user_embedding, - mask_long_short) + long_short_item_embedding = self.long_and_short_term_attention_based_pooling_layer( + long_short_item_embedding, user_embedding, mask_long_short + ) # batch_size * embedding_size return long_short_item_embedding @@ -206,8 +217,8 @@ def long_term_attention_based_pooling_layer(self, seq_item_embedding, user_embed if mask is not None: user_item_embedding.masked_fill_(mask, -1e9) user_item_embedding = nn.Softmax(dim=1)(user_item_embedding) - user_item_embedding = torch.mul(seq_item_embedding_value, user_item_embedding.unsqueeze(2)).sum(dim=1, - keepdim=True) + user_item_embedding = torch.mul(seq_item_embedding_value, + user_item_embedding.unsqueeze(2)).sum(dim=1, keepdim=True) # batch_size * 1 * embedding_size return user_item_embedding diff --git a/recbole/model/sequential_recommender/srgnn.py b/recbole/model/sequential_recommender/srgnn.py index 91d9e6e1a..0147f1499 100644 --- a/recbole/model/sequential_recommender/srgnn.py +++ b/recbole/model/sequential_recommender/srgnn.py @@ -3,7 +3,6 @@ # @Author : Yujie Lu # @Email : yujielu1998@gmail.com - r""" SRGNN ################################################ @@ -63,7 +62,7 @@ def GNNCell(self, A, hidden): """ input_in = torch.matmul(A[:, :, :A.size(1)], self.linear_edge_in(hidden)) + self.b_iah - input_out = torch.matmul(A[:, :, A.size(1): 2 * A.size(1)], self.linear_edge_out(hidden)) + self.b_ioh + input_out = torch.matmul(A[:, :, A.size(1):2 * A.size(1)], self.linear_edge_out(hidden)) + self.b_ioh # [batch_size, max_session_len, embedding_size * 2] inputs = torch.cat([input_in, input_out], 2) diff --git a/recbole/properties/dataset/ml-100k.yaml b/recbole/properties/dataset/ml-100k.yaml index 29449c0a5..9e0beff79 100644 --- a/recbole/properties/dataset/ml-100k.yaml +++ b/recbole/properties/dataset/ml-100k.yaml @@ -27,16 +27,19 @@ ENTITY_ID_FIELD: entity_id load_col: inter: [user_id, item_id, rating, timestamp] unload_col: ~ +unused_col: ~ # Filtering -max_user_inter_num: ~ -min_user_inter_num: ~ -max_item_inter_num: ~ -min_item_inter_num: ~ +rm_dup_inter: ~ lowest_val: ~ highest_val: ~ equal_val: ~ not_equal_val: ~ +filter_inter_by_user_or_item: True +max_user_inter_num: ~ +min_user_inter_num: ~ +max_item_inter_num: ~ +min_item_inter_num: ~ # Preprocessing fields_in_same_space: ~ diff --git a/recbole/properties/dataset/sample.yaml b/recbole/properties/dataset/sample.yaml index ffbd73159..d9869e5e2 100644 --- a/recbole/properties/dataset/sample.yaml +++ b/recbole/properties/dataset/sample.yaml @@ -21,18 +21,20 @@ load_col: inter: [user_id, item_id] # the others unload_col: ~ +unused_col: ~ additional_feat_suffix: ~ # Filtering rm_dup_inter: ~ -max_user_inter_num: ~ -min_user_inter_num: 0 -max_item_inter_num: ~ -min_item_inter_num: 0 lowest_val: ~ highest_val: ~ equal_val: ~ not_equal_val: ~ +filter_inter_by_user_or_item: True +max_user_inter_num: ~ +min_user_inter_num: 0 +max_item_inter_num: ~ +min_item_inter_num: 0 # Preprocessing fields_in_same_space: ~ diff --git a/recbole/properties/model/EASE.yaml b/recbole/properties/model/EASE.yaml new file mode 100644 index 
000000000..9dfed27e0 --- /dev/null +++ b/recbole/properties/model/EASE.yaml @@ -0,0 +1 @@ +reg_weight: 250.0 \ No newline at end of file diff --git a/recbole/properties/model/NNCF.yaml b/recbole/properties/model/NNCF.yaml new file mode 100644 index 000000000..c2845ca8e --- /dev/null +++ b/recbole/properties/model/NNCF.yaml @@ -0,0 +1,14 @@ +ui_embedding_size: 64 +neigh_embedding_size: 32 +num_conv_kernel: 128 +conv_kernel_size: 5 +pool_kernel_size: 5 +mlp_hidden_size: [128,64,32,16] +neigh_num: 20 +dropout: 0.5 + +# The method to use neighborhood information, you can choose random, knn or louvain algorithom +# e.g. neigh_info_method: "knn" or neigh_info_method: "louvain" +neigh_info_method: "random" + +resolution: 1 diff --git a/recbole/properties/model/lightgbm.yaml b/recbole/properties/model/lightgbm.yaml new file mode 100644 index 000000000..d5cb8ca03 --- /dev/null +++ b/recbole/properties/model/lightgbm.yaml @@ -0,0 +1,23 @@ +convert_token_to_onehot: False +token_num_threshold: 10000 + +# Dataset +lgb_silent: False + +# Train +lgb_model: ~ +lgb_params: + boosting: gbdt + num_leaves: 90 + min_data_in_leaf: 30 + max_depth: -1 + learning_rate: 0.1 + objective: binary + lambda_l1: 0.1 + metric: ['auc', 'binary_logloss'] + force_row_wise: True +lgb_learning_rates: ~ +lgb_num_boost_round: 300 +lgb_early_stopping_rounds: ~ +lgb_verbose_eval: 100 + diff --git a/recbole/properties/model/xgboost.yaml b/recbole/properties/model/xgboost.yaml index 6a6620bf1..47d4f890f 100644 --- a/recbole/properties/model/xgboost.yaml +++ b/recbole/properties/model/xgboost.yaml @@ -1,14 +1,8 @@ -# Type of training method convert_token_to_onehot: False token_num_threshold: 10000 # DMatrix -xgb_weight: ~ -xgb_base_margin: ~ -xgb_missing: ~ xgb_silent: ~ -xgb_feature_names: ~ -xgb_feature_types: ~ xgb_nthread: ~ xgb_model: ~ @@ -26,11 +20,6 @@ xgb_params: seed: 2020 # nthread: -1 xgb_num_boost_round: 500 -# xgb_evals: ~ -xgb_obj: ~ -xgb_feval: ~ -xgb_maximize: ~ xgb_early_stopping_rounds: ~ -# xgb_evals_result: ~ xgb_verbose_eval: 100 diff --git a/recbole/properties/overall.yaml b/recbole/properties/overall.yaml index 696e13609..4c4ea4464 100644 --- a/recbole/properties/overall.yaml +++ b/recbole/properties/overall.yaml @@ -19,13 +19,15 @@ eval_step: 1 stopping_step: 10 clip_grad_norm: ~ # clip_grad_norm: {'max_norm': 5, 'norm_type': 2} +weight_decay: 0.0 +draw_pic: False # evaluation settings eval_setting: RO_RS,full group_by_user: True split_ratio: [0.8,0.1,0.1] leave_one_num: 2 -real_time_process: True +real_time_process: False metrics: ["Recall", "MRR","NDCG","Hit","Precision"] topk: [10] valid_metric: MRR@10 diff --git a/recbole/quick_start/quick_start.py b/recbole/quick_start/quick_start.py index 6fd49ab4a..66aae1c1c 100644 --- a/recbole/quick_start/quick_start.py +++ b/recbole/quick_start/quick_start.py @@ -50,8 +50,9 @@ def run_recbole(model=None, dataset=None, config_file_list=None, config_dict=Non trainer = get_trainer(config['MODEL_TYPE'], config['model'])(config, model) # model training - best_valid_score, best_valid_result = trainer.fit(train_data, valid_data, saved=saved, - show_progress=config['show_progress']) + best_valid_score, best_valid_result = trainer.fit( + train_data, valid_data, saved=saved, show_progress=config['show_progress'] + ) # model evaluation test_result = trainer.evaluate(test_data, load_best_model=saved, show_progress=config['show_progress']) diff --git a/recbole/sampler/sampler.py b/recbole/sampler/sampler.py index 76deec7cd..e0e1a0d9b 100644 --- a/recbole/sampler/sampler.py +++ 
b/recbole/sampler/sampler.py @@ -44,7 +44,7 @@ def __init__(self, distribution): def set_distribution(self, distribution): """Set the distribution of sampler. - + Args: distribution (str): Distribution of the negative items. """ @@ -91,7 +91,7 @@ def random_num(self, num): self.random_pr %= self.random_list_length while True: if self.random_pr + num <= self.random_list_length: - value_id.append(self.random_list[self.random_pr: self.random_pr + num]) + value_id.append(self.random_list[self.random_pr:self.random_pr + num]) self.random_pr += num break else: @@ -147,9 +147,10 @@ def sample_by_key_ids(self, key_ids, num): key_ids = np.tile(key_ids, num) while len(check_list) > 0: value_ids[check_list] = self.random_num(len(check_list)) - check_list = np.array([i for i, used, v in - zip(check_list, self.used_ids[key_ids[check_list]], value_ids[check_list]) - if v in used]) + check_list = np.array([ + i for i, used, v in zip(check_list, self.used_ids[key_ids[check_list]], value_ids[check_list]) + if v in used + ]) return torch.tensor(value_ids) @@ -174,7 +175,7 @@ def __init__(self, phases, datasets, distribution='uniform'): if not isinstance(datasets, list): datasets = [datasets] if len(phases) != len(datasets): - raise ValueError('phases {} and datasets {} should have the same length'.format(phases, datasets)) + raise ValueError(f'Phases {phases} and datasets {datasets} should have the same length.') self.phases = phases self.datasets = datasets @@ -200,7 +201,7 @@ def get_random_list(self): random_item_list.extend(dataset.inter_feat[self.iid_field].numpy()) return random_item_list else: - raise NotImplementedError('Distribution [{}] has not been implemented'.format(self.distribution)) + raise NotImplementedError(f'Distribution [{self.distribution}] has not been implemented.') def get_used_ids(self): """ @@ -218,9 +219,11 @@ def get_used_ids(self): for used_item_set in used_item_id[self.phases[-1]]: if len(used_item_set) + 1 == self.n_items: # [pad] is a item. - raise ValueError('Some users have interacted with all items, ' - 'which we can not sample negative items for them. ' - 'Please set `max_user_inter_num` to filter those users.') + raise ValueError( + 'Some users have interacted with all items, ' + 'which we can not sample negative items for them. ' + 'Please set `max_user_inter_num` to filter those users.' + ) return used_item_id def set_phase(self, phase): @@ -234,7 +237,7 @@ def set_phase(self, phase): is set to the value of corresponding phase. 
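        Example (an illustrative sketch; the phase names are whatever ``phases`` contained at construction time, e.g. ``['train', 'valid', 'test']``):

            train_sampler = sampler.set_phase('train')
            valid_sampler = sampler.set_phase('valid')
            # draw one negative item per user, excluding that phase's used_ids
            neg_iids = valid_sampler.sample_by_user_ids(user_ids, num=1)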
""" if phase not in self.phases: - raise ValueError('phase [{}] not exist'.format(phase)) + raise ValueError(f'Phase [{phase}] not exist.') new_sampler = copy.copy(self) new_sampler.phase = phase new_sampler.used_ids = new_sampler.used_ids[phase] @@ -259,7 +262,7 @@ def sample_by_user_ids(self, user_ids, num): except IndexError: for user_id in user_ids: if user_id < 0 or user_id >= self.n_users: - raise ValueError('user_id [{}] not exist'.format(user_id)) + raise ValueError(f'user_id [{user_id}] not exist.') class KGSampler(AbstractSampler): @@ -293,7 +296,7 @@ def get_random_list(self): elif self.distribution == 'popularity': return list(self.hid_list) + list(self.tid_list) else: - raise NotImplementedError('Distribution [{}] has not been implemented'.format(self.distribution)) + raise NotImplementedError(f'Distribution [{self.distribution}] has not been implemented.') def get_used_ids(self): """ @@ -307,8 +310,10 @@ def get_used_ids(self): for used_tail_set in used_tail_entity_id: if len(used_tail_set) + 1 == self.entity_num: # [pad] is a entity. - raise ValueError('Some head entities have relation with all entities, ' - 'which we can not sample negative entities for them.') + raise ValueError( + 'Some head entities have relation with all entities, ' + 'which we can not sample negative entities for them.' + ) return used_tail_entity_id def sample_by_entity_ids(self, head_entity_ids, num=1): @@ -330,7 +335,7 @@ def sample_by_entity_ids(self, head_entity_ids, num=1): except IndexError: for head_entity_id in head_entity_ids: if head_entity_id not in self.head_entities: - raise ValueError('head_entity_id [{}] not exist'.format(head_entity_id)) + raise ValueError(f'head_entity_id [{head_entity_id}] not exist.') class RepeatableSampler(AbstractSampler): @@ -368,7 +373,7 @@ def get_random_list(self): elif self.distribution == 'popularity': return self.dataset.inter_feat[self.iid_field].numpy() else: - raise NotImplementedError('Distribution [{}] has not been implemented'.format(self.distribution)) + raise NotImplementedError(f'Distribution [{self.distribution}] has not been implemented.') def get_used_ids(self): """ @@ -397,7 +402,7 @@ def sample_by_user_ids(self, user_ids, num): except IndexError: for user_id in user_ids: if user_id < 0 or user_id >= self.n_users: - raise ValueError('user_id [{}] not exist'.format(user_id)) + raise ValueError(f'user_id [{user_id}] not exist.') def set_phase(self, phase): """Get the sampler of corresponding phase. @@ -409,7 +414,7 @@ def set_phase(self, phase): Sampler: the copy of this sampler, and :attr:`phase` is set the same as input phase. 
""" if phase not in self.phases: - raise ValueError('phase [{}] not exist'.format(phase)) + raise ValueError(f'Phase [{phase}] not exist.') new_sampler = copy.copy(self) new_sampler.phase = phase return new_sampler diff --git a/recbole/trainer/hyper_tuning.py b/recbole/trainer/hyper_tuning.py index a6788fbf1..dda0c49e5 100644 --- a/recbole/trainer/hyper_tuning.py +++ b/recbole/trainer/hyper_tuning.py @@ -75,8 +75,10 @@ def _validate_space_exhaustive_search(space): for node in dfs(as_apply(space)): if node.name in implicit_stochastic_symbols: if node.name not in supported_stochastic_symbols: - raise ExhaustiveSearchError('Exhaustive search is only possible with the following stochastic symbols: ' - '' + ', '.join(supported_stochastic_symbols)) + raise ExhaustiveSearchError( + 'Exhaustive search is only possible with the following stochastic symbols: ' + '' + ', '.join(supported_stochastic_symbols) + ) def exhaustive_search(new_ids, domain, trials, seed, nbMaxSucessiveFailures=1000): @@ -86,8 +88,12 @@ def exhaustive_search(new_ids, domain, trials, seed, nbMaxSucessiveFailures=1000 from hyperopt import pyll from hyperopt.base import miscs_update_idxs_vals # Build a hash set for previous trials - hashset = set([hash(frozenset([(key, value[0]) if len(value) > 0 else ((key, None)) - for key, value in trial['misc']['vals'].items()])) for trial in trials.trials]) + hashset = set([ + hash( + frozenset([(key, value[0]) if len(value) > 0 else ((key, None)) + for key, value in trial['misc']['vals'].items()]) + ) for trial in trials.trials + ]) rng = np.random.RandomState(seed) rval = [] @@ -96,19 +102,16 @@ def exhaustive_search(new_ids, domain, trials, seed, nbMaxSucessiveFailures=1000 nbSucessiveFailures = 0 while not newSample: # -- sample new specs, idxs, vals - idxs, vals = pyll.rec_eval( - domain.s_idxs_vals, - memo={ - domain.s_new_ids: [new_id], - domain.s_rng: rng, - }) + idxs, vals = pyll.rec_eval(domain.s_idxs_vals, memo={ + domain.s_new_ids: [new_id], + domain.s_rng: rng, + }) new_result = domain.new_result() new_misc = dict(tid=new_id, cmd=domain.cmd, workdir=domain.workdir) miscs_update_idxs_vals([new_misc], idxs, vals) # Compare with previous hashes - h = hash(frozenset([(key, value[0]) if len(value) > 0 else ( - (key, None)) for key, value in vals.items()])) + h = hash(frozenset([(key, value[0]) if len(value) > 0 else ((key, None)) for key, value in vals.items()])) if h not in hashset: newSample = True else: @@ -119,8 +122,7 @@ def exhaustive_search(new_ids, domain, trials, seed, nbMaxSucessiveFailures=1000 # No more samples to produce return [] - rval.extend(trials.new_trial_docs([new_id], - [None], [new_result], [new_misc])) + rval.extend(trials.new_trial_docs([new_id], [None], [new_result], [new_misc])) return rval @@ -136,8 +138,16 @@ class HyperTuning(object): https://github.com/hyperopt/hyperopt/issues/200 """ - def __init__(self, objective_function, space=None, params_file=None, params_dict=None, fixed_config_file_list=None, - algo='exhaustive', max_evals=100): + def __init__( + self, + objective_function, + space=None, + params_file=None, + params_dict=None, + fixed_config_file_list=None, + algo='exhaustive', + max_evals=100 + ): self.best_score = None self.best_params = None self.best_test_result = None @@ -288,7 +298,7 @@ def trial(self, params): self._print_result(result_dict) if bigger: - score = - score + score = -score return {'loss': score, 'status': hyperopt.STATUS_OK} def run(self): diff --git a/recbole/trainer/trainer.py b/recbole/trainer/trainer.py index 
bfd761660..9f9bfd9b5 100644 --- a/recbole/trainer/trainer.py +++ b/recbole/trainer/trainer.py @@ -3,9 +3,9 @@ # @Email : slmu@ruc.edu.cn # UPDATE: -# @Time : 2020/8/7, 2020/9/26, 2020/9/26, 2020/10/01, 2020/9/16, 2020/10/8, 2020/10/15, 2020/11/20 -# @Author : Zihan Lin, Yupeng Hou, Yushuo Chen, Shanlei Mu, Xingyu Pan, Hui Wang, Xinyan Fan, Chen Yang -# @Email : linzihan.super@foxmail.com, houyupeng@ruc.edu.cn, chenyushuo@ruc.edu.cn, slmu@ruc.edu.cn, panxy@ruc.edu.cn, hui.wang@ruc.edu.cn, xinyan.fan@ruc.edu.cn, 254170321@qq.com +# @Time : 2020/8/7, 2020/9/26, 2020/9/26, 2020/10/01, 2020/9/16, 2020/10/8, 2020/10/15, 2020/11/20, 2021/2/20 +# @Author : Zihan Lin, Yupeng Hou, Yushuo Chen, Shanlei Mu, Xingyu Pan, Hui Wang, Xinyan Fan, Chen Yang, Yibo Li +# @Email : linzihan.super@foxmail.com, houyupeng@ruc.edu.cn, chenyushuo@ruc.edu.cn, slmu@ruc.edu.cn, panxy@ruc.edu.cn, hui.wang@ruc.edu.cn, xinyan.fan@ruc.edu.cn, 254170321@qq.com, 2018202152@ruc.edu.cn r""" recbole.trainer.trainer @@ -16,7 +16,6 @@ from logging import getLogger from time import time -import matplotlib.pyplot as plt import numpy as np import torch import torch.optim as optim @@ -64,7 +63,7 @@ class Trainer(AbstractTrainer): Initializing the Trainer needs two parameters: `config` and `model`. `config` records the parameters information for controlling training and evaluation, such as `learning_rate`, `epochs`, `eval_step` and so on. - More information can be found in [placeholder]. `model` is the instantiated object of a Model Class. + `model` is the instantiated object of a Model Class. """ @@ -86,6 +85,8 @@ def __init__(self, config, model): ensure_dir(self.checkpoint_dir) saved_model_file = '{}-{}.pth'.format(self.config['model'], get_local_time()) self.saved_model_file = os.path.join(self.checkpoint_dir, saved_model_file) + self.weight_decay = config['weight_decay'] + self.draw_pic = config['draw_pic'] self.start_epoch = 0 self.cur_step = 0 @@ -105,15 +106,17 @@ def _build_optimizer(self): torch.optim: the optimizer """ if self.learner.lower() == 'adam': - optimizer = optim.Adam(self.model.parameters(), lr=self.learning_rate) + optimizer = optim.Adam(self.model.parameters(), lr=self.learning_rate, weight_decay=self.weight_decay) elif self.learner.lower() == 'sgd': - optimizer = optim.SGD(self.model.parameters(), lr=self.learning_rate) + optimizer = optim.SGD(self.model.parameters(), lr=self.learning_rate, weight_decay=self.weight_decay) elif self.learner.lower() == 'adagrad': - optimizer = optim.Adagrad(self.model.parameters(), lr=self.learning_rate) + optimizer = optim.Adagrad(self.model.parameters(), lr=self.learning_rate, weight_decay=self.weight_decay) elif self.learner.lower() == 'rmsprop': - optimizer = optim.RMSprop(self.model.parameters(), lr=self.learning_rate) + optimizer = optim.RMSprop(self.model.parameters(), lr=self.learning_rate, weight_decay=self.weight_decay) elif self.learner.lower() == 'sparse_adam': optimizer = optim.SparseAdam(self.model.parameters(), lr=self.learning_rate) + if self.weight_decay > 0: + self.logger.warning(f'Sparse Adam does not support weight decay; the argument [weight_decay={self.weight_decay}] will be ignored.') else: self.logger.warning('Received unrecognized optimizer, set default Adam optimizer') optimizer = optim.Adam(self.model.parameters(), lr=self.learning_rate) @@ -127,7 +130,7 @@ def _train_epoch(self, train_data, epoch_idx, loss_func=None, show_progress=Fals epoch_idx (int): The current epoch id. loss_func (function): The loss function of :attr:`model`.
If it is ``None``, the loss function will be :attr:`self.model.calculate_loss`. Defaults to ``None``. - show_progress (bool): Show progress of epoch training. Defaults to ``False``. + show_progress (bool): Show the progress of training epoch. Defaults to ``False``. Returns: float/tuple: The sum of loss returned by all batches in this epoch. If the loss in each batch contains @@ -142,9 +145,7 @@ def _train_epoch(self, train_data, epoch_idx, loss_func=None, show_progress=Fals enumerate(train_data), total=len(train_data), desc=f"Train {epoch_idx:>5}", - ) - if show_progress - else enumerate(train_data) + ) if show_progress else enumerate(train_data) ) for batch_idx, interaction in iter_data: interaction = interaction.to(self.device) @@ -169,7 +170,7 @@ def _valid_epoch(self, valid_data, show_progress=False): Args: valid_data (DataLoader): the valid data. - show_progress (bool): Show progress of epoch evaluate. Defaults to ``False``. + show_progress (bool): Show the progress of evaluate epoch. Defaults to ``False``. Returns: float: valid score @@ -211,8 +212,10 @@ def resume_checkpoint(self, resume_file): # load architecture params from checkpoint if checkpoint['config']['model'].lower() != self.config['model'].lower(): - self.logger.warning('Architecture configuration given in config file is different from that of checkpoint. ' - 'This may yield an exception while state_dict is being loaded.') + self.logger.warning( + 'Architecture configuration given in config file is different from that of checkpoint. ' + 'This may yield an exception while state_dict is being loaded.' + ) self.model.load_state_dict(checkpoint['state_dict']) # load optimizer state from checkpoint only when optimizer type is not changed @@ -244,7 +247,7 @@ def fit(self, train_data, valid_data=None, verbose=True, saved=True, show_progre If it's None, the early_stopping is invalid. verbose (bool, optional): whether to write training and evaluation information to logger, default: True saved (bool, optional): whether to save the model parameters, default: True - show_progress (bool): Show progress of epoch training and evaluate. Defaults to ``False``. + show_progress (bool): Show the progress of training epoch and evaluate epoch. Defaults to ``False``. callback_fn (callable): Optional callback function executed at end of epoch. Includes (epoch_idx, valid_score) input arguments. 
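A minimal usage sketch for the options wired in above (illustrative only, not part of this patch): it assumes RecBole's standard `run_recbole` entry point from `recbole.quick_start` and the bundled `ml-100k` dataset; only the `weight_decay` and `draw_pic` keys come from the changes in this file (both are also registered in `training_arguments` later in this patch), everything else is ordinary quick-start configuration.

from recbole.quick_start import run_recbole

# 'weight_decay' is forwarded to the optimizer built in Trainer._build_optimizer();
# 'draw_pic' makes Trainer.fit() save '<model>_train_loss_graph.pdf' once training stops.
run_recbole(
    model='BPR',
    dataset='ml-100k',
    config_dict={'weight_decay': 1e-4, 'draw_pic': True, 'epochs': 20}
)
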
@@ -277,8 +280,12 @@ def fit(self, train_data, valid_data=None, verbose=True, saved=True, show_progre valid_start_time = time() valid_score, valid_result = self._valid_epoch(valid_data, show_progress=show_progress) self.best_valid_score, self.cur_step, stop_flag, update_flag = early_stopping( - valid_score, self.best_valid_score, self.cur_step, - max_step=self.stopping_step, bigger=self.valid_metric_bigger) + valid_score, + self.best_valid_score, + self.cur_step, + max_step=self.stopping_step, + bigger=self.valid_metric_bigger + ) valid_end_time = time() valid_score_output = "epoch %d evaluating [time: %.2fs, valid_score: %f]" % \ (epoch_idx, valid_end_time - valid_start_time, valid_score) @@ -303,6 +310,8 @@ def fit(self, train_data, valid_data=None, verbose=True, saved=True, show_progre if verbose: self.logger.info(stop_output) break + if self.draw_pic: + self.plot_train_loss(save_path=self.config['model']+'_train_loss_graph.pdf') return self.best_valid_score, self.best_valid_result def _full_sort_batch_eval(self, batched_data): @@ -341,10 +350,10 @@ def evaluate(self, eval_data, load_best_model=True, model_file=None, show_progre It should be set True, if users want to test the model after training. model_file (str, optional): the saved model file, default: None. If users want to test the previously trained model file, they can set this parameter. - show_progress (bool): Show progress of epoch evaluate. Defaults to ``False``. + show_progress (bool): Show the progress of evaluate epoch. Defaults to ``False``. Returns: - dict: eval result, key is the eval metric and value in the corresponding metric value + dict: eval result, key is the eval metric and value in the corresponding metric value. """ if not eval_data: return @@ -372,9 +381,7 @@ def evaluate(self, eval_data, load_best_model=True, model_file=None, show_progre enumerate(eval_data), total=len(eval_data), desc=f"Evaluate ", - ) - if show_progress - else enumerate(eval_data) + ) if show_progress else enumerate(eval_data) ) for batch_idx, batched_data in iter_data: if eval_data.dl_type == DataLoaderType.FULL: @@ -413,17 +420,21 @@ def plot_train_loss(self, show=True, save_path=None): r"""Plot the train loss in each epoch Args: - show (bool, optional): whether to show this figure, default: True - save_path (str, optional): the data path to save the figure, default: None. + show (bool, optional): Whether to show this figure, default: True + save_path (str, optional): The data path to save the figure, default: None. If it's None, it will not be saved. 
""" + import matplotlib.pyplot as plt + import time epochs = list(self.train_loss_dict.keys()) epochs.sort() values = [float(self.train_loss_dict[epoch]) for epoch in epochs] plt.plot(epochs, values) - plt.xticks(epochs) + my_x_ticks = np.arange(0,len(epochs),int(len(epochs)/10)) + plt.xticks(my_x_ticks) plt.xlabel('Epoch') plt.ylabel('Loss') + plt.title(self.config['model']+' '+time.strftime("%Y-%m-%d %H:%M", time.localtime(time.time()))) if show: plt.show() if save_path: @@ -453,9 +464,9 @@ def _train_epoch(self, train_data, epoch_idx, loss_func=None, show_progress=Fals if interaction_state in [KGDataLoaderState.RSKG, KGDataLoaderState.RS]: return super()._train_epoch(train_data, epoch_idx, show_progress=show_progress) elif interaction_state in [KGDataLoaderState.KG]: - return super()._train_epoch(train_data, epoch_idx, - loss_func=self.model.calculate_kg_loss, - show_progress=show_progress) + return super()._train_epoch( + train_data, epoch_idx, loss_func=self.model.calculate_kg_loss, show_progress=show_progress + ) return None @@ -474,9 +485,9 @@ def _train_epoch(self, train_data, epoch_idx, loss_func=None, show_progress=Fals # train kg train_data.set_mode(KGDataLoaderState.KG) - kg_total_loss = super()._train_epoch(train_data, epoch_idx, - loss_func=self.model.calculate_kg_loss, - show_progress=show_progress) + kg_total_loss = super()._train_epoch( + train_data, epoch_idx, loss_func=self.model.calculate_kg_loss, show_progress=show_progress + ) # update A self.model.eval() @@ -525,9 +536,10 @@ def pretrain(self, train_data, verbose=True, show_progress=False): self.logger.info(train_loss_output) if (epoch_idx + 1) % self.config['save_step'] == 0: - saved_model_file = os.path.join(self.checkpoint_dir, - '{}-{}-{}.pth'.format(self.config['model'], self.config['dataset'], - str(epoch_idx + 1))) + saved_model_file = os.path.join( + self.checkpoint_dir, + '{}-{}-{}.pth'.format(self.config['model'], self.config['dataset'], str(epoch_idx + 1)) + ) self.save_pretrained_model(epoch_idx, saved_model_file) update_output = 'Saving current: %s' % saved_model_file if verbose: @@ -559,17 +571,17 @@ def _train_epoch(self, train_data, epoch_idx, loss_func=None, show_progress=Fals # train rs self.logger.info('Train RS') train_data.set_mode(KGDataLoaderState.RS) - rs_total_loss = super()._train_epoch(train_data, epoch_idx, - loss_func=self.model.calculate_rs_loss, - show_progress=show_progress) + rs_total_loss = super()._train_epoch( + train_data, epoch_idx, loss_func=self.model.calculate_rs_loss, show_progress=show_progress + ) # train kg if epoch_idx % self.kge_interval == 0: self.logger.info('Train KG') train_data.set_mode(KGDataLoaderState.KG) - kg_total_loss = super()._train_epoch(train_data, epoch_idx, - loss_func=self.model.calculate_kg_loss, - show_progress=show_progress) + kg_total_loss = super()._train_epoch( + train_data, epoch_idx, loss_func=self.model.calculate_kg_loss, show_progress=show_progress + ) return rs_total_loss, kg_total_loss @@ -584,42 +596,17 @@ def __init__(self, config, model): self.epochs = 1 # Set the epoch to 1 when running memory based model -class xgboostTrainer(AbstractTrainer): - """xgboostTrainer is designed for XGBOOST. - - """ +class DecisionTreeTrainer(AbstractTrainer): + """DecisionTreeTrainer is designed for DecisionTree model. 
+ """ def __init__(self, config, model): - super(xgboostTrainer, self).__init__(config, model) - - self.xgb = __import__('xgboost') + super(DecisionTreeTrainer, self).__init__(config, model) self.logger = getLogger() self.label_field = config['LABEL_FIELD'] - self.xgb_model = config['xgb_model'] self.convert_token_to_onehot = self.config['convert_token_to_onehot'] - # DMatrix params - self.weight = config['xgb_weight'] - self.base_margin = config['xgb_base_margin'] - self.missing = config['xgb_missing'] - self.silent = config['xgb_silent'] - self.feature_names = config['xgb_feature_names'] - self.feature_types = config['xgb_feature_types'] - self.nthread = config['xgb_nthread'] - - # train params - self.params = config['xgb_params'] - self.num_boost_round = config['xgb_num_boost_round'] - self.evals = () - self.obj = config['xgb_obj'] - self.feval = config['xgb_feval'] - self.maximize = config['xgb_maximize'] - self.early_stopping_rounds = config['xgb_early_stopping_rounds'] - self.evals_result = {} - self.verbose_eval = config['xgb_verbose_eval'] - self.callbacks = None - # evaluator self.eval_type = config['eval_type'] self.epochs = config['epochs'] @@ -634,13 +621,14 @@ def __init__(self, config, model): saved_model_file = '{}-{}.pth'.format(self.config['model'], get_local_time()) self.saved_model_file = os.path.join(self.checkpoint_dir, saved_model_file) - def _interaction_to_DMatrix(self, dataloader): - r"""Convert data format from interaction to DMatrix + def _interaction_to_sparse(self, dataloader): + r"""Convert data format from interaction to sparse or numpy Args: - dataloader (XgboostDataLoader): xgboost dataloader. + dataloader (DecisionTreeDataLoader): DecisionTreeDataLoader dataloader. Returns: - DMatrix: Data in the form of 'DMatrix'. + cur_data (sparse or numpy): data. + interaction_np[self.label_field] (numpy): label. """ interaction = dataloader.dataset[:] interaction_np = interaction.numpy() @@ -682,39 +670,16 @@ def _interaction_to_DMatrix(self, dataloader): cur_data = sparse.csc_matrix(onehot_data) - return self.xgb.DMatrix(data=cur_data, - label=interaction_np[self.label_field], - weight=self.weight, - base_margin=self.base_margin, - missing=self.missing, - silent=self.silent, - feature_names=self.feature_names, - feature_types=self.feature_types, - nthread=self.nthread) - - def _train_at_once(self, train_data, valid_data): - r""" - - Args: - train_data (XgboostDataLoader): XgboostDataLoader, which is the same with GeneralDataLoader. - valid_data (XgboostDataLoader): XgboostDataLoader, which is the same with GeneralDataLoader. - """ - self.dtrain = self._interaction_to_DMatrix(train_data) - self.dvalid = self._interaction_to_DMatrix(valid_data) - self.evals = [(self.dtrain, 'train'), (self.dvalid, 'valid')] - self.model = self.xgb.train(self.params, self.dtrain, self.num_boost_round, - self.evals, self.obj, self.feval, self.maximize, - self.early_stopping_rounds, self.evals_result, - self.verbose_eval, self.xgb_model, self.callbacks) + return cur_data, interaction_np[self.label_field] - self.model.save_model(self.saved_model_file) - self.xgb_model = self.saved_model_file + def _interaction_to_lib_datatype(self, dataloader): + pass def _valid_epoch(self, valid_data): r""" - + Args: - valid_data (XgboostDataLoader): XgboostDataLoader, which is the same with GeneralDataLoader. + valid_data (DecisionTreeDataLoader): DecisionTreeDataLoader, which is the same with GeneralDataLoader. 
""" valid_result = self.evaluate(valid_data) valid_score = calculate_valid_score(valid_result, self.valid_metric) @@ -722,8 +687,8 @@ def _valid_epoch(self, valid_data): def fit(self, train_data, valid_data=None, verbose=True, saved=True, show_progress=False): # load model - if self.xgb_model is not None: - self.model.load_model(self.xgb_model) + if self.boost_model is not None: + self.model.load_model(self.boost_model) self.best_valid_score = 0. self.best_valid_result = 0. @@ -748,14 +713,138 @@ def fit(self, train_data, valid_data=None, verbose=True, saved=True, show_progre return self.best_valid_score, self.best_valid_result + def evaluate(self): + pass + + +class xgboostTrainer(DecisionTreeTrainer): + """xgboostTrainer is designed for XGBOOST. + + """ + + def __init__(self, config, model): + super(xgboostTrainer, self).__init__(config, model) + + self.xgb = __import__('xgboost') + self.boost_model = config['xgb_model'] + self.silent = config['xgb_silent'] + self.nthread = config['xgb_nthread'] + + # train params + self.params = config['xgb_params'] + self.num_boost_round = config['xgb_num_boost_round'] + self.evals = () + self.early_stopping_rounds = config['xgb_early_stopping_rounds'] + self.evals_result = {} + self.verbose_eval = config['xgb_verbose_eval'] + self.callbacks = None + + def _interaction_to_lib_datatype(self, dataloader): + r"""Convert data format from interaction to DMatrix + + Args: + dataloader (DecisionTreeDataLoader): xgboost dataloader. + Returns: + DMatrix: Data in the form of 'DMatrix'. + """ + data, label = self._interaction_to_sparse(dataloader) + return self.xgb.DMatrix(data = data, label = label, silent = self.silent, nthread = self.nthread) + + def _train_at_once(self, train_data, valid_data): + r""" + + Args: + train_data (DecisionTreeDataLoader): DecisionTreeDataLoader, which is the same with GeneralDataLoader. + valid_data (DecisionTreeDataLoader): DecisionTreeDataLoader, which is the same with GeneralDataLoader. + """ + self.dtrain = self._interaction_to_lib_datatype(train_data) + self.dvalid = self._interaction_to_lib_datatype(valid_data) + self.evals = [(self.dtrain, 'train'), (self.dvalid, 'valid')] + self.model = self.xgb.train(self.params, self.dtrain, self.num_boost_round, self.evals, + early_stopping_rounds = self.early_stopping_rounds, + evals_result = self.evals_result, + verbose_eval = self.verbose_eval, + xgb_model = self.boost_model, + callbacks = self.callbacks) + + self.model.save_model(self.saved_model_file) + self.boost_model = self.saved_model_file + def evaluate(self, eval_data, load_best_model=True, model_file=None, show_progress=False): self.eval_pred = torch.Tensor() self.eval_true = torch.Tensor() - self.deval = self._interaction_to_DMatrix(eval_data) + self.deval = self._interaction_to_lib_datatype(eval_data) self.eval_true = torch.Tensor(self.deval.get_label()) self.eval_pred = torch.Tensor(self.model.predict(self.deval)) batch_matrix_list = [[torch.stack((self.eval_true, self.eval_pred), 1)]] result = self.evaluator.evaluate(batch_matrix_list, eval_data) return result + + +class lightgbmTrainer(DecisionTreeTrainer): + """lightgbmTrainer is designed for lightgbm. 
+ + """ + + def __init__(self, config, model): + super(lightgbmTrainer, self).__init__(config, model) + + self.lgb = __import__('lightgbm') + self.boost_model = config['lgb_model'] + self.silent = config['lgb_silent'] + + # train params + self.params = config['lgb_params'] + self.num_boost_round = config['lgb_num_boost_round'] + self.evals = () + self.early_stopping_rounds = config['lgb_early_stopping_rounds'] + self.evals_result = {} + self.verbose_eval = config['lgb_verbose_eval'] + self.learning_rates = config['lgb_learning_rates'] + self.callbacks = None + + def _interaction_to_lib_datatype(self, dataloader): + r"""Convert data format from interaction to Dataset + + Args: + dataloader (DecisionTreeDataLoader): xgboost dataloader. + Returns: + dataset(lgb.Dataset): Data in the form of 'lgb.Dataset'. + """ + data, label = self._interaction_to_sparse(dataloader) + return self.lgb.Dataset(data = data, label = label, silent = self.silent) + + def _train_at_once(self, train_data, valid_data): + r""" + + Args: + train_data (DecisionTreeDataLoader): DecisionTreeDataLoader, which is the same with GeneralDataLoader. + valid_data (DecisionTreeDataLoader): DecisionTreeDataLoader, which is the same with GeneralDataLoader. + """ + self.dtrain = self._interaction_to_lib_datatype(train_data) + self.dvalid = self._interaction_to_lib_datatype(valid_data) + self.evals = [self.dtrain, self.dvalid] + self.model = self.lgb.train(self.params, self.dtrain, self.num_boost_round, self.evals, + early_stopping_rounds = self.early_stopping_rounds, + evals_result = self.evals_result, + verbose_eval = self.verbose_eval, + learning_rates = self.learning_rates, + init_model = self.boost_model, + callbacks = self.callbacks) + + self.model.save_model(self.saved_model_file) + self.boost_model = self.saved_model_file + + def evaluate(self, eval_data, load_best_model=True, model_file=None, show_progress=False): + self.eval_pred = torch.Tensor() + self.eval_true = torch.Tensor() + + self.deval_data, self.deval_label = self._interaction_to_sparse(eval_data) + self.eval_true = torch.Tensor(self.deval_label) + self.eval_pred = torch.Tensor(self.model.predict(self.deval_data)) + + batch_matrix_list = [[torch.stack((self.eval_true, self.eval_pred), 1)]] + result = self.evaluator.evaluate(batch_matrix_list, eval_data) + return result diff --git a/recbole/utils/__init__.py b/recbole/utils/__init__.py index d6f8bac2e..22240e0aa 100644 --- a/recbole/utils/__init__.py +++ b/recbole/utils/__init__.py @@ -4,8 +4,9 @@ from recbole.utils.enum_type import * from recbole.utils.argument_list import * - -__all__ = ['init_logger', 'get_local_time', 'ensure_dir', 'get_model', 'get_trainer', 'early_stopping', - 'calculate_valid_score', 'dict2str', 'Enum', 'ModelType', 'DataLoaderType', 'KGDataLoaderState', - 'EvaluatorType', 'InputType', 'FeatureType', 'FeatureSource', 'init_seed', - 'general_arguments', 'training_arguments', 'evaluation_arguments', 'dataset_arguments'] +__all__ = [ + 'init_logger', 'get_local_time', 'ensure_dir', 'get_model', 'get_trainer', 'early_stopping', + 'calculate_valid_score', 'dict2str', 'Enum', 'ModelType', 'DataLoaderType', 'KGDataLoaderState', 'EvaluatorType', + 'InputType', 'FeatureType', 'FeatureSource', 'init_seed', 'general_arguments', 'training_arguments', + 'evaluation_arguments', 'dataset_arguments' +] diff --git a/recbole/utils/argument_list.py b/recbole/utils/argument_list.py index 383549e84..627f9f884 100644 --- a/recbole/utils/argument_list.py +++ b/recbole/utils/argument_list.py @@ -2,38 +2,52 @@ # 
@Author : Shanlei Mu # @Email : slmu@ruc.edu.cn +# yapf: disable -general_arguments = ['gpu_id', 'use_gpu', - 'seed', - 'reproducibility', - 'state', - 'data_path', - 'show_progress'] +general_arguments = [ + 'gpu_id', 'use_gpu', + 'seed', + 'reproducibility', + 'state', + 'data_path', + 'show_progress', +] -training_arguments = ['epochs', 'train_batch_size', - 'learner', 'learning_rate', - 'training_neg_sample_num', - 'training_neg_sample_distribution', - 'eval_step', 'stopping_step', - 'checkpoint_dir'] +training_arguments = [ + 'epochs', 'train_batch_size', + 'learner', 'learning_rate', + 'training_neg_sample_num', + 'training_neg_sample_distribution', + 'eval_step', 'stopping_step', + 'checkpoint_dir', + 'clip_grad_norm', + 'loss_decimal_place', + 'weight_decay', + 'draw_pic' +] -evaluation_arguments = ['eval_setting', - 'group_by_user', - 'split_ratio', 'leave_one_num', - 'real_time_process', - 'metrics', 'topk', 'valid_metric', - 'eval_batch_size'] +evaluation_arguments = [ + 'eval_setting', + 'group_by_user', + 'split_ratio', 'leave_one_num', + 'real_time_process', + 'metrics', 'topk', 'valid_metric', + 'eval_batch_size', + 'metric_decimal_place' +] -dataset_arguments = ['field_separator', 'seq_separator', - 'USER_ID_FIELD', 'ITEM_ID_FIELD', 'RATING_FIELD', 'TIME_FIELD', - 'seq_len', - 'LABEL_FIELD', 'threshold', - 'NEG_PREFIX', - 'ITEM_LIST_LENGTH_FIELD', 'LIST_SUFFIX', 'MAX_ITEM_LIST_LENGTH', 'POSITION_FIELD', - 'HEAD_ENTITY_ID_FIELD', 'TAIL_ENTITY_ID_FIELD', 'RELATION_ID_FIELD', 'ENTITY_ID_FIELD', - 'load_col', 'unload_col', 'unused_col', 'additional_feat_suffix', - 'max_user_inter_num', 'min_user_inter_num', 'max_item_inter_num', 'min_item_inter_num', - 'lowest_val', 'highest_val', 'equal_val', 'not_equal_val', - 'fields_in_same_space', - 'preload_weight', - 'normalize_field', 'normalize_all'] +dataset_arguments = [ + 'field_separator', 'seq_separator', + 'USER_ID_FIELD', 'ITEM_ID_FIELD', 'RATING_FIELD', 'TIME_FIELD', + 'seq_len', + 'LABEL_FIELD', 'threshold', + 'NEG_PREFIX', + 'ITEM_LIST_LENGTH_FIELD', 'LIST_SUFFIX', 'MAX_ITEM_LIST_LENGTH', 'POSITION_FIELD', + 'HEAD_ENTITY_ID_FIELD', 'TAIL_ENTITY_ID_FIELD', 'RELATION_ID_FIELD', 'ENTITY_ID_FIELD', + 'load_col', 'unload_col', 'unused_col', 'additional_feat_suffix', + 'max_user_inter_num', 'min_user_inter_num', 'max_item_inter_num', 'min_item_inter_num', + 'lowest_val', 'highest_val', 'equal_val', 'not_equal_val', + 'fields_in_same_space', + 'preload_weight', + 'normalize_field', 'normalize_all' +] diff --git a/recbole/utils/case_study.py b/recbole/utils/case_study.py index 45d39f6a8..d8a3b38de 100644 --- a/recbole/utils/case_study.py +++ b/recbole/utils/case_study.py @@ -7,6 +7,10 @@ # @Author : Yushuo Chen # @email : chenyushuo@ruc.edu.cn +""" +recbole.utils.case_study +##################################### +""" import numpy as np import torch @@ -16,7 +20,7 @@ @torch.no_grad() -def get_scores(uid_series, model, test_data): +def full_sort_scores(uid_series, model, test_data): """Calculate the scores of all items for each user in uid_series. 
Note: @@ -37,14 +41,15 @@ def get_scores(uid_series, model, test_data): if isinstance(test_data, GeneralFullDataLoader): index = np.isin(test_data.user_df[uid_field].numpy(), uid_series) input_interaction = test_data.user_df[index] - history_item = test_data.uid2history_item[input_interaction[uid_field]] + history_item = test_data.uid2history_item[input_interaction[uid_field].numpy()] history_row = torch.cat([torch.full_like(hist_iid, i) for i, hist_iid in enumerate(history_item)]) history_col = torch.cat(list(history_item)) history_index = history_row, history_col elif isinstance(test_data, SequentialFullDataLoader): index = np.isin(test_data.uid_list, uid_series) - input_interaction = test_data.augmentation(test_data.item_list_index[index], - test_data.target_index[index], test_data.item_list_length[index]) + input_interaction = test_data.augmentation( + test_data.item_list_index[index], test_data.target_index[index], test_data.item_list_length[index] + ) history_index = None else: raise NotImplementedError @@ -65,7 +70,7 @@ def get_scores(uid_series, model, test_data): return scores -def get_topk(uid_series, model, test_data, k): +def full_sort_topk(uid_series, model, test_data, k): """Calculate the top-k items' scores and ids for each user in uid_series. Args: @@ -79,5 +84,5 @@ def get_topk(uid_series, model, test_data, k): - topk_scores (torch.Tensor): The scores of topk items. - topk_index (torch.Tensor): The index of topk items, which is also the internal ids of items. """ - scores = get_scores(uid_series, model, test_data) + scores = full_sort_scores(uid_series, model, test_data) return torch.topk(scores, k) diff --git a/recbole/utils/enum_type.py b/recbole/utils/enum_type.py index 84e15b812..ebc1faea7 100644 --- a/recbole/utils/enum_type.py +++ b/recbole/utils/enum_type.py @@ -26,7 +26,7 @@ class ModelType(Enum): KNOWLEDGE = 4 SOCIAL = 5 TRADITIONAL = 6 - XGBOOST = 7 + DECISIONTREE = 7 class DataLoaderType(Enum): diff --git a/recbole/utils/logger.py b/recbole/utils/logger.py index e9808fc68..76559710f 100644 --- a/recbole/utils/logger.py +++ b/recbole/utils/logger.py @@ -11,7 +11,7 @@ import logging import os -from recbole.utils.utils import get_local_time +from recbole.utils.utils import get_local_time, ensure_dir def init_logger(config): @@ -30,8 +30,7 @@ def init_logger(config): """ LOGROOT = './log/' dir_name = os.path.dirname(LOGROOT) - if not os.path.exists(dir_name): - os.makedirs(dir_name) + ensure_dir(dir_name) logfilename = '{}-{}.log'.format(config['model'], get_local_time()) @@ -64,7 +63,4 @@ def init_logger(config): sh.setLevel(level) sh.setFormatter(sformatter) - logging.basicConfig( - level=level, - handlers=[fh, sh] - ) + logging.basicConfig(level=level, handlers=[fh, sh]) diff --git a/recbole/utils/utils.py b/recbole/utils/utils.py index faa98c938..9e2012372 100644 --- a/recbole/utils/utils.py +++ b/recbole/utils/utils.py @@ -52,10 +52,7 @@ def get_model(model_name): Recommender: model class """ model_submodule = [ - 'general_recommender', - 'context_aware_recommender', - 'sequential_recommender', - 'knowledge_aware_recommender', + 'general_recommender', 'context_aware_recommender', 'sequential_recommender', 'knowledge_aware_recommender', 'exlib_recommender' ] diff --git a/requirements.txt b/requirements.txt index 7f4bff1bf..46f740e11 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,9 +1,9 @@ matplotlib>=3.1.3 -torch>=1.6.0 +torch>=1.7.0 numpy>=1.17.2 -scipy>=1.3.1 +scipy==1.6.0 hyperopt>=0.2.4 pandas>=1.0.5 tqdm>=4.48.2 scikit_learn>=0.23.2 
-pyyaml>=5.1.0 \ No newline at end of file +pyyaml>=5.1.0 diff --git a/run_test.sh b/run_test.sh index 190b78540..6d70c5171 100644 --- a/run_test.sh +++ b/run_test.sh @@ -2,15 +2,21 @@ python -m pytest -v tests/metrics -printf "metrics tests finished\n" +echo "metrics tests finished" + python -m pytest -v tests/config/test_config.py python -m pytest -v tests/config/test_overall.py export PYTHONPATH=. python tests/config/test_command_line.py --use_gpu=False --valid_metric=Recall@10 --split_ratio=[0.7,0.2,0.1] --metrics=['Recall@10'] --epochs=200 --eval_setting='LO_RS' --learning_rate=0.3 -printf "config tests finished\n" +echo "config tests finished" + python -m pytest -v tests/evaluation_setting -printf "evaluation_setting tests finished\n" +echo "evaluation_setting tests finished" + python -m pytest -v tests/model/test_model_auto.py python -m pytest -v tests/model/test_model_manual.py -printf "model tests finished\n" +echo "model tests finished" +python -m pytest -v tests/data/test_dataset.py +python -m pytest -v tests/data/test_dataloader.py +echo "data tests finished" \ No newline at end of file diff --git a/run_test_example.py b/run_test_example.py index 82aacfd92..2cd8cc51f 100644 --- a/run_test_example.py +++ b/run_test_example.py @@ -134,6 +134,10 @@ 'model': 'LINE', 'dataset': 'ml-100k', }, + 'Test EASE': { + 'model': 'EASE', + 'dataset': 'ml-100k', + }, 'Test MultiDAE': { 'model': 'MultiDAE', 'dataset': 'ml-100k', @@ -146,6 +150,10 @@ 'model': 'MacridVAE', 'dataset': 'ml-100k', }, + 'Test NNCF': { + 'model': 'NNCF', + 'dataset': 'ml-100k', + }, # Context-aware Recommendation 'Test FM': { diff --git a/setup.py b/setup.py index 46c74cab9..4cd4676cf 100644 --- a/setup.py +++ b/setup.py @@ -6,7 +6,7 @@ from setuptools import setup, find_packages -install_requires = ['numpy>=1.17.2', 'torch>=1.6.0', 'scipy>=1.3.1', 'pandas>=1.0.5', 'tqdm>=4.48.2', +install_requires = ['numpy>=1.17.2', 'torch>=1.7.0', 'scipy>=1.3.1', 'pandas>=1.0.5', 'tqdm>=4.48.2', 'scikit_learn>=0.23.2', 'pyyaml>=5.1.0', 'matplotlib>=3.1.3'] setup_requires = [] @@ -20,7 +20,7 @@ long_description = 'RecBole is developed based on Python and PyTorch for ' \ 'reproducing and developing recommendation algorithms in ' \ 'a unified, comprehensive and efficient framework for ' \ - 'research purpose. In the first version, Our library ' \ + 'research purpose. In the first version, our library ' \ 'includes 53 recommendation algorithms, covering four ' \ 'major categories: General Recommendation, Sequential ' \ 'Recommendation, Context-aware Recommendation and ' \ @@ -36,7 +36,7 @@ setup( name='recbole', version= - '0.1.2', # please remember to edit recbole/__init__.py in response, once updating the version + '0.2.0', # please remember to edit recbole/__init__.py in response, once updating the version description='A unified, comprehensive and efficient recommendation library', long_description=long_description, long_description_content_type="text/markdown", diff --git a/style.cfg b/style.cfg new file mode 100644 index 000000000..606c7ccbd --- /dev/null +++ b/style.cfg @@ -0,0 +1,394 @@ +[style] +# Align closing bracket with visual indentation. +align_closing_bracket_with_visual_indent=True + +# Allow dictionary keys to exist on multiple lines. For example: +# +# x = { +# ('this is the first element of a tuple', +# 'this is the second element of a tuple'): +# value, +# } +allow_multiline_dictionary_keys=False + +# Allow lambdas to be formatted on more than one line. 
+allow_multiline_lambdas=False + +# Allow splitting before a default / named assignment in an argument list. +allow_split_before_default_or_named_assigns=True + +# Allow splits before the dictionary value. +allow_split_before_dict_value=True + +# Let spacing indicate operator precedence. For example: +# +# a = 1 * 2 + 3 / 4 +# b = 1 / 2 - 3 * 4 +# c = (1 + 2) * (3 - 4) +# d = (1 - 2) / (3 + 4) +# e = 1 * 2 - 3 +# f = 1 + 2 + 3 + 4 +# +# will be formatted as follows to indicate precedence: +# +# a = 1*2 + 3/4 +# b = 1/2 - 3*4 +# c = (1+2) * (3-4) +# d = (1-2) / (3+4) +# e = 1*2 - 3 +# f = 1 + 2 + 3 + 4 +# +arithmetic_precedence_indication=False + +# Number of blank lines surrounding top-level function and class +# definitions. +blank_lines_around_top_level_definition=2 + +# Insert a blank line before a class-level docstring. +blank_line_before_class_docstring=False + +# Insert a blank line before a module docstring. +blank_line_before_module_docstring=True + +# Insert a blank line before a 'def' or 'class' immediately nested +# within another 'def' or 'class'. For example: +# +# class Foo: +# # <------ this blank line +# def method(): +# ... +blank_line_before_nested_class_or_def=True + +# Do not split consecutive brackets. Only relevant when +# dedent_closing_brackets is set. For example: +# +# call_func_that_takes_a_dict( +# { +# 'key1': 'value1', +# 'key2': 'value2', +# } +# ) +# +# would reformat to: +# +# call_func_that_takes_a_dict({ +# 'key1': 'value1', +# 'key2': 'value2', +# }) +coalesce_brackets=True + +# The column limit. +column_limit=120 + +# The style for continuation alignment. Possible values are: +# +# - SPACE: Use spaces for continuation alignment. This is default behavior. +# - FIXED: Use fixed number (CONTINUATION_INDENT_WIDTH) of columns +# (ie: CONTINUATION_INDENT_WIDTH/INDENT_WIDTH tabs or +# CONTINUATION_INDENT_WIDTH spaces) for continuation alignment. +# - VALIGN-RIGHT: Vertically align continuation lines to multiple of +# INDENT_WIDTH columns. Slightly right (one tab or a few spaces) if +# cannot vertically align continuation lines with indent characters. +continuation_align_style=SPACE + +# Indent width used for line continuations. +continuation_indent_width=4 + +# Put closing brackets on a separate line, dedented, if the bracketed +# expression can't fit in a single line. Applies to all kinds of brackets, +# including function definitions and calls. For example: +# +# config = { +# 'key1': 'value1', +# 'key2': 'value2', +# } # <--- this bracket is dedented and on a separate line +# +# time_series = self.remote_client.query_entity_counters( +# entity='dev3246.region1', +# key='dns.query_latency_tcp', +# transform=Transformation.AVERAGE(window=timedelta(seconds=60)), +# start_ts=now()-timedelta(days=3), +# end_ts=now(), +# ) # <--- this bracket is dedented and on a separate line +dedent_closing_brackets=True + +# Disable the heuristic which places each list element on a separate line +# if the list is comma-terminated. +disable_ending_comma_heuristic=False + +# Place each dictionary entry onto its own line. +each_dict_entry_on_separate_line=True + +# Require multiline dictionary even if it would normally fit on one line. +# For example: +# +# config = { +# 'key1': 'value1' +# } +force_multiline_dict=False + +# The regex for an i18n comment. The presence of this comment stops +# reformatting of that line, because the comments are required to be +# next to the string they translate. +i18n_comment= + +# The i18n function call names. 
The presence of this function stops +# reformattting on that line, because the string it has cannot be moved +# away from the i18n comment. +i18n_function_call= + +# Indent blank lines. +indent_blank_lines=False + +# Put closing brackets on a separate line, indented, if the bracketed +# expression can't fit in a single line. Applies to all kinds of brackets, +# including function definitions and calls. For example: +# +# config = { +# 'key1': 'value1', +# 'key2': 'value2', +# } # <--- this bracket is indented and on a separate line +# +# time_series = self.remote_client.query_entity_counters( +# entity='dev3246.region1', +# key='dns.query_latency_tcp', +# transform=Transformation.AVERAGE(window=timedelta(seconds=60)), +# start_ts=now()-timedelta(days=3), +# end_ts=now(), +# ) # <--- this bracket is indented and on a separate line +indent_closing_brackets=False + +# Indent the dictionary value if it cannot fit on the same line as the +# dictionary key. For example: +# +# config = { +# 'key1': +# 'value1', +# 'key2': value1 + +# value2, +# } +indent_dictionary_value=False + +# The number of columns to use for indentation. +indent_width=4 + +# Join short lines into one line. E.g., single line 'if' statements. +join_multiple_lines=True + +# Do not include spaces around selected binary operators. For example: +# +# 1 + 2 * 3 - 4 / 5 +# +# will be formatted as follows when configured with "*,/": +# +# 1 + 2*3 - 4/5 +no_spaces_around_selected_binary_operators= + +# Use spaces around default or named assigns. +spaces_around_default_or_named_assign=False + +# Adds a space after the opening '{' and before the ending '}' dict delimiters. +# +# {1: 2} +# +# will be formatted as: +# +# { 1: 2 } +spaces_around_dict_delimiters=False + +# Adds a space after the opening '[' and before the ending ']' list delimiters. +# +# [1, 2] +# +# will be formatted as: +# +# [ 1, 2 ] +spaces_around_list_delimiters=False + +# Use spaces around the power operator. +spaces_around_power_operator=True + +# Use spaces around the subscript / slice operator. For example: +# +# my_list[1 : 10 : 2] +spaces_around_subscript_colon=False + +# Adds a space after the opening '(' and before the ending ')' tuple delimiters. +# +# (1, 2, 3) +# +# will be formatted as: +# +# ( 1, 2, 3 ) +spaces_around_tuple_delimiters=False + +# The number of spaces required before a trailing comment. +# This can be a single value (representing the number of spaces +# before each trailing comment) or list of values (representing +# alignment column values; trailing comments within a block will +# be aligned to the first column value that is greater than the maximum +# line length within the block). 
For example: +# +# With spaces_before_comment=5: +# +# 1 + 1 # Adding values +# +# will be formatted as: +# +# 1 + 1 # Adding values <-- 5 spaces between the end of the statement and comment +# +# With spaces_before_comment=15, 20: +# +# 1 + 1 # Adding values +# two + two # More adding +# +# longer_statement # This is a longer statement +# short # This is a shorter statement +# +# a_very_long_statement_that_extends_beyond_the_final_column # Comment +# short # This is a shorter statement +# +# will be formatted as: +# +# 1 + 1 # Adding values <-- end of line comments in block aligned to col 15 +# two + two # More adding +# +# longer_statement # This is a longer statement <-- end of line comments in block aligned to col 20 +# short # This is a shorter statement +# +# a_very_long_statement_that_extends_beyond_the_final_column # Comment <-- the end of line comments are aligned based on the line length +# short # This is a shorter statement +# +spaces_before_comment=2 + +# Insert a space between the ending comma and closing bracket of a list, +# etc. +space_between_ending_comma_and_closing_bracket=False + +# Use spaces inside brackets, braces, and parentheses. For example: +# +# method_call( 1 ) +# my_dict[ 3 ][ 1 ][ get_index( *args, **kwargs ) ] +# my_set = { 1, 2, 3 } +space_inside_brackets=False + +# Split before arguments +split_all_comma_separated_values=False + +# Split before arguments, but do not split all subexpressions recursively +# (unless needed). +split_all_top_level_comma_separated_values=False + +# Split before arguments if the argument list is terminated by a +# comma. +split_arguments_when_comma_terminated=False + +# Set to True to prefer splitting before '+', '-', '*', '/', '//', or '@' +# rather than after. +split_before_arithmetic_operator=False + +# Set to True to prefer splitting before '&', '|' or '^' rather than +# after. +split_before_bitwise_operator=True + +# Split before the closing bracket if a list or dict literal doesn't fit on +# a single line. +split_before_closing_bracket=True + +# Split before a dictionary or set generator (comp_for). For example, note +# the split before the 'for': +# +# foo = { +# variable: 'Hello world, have a nice day!' +# for variable in bar if variable != 42 +# } +split_before_dict_set_generator=True + +# Split before the '.' if we need to split a longer expression: +# +# foo = ('This is a really long string: {}, {}, {}, {}'.format(a, b, c, d)) +# +# would reformat to something like: +# +# foo = ('This is a really long string: {}, {}, {}, {}' +# .format(a, b, c, d)) +split_before_dot=False + +# Split after the opening paren which surrounds an expression if it doesn't +# fit on a single line. +split_before_expression_after_opening_paren=False + +# If an argument / parameter list is going to be split, then split before +# the first argument. +split_before_first_argument=False + +# Set to True to prefer splitting before 'and' or 'or' rather than +# after. +split_before_logical_operator=True + +# Split named assignments onto individual lines. +split_before_named_assigns=True + +# Set to True to split list comprehensions and generators that have +# non-trivial expressions and multiple clauses before each of these +# clauses. 
For example: +# +# result = [ +# a_long_var + 100 for a_long_var in xrange(1000) +# if a_long_var % 10] +# +# would reformat to something like: +# +# result = [ +# a_long_var + 100 +# for a_long_var in xrange(1000) +# if a_long_var % 10] +split_complex_comprehension=False + +# The penalty for splitting right after the opening bracket. +split_penalty_after_opening_bracket=300 + +# The penalty for splitting the line after a unary operator. +split_penalty_after_unary_operator=10000 + +# The penalty of splitting the line around the '+', '-', '*', '/', '//', +# ``%``, and '@' operators. +split_penalty_arithmetic_operator=300 + +# The penalty for splitting right before an if expression. +split_penalty_before_if_expr=0 + +# The penalty of splitting the line around the '&', '|', and '^' +# operators. +split_penalty_bitwise_operator=300 + +# The penalty for splitting a list comprehension or generator +# expression. +split_penalty_comprehension=80 + +# The penalty for characters over the column limit. +split_penalty_excess_character=7000 + +# The penalty incurred by adding a line split to the unwrapped line. The +# more line splits added the higher the penalty. +split_penalty_for_added_line_split=30 + +# The penalty of splitting a list of "import as" names. For example: +# +# from a_very_long_or_indented_module_name_yada_yad import (long_argument_1, +# long_argument_2, +# long_argument_3) +# +# would reformat to something like: +# +# from a_very_long_or_indented_module_name_yada_yad import ( +# long_argument_1, long_argument_2, long_argument_3) +split_penalty_import_names=0 + +# The penalty of splitting the line around the 'and' and 'or' +# operators. +split_penalty_logical_operator=300 + +# Use the Tab character for indentation. +use_tabs=False + diff --git a/tests/data/build_dataset/build_dataset.inter b/tests/data/build_dataset/build_dataset.inter new file mode 100644 index 000000000..519234283 --- /dev/null +++ b/tests/data/build_dataset/build_dataset.inter @@ -0,0 +1,21 @@ +user_id:token item_id:token timestamp:float +1 1 1 +1 2 2 +1 3 3 +1 4 4 +1 5 5 +1 6 6 +1 7 7 +1 8 8 +1 9 9 +1 10 10 +1 11 11 +1 12 12 +1 13 13 +1 14 14 +1 15 15 +1 16 16 +1 17 17 +1 18 18 +1 19 19 +1 20 20 \ No newline at end of file diff --git a/tests/data/filter_by_field_value/filter_by_field_value.inter b/tests/data/filter_by_field_value/filter_by_field_value.inter new file mode 100644 index 000000000..830b16236 --- /dev/null +++ b/tests/data/filter_by_field_value/filter_by_field_value.inter @@ -0,0 +1,11 @@ +user_id:token item_id:token timestamp:float rating:float +1 1 4 2 +1 1 6 0 +1 1 0 0 +1 1 8 3 +1 1 3 3 +1 1 1 0 +1 1 9 3 +1 1 2 1 +1 1 5 2 +1 1 7 4 \ No newline at end of file diff --git a/tests/data/filter_by_inter_num/filter_by_inter_num.inter b/tests/data/filter_by_inter_num/filter_by_inter_num.inter new file mode 100644 index 000000000..88c40e08c --- /dev/null +++ b/tests/data/filter_by_inter_num/filter_by_inter_num.inter @@ -0,0 +1,14 @@ +user_id:token item_id:token +1 1 +2 1 +2 2 +3 3 +3 4 +3 5 +3 6 +4 3 +4 4 +5 5 +5 6 +6 5 +6 6 \ No newline at end of file diff --git a/tests/data/filter_inter_by_ui_and_inter_num/filter_inter_by_ui_and_inter_num.inter b/tests/data/filter_inter_by_ui_and_inter_num/filter_inter_by_ui_and_inter_num.inter new file mode 100644 index 000000000..ce484721c --- /dev/null +++ b/tests/data/filter_inter_by_ui_and_inter_num/filter_inter_by_ui_and_inter_num.inter @@ -0,0 +1,9 @@ +user_id:token item_id:token +1 1 +1 2 +2 1 +2 2 +3 3 +3 4 +4 3 +4 4 \ No newline at end of file diff --git 
a/tests/data/filter_inter_by_ui_and_inter_num/filter_inter_by_ui_and_inter_num.item b/tests/data/filter_inter_by_ui_and_inter_num/filter_inter_by_ui_and_inter_num.item new file mode 100644 index 000000000..c8acfcf74 --- /dev/null +++ b/tests/data/filter_inter_by_ui_and_inter_num/filter_inter_by_ui_and_inter_num.item @@ -0,0 +1,4 @@ +item_id:token price:float +1 0 +3 0 +4 0 \ No newline at end of file diff --git a/tests/data/filter_inter_by_ui_and_inter_num/filter_inter_by_ui_and_inter_num.user b/tests/data/filter_inter_by_ui_and_inter_num/filter_inter_by_ui_and_inter_num.user new file mode 100644 index 000000000..548fb05b5 --- /dev/null +++ b/tests/data/filter_inter_by_ui_and_inter_num/filter_inter_by_ui_and_inter_num.user @@ -0,0 +1,4 @@ +user_id:token age:float +1 0 +3 0 +4 0 \ No newline at end of file diff --git a/tests/data/filter_inter_by_user_or_item/filter_inter_by_user_or_item.inter b/tests/data/filter_inter_by_user_or_item/filter_inter_by_user_or_item.inter new file mode 100644 index 000000000..51b0e8ad2 --- /dev/null +++ b/tests/data/filter_inter_by_user_or_item/filter_inter_by_user_or_item.inter @@ -0,0 +1,3 @@ +user_id:token item_id:token +1 1 +2 2 \ No newline at end of file diff --git a/tests/data/filter_inter_by_user_or_item/filter_inter_by_user_or_item.item b/tests/data/filter_inter_by_user_or_item/filter_inter_by_user_or_item.item new file mode 100644 index 000000000..5a0cb21b9 --- /dev/null +++ b/tests/data/filter_inter_by_user_or_item/filter_inter_by_user_or_item.item @@ -0,0 +1,2 @@ +item_id:token price:float +1 1 \ No newline at end of file diff --git a/tests/data/filter_inter_by_user_or_item/filter_inter_by_user_or_item.user b/tests/data/filter_inter_by_user_or_item/filter_inter_by_user_or_item.user new file mode 100644 index 000000000..0c5348fe1 --- /dev/null +++ b/tests/data/filter_inter_by_user_or_item/filter_inter_by_user_or_item.user @@ -0,0 +1,2 @@ +user_id:token age:float +1 1 \ No newline at end of file diff --git a/tests/data/filter_nan_user_or_item/filter_nan_user_or_item.inter b/tests/data/filter_nan_user_or_item/filter_nan_user_or_item.inter new file mode 100644 index 000000000..bc0d57222 --- /dev/null +++ b/tests/data/filter_nan_user_or_item/filter_nan_user_or_item.inter @@ -0,0 +1,5 @@ +user_id:token item_id:token timestamp:float +1 0 + 1 1 + 2 +1 1 3 \ No newline at end of file diff --git a/tests/data/filter_nan_user_or_item/filter_nan_user_or_item.item b/tests/data/filter_nan_user_or_item/filter_nan_user_or_item.item new file mode 100644 index 000000000..13197d312 --- /dev/null +++ b/tests/data/filter_nan_user_or_item/filter_nan_user_or_item.item @@ -0,0 +1,5 @@ +item_id:token price:float + 0 +1 1 + 2 +2 3 \ No newline at end of file diff --git a/tests/data/filter_nan_user_or_item/filter_nan_user_or_item.user b/tests/data/filter_nan_user_or_item/filter_nan_user_or_item.user new file mode 100644 index 000000000..0ea7fa39a --- /dev/null +++ b/tests/data/filter_nan_user_or_item/filter_nan_user_or_item.user @@ -0,0 +1,4 @@ +user_id:token age:float +1 0 + 1 +2 0 \ No newline at end of file diff --git a/tests/data/filter_value_and_filter_inter_by_ui/filter_value_and_filter_inter_by_ui.inter b/tests/data/filter_value_and_filter_inter_by_ui/filter_value_and_filter_inter_by_ui.inter new file mode 100644 index 000000000..eff0339c2 --- /dev/null +++ b/tests/data/filter_value_and_filter_inter_by_ui/filter_value_and_filter_inter_by_ui.inter @@ -0,0 +1,6 @@ +user_id:token item_id:token +1 1 +1 2 +2 2 +2 3 +3 3 \ No newline at end of file diff --git 
a/tests/data/filter_value_and_filter_inter_by_ui/filter_value_and_filter_inter_by_ui.item b/tests/data/filter_value_and_filter_inter_by_ui/filter_value_and_filter_inter_by_ui.item new file mode 100644 index 000000000..28bd72865 --- /dev/null +++ b/tests/data/filter_value_and_filter_inter_by_ui/filter_value_and_filter_inter_by_ui.item @@ -0,0 +1,4 @@ +item_id:token price:float +1 3 +2 2 +3 1 \ No newline at end of file diff --git a/tests/data/filter_value_and_filter_inter_by_ui/filter_value_and_filter_inter_by_ui.user b/tests/data/filter_value_and_filter_inter_by_ui/filter_value_and_filter_inter_by_ui.user new file mode 100644 index 000000000..72a75cda7 --- /dev/null +++ b/tests/data/filter_value_and_filter_inter_by_ui/filter_value_and_filter_inter_by_ui.user @@ -0,0 +1,4 @@ +user_id:token age:float +1 1 +2 2 +3 3 \ No newline at end of file diff --git a/tests/data/filter_value_and_inter_num/filter_value_and_inter_num.inter b/tests/data/filter_value_and_inter_num/filter_value_and_inter_num.inter new file mode 100644 index 000000000..387d8f007 --- /dev/null +++ b/tests/data/filter_value_and_inter_num/filter_value_and_inter_num.inter @@ -0,0 +1,15 @@ +user_id:token item_id:token rating:float +1 1 0 +1 2 0 +2 1 0 +2 3 0 +3 2 0 +3 3 0 +4 4 1 +4 5 0 +5 4 0 +5 5 0 +6 6 0 +6 7 0 +7 6 0 +7 7 0 \ No newline at end of file diff --git a/tests/data/filter_value_and_inter_num/filter_value_and_inter_num.item b/tests/data/filter_value_and_inter_num/filter_value_and_inter_num.item new file mode 100644 index 000000000..ed2edf288 --- /dev/null +++ b/tests/data/filter_value_and_inter_num/filter_value_and_inter_num.item @@ -0,0 +1,8 @@ +item_id:token price:float +1 0 +2 0 +3 1 +4 0 +5 0 +6 0 +7 0 \ No newline at end of file diff --git a/tests/data/filter_value_and_inter_num/filter_value_and_inter_num.user b/tests/data/filter_value_and_inter_num/filter_value_and_inter_num.user new file mode 100644 index 000000000..71b1f48b0 --- /dev/null +++ b/tests/data/filter_value_and_inter_num/filter_value_and_inter_num.user @@ -0,0 +1,8 @@ +user_id:token age:float +1 0 +2 0 +3 1 +4 0 +5 0 +6 0 +7 0 \ No newline at end of file diff --git a/tests/data/general_dataloader/general_dataloader.inter b/tests/data/general_dataloader/general_dataloader.inter new file mode 100644 index 000000000..0b7e82c22 --- /dev/null +++ b/tests/data/general_dataloader/general_dataloader.inter @@ -0,0 +1,51 @@ +user_id:token item_id:token timestamp:float +1 1 1 +1 2 2 +1 3 3 +1 4 4 +1 5 5 +1 6 6 +1 7 7 +1 8 8 +1 9 9 +1 10 10 +1 11 11 +1 12 12 +1 13 13 +1 14 14 +1 15 15 +1 16 16 +1 17 17 +1 18 18 +1 19 19 +1 20 20 +1 21 21 +1 22 22 +1 23 23 +1 24 24 +1 25 25 +1 26 26 +1 27 27 +1 28 28 +1 29 29 +1 30 30 +1 31 31 +1 32 32 +1 33 33 +1 34 34 +1 35 35 +1 36 36 +1 37 37 +1 38 38 +1 39 39 +1 40 40 +1 41 41 +1 42 42 +1 43 43 +1 44 44 +1 45 45 +1 46 46 +1 47 47 +1 48 48 +1 49 49 +1 50 50 \ No newline at end of file diff --git a/tests/data/general_dataloader/general_dataloader.item b/tests/data/general_dataloader/general_dataloader.item new file mode 100644 index 000000000..e3239d7df --- /dev/null +++ b/tests/data/general_dataloader/general_dataloader.item @@ -0,0 +1,101 @@ +item_id:token price:float +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 +10 10 +11 11 +12 12 +13 13 +14 14 +15 15 +16 16 +17 17 +18 18 +19 19 +20 20 +21 21 +22 22 +23 23 +24 24 +25 25 +26 26 +27 27 +28 28 +29 29 +30 30 +31 31 +32 32 +33 33 +34 34 +35 35 +36 36 +37 37 +38 38 +39 39 +40 40 +41 41 +42 42 +43 43 +44 44 +45 45 +46 46 +47 47 +48 48 +49 49 +50 50 +51 51 +52 52 +53 53 +54 54 +55 
55 +56 56 +57 57 +58 58 +59 59 +60 60 +61 61 +62 62 +63 63 +64 64 +65 65 +66 66 +67 67 +68 68 +69 69 +70 70 +71 71 +72 72 +73 73 +74 74 +75 75 +76 76 +77 77 +78 78 +79 79 +80 80 +81 81 +82 82 +83 83 +84 84 +85 85 +86 86 +87 87 +88 88 +89 89 +90 90 +91 91 +92 92 +93 93 +94 94 +95 95 +96 96 +97 97 +98 98 +99 99 +100 100 \ No newline at end of file diff --git a/tests/data/general_full_dataloader/general_full_dataloader.inter b/tests/data/general_full_dataloader/general_full_dataloader.inter new file mode 100644 index 000000000..0d59a5e00 --- /dev/null +++ b/tests/data/general_full_dataloader/general_full_dataloader.inter @@ -0,0 +1,111 @@ +user_id:token item_id:token timestamp:float +1 1 1 +1 2 2 +1 3 3 +1 4 4 +1 5 5 +1 6 6 +1 7 7 +1 8 8 +1 9 9 +1 10 10 +1 11 11 +1 12 12 +1 13 13 +1 14 14 +1 15 15 +1 16 16 +1 17 17 +1 18 18 +1 19 19 +1 20 20 +1 21 21 +1 22 22 +1 23 23 +1 24 24 +1 25 25 +1 26 26 +1 27 27 +1 28 28 +1 29 29 +1 30 30 +1 31 31 +1 32 32 +1 33 33 +1 34 34 +1 35 35 +1 36 36 +1 37 37 +1 38 38 +1 39 39 +1 40 40 +1 41 41 +1 42 42 +1 43 43 +1 44 44 +1 45 45 +1 46 46 +1 47 47 +1 48 48 +1 49 49 +1 50 50 +2 1 1 +2 2 2 +2 3 3 +2 4 4 +2 5 5 +2 6 6 +2 7 7 +2 8 8 +2 9 9 +2 10 10 +2 11 11 +2 12 12 +2 13 13 +2 14 14 +2 15 15 +2 16 16 +2 17 17 +2 18 18 +2 19 19 +2 20 20 +2 21 21 +2 22 22 +2 23 23 +2 24 24 +2 25 25 +2 26 26 +2 27 27 +2 28 28 +2 29 29 +2 30 30 +2 31 31 +2 32 32 +2 33 33 +2 34 34 +2 35 35 +2 36 36 +2 37 37 +2 38 38 +2 39 39 +2 40 40 +2 38 41 +2 39 42 +2 40 43 +2 41 44 +2 42 45 +2 36 46 +2 37 47 +2 38 48 +2 39 49 +2 40 50 +3 1 1 +3 1 2 +3 1 3 +3 1 4 +3 1 5 +3 1 6 +3 1 7 +3 1 8 +3 1 9 +3 1 10 \ No newline at end of file diff --git a/tests/data/general_full_dataloader/general_full_dataloader.item b/tests/data/general_full_dataloader/general_full_dataloader.item new file mode 100644 index 000000000..e3239d7df --- /dev/null +++ b/tests/data/general_full_dataloader/general_full_dataloader.item @@ -0,0 +1,101 @@ +item_id:token price:float +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 +10 10 +11 11 +12 12 +13 13 +14 14 +15 15 +16 16 +17 17 +18 18 +19 19 +20 20 +21 21 +22 22 +23 23 +24 24 +25 25 +26 26 +27 27 +28 28 +29 29 +30 30 +31 31 +32 32 +33 33 +34 34 +35 35 +36 36 +37 37 +38 38 +39 39 +40 40 +41 41 +42 42 +43 43 +44 44 +45 45 +46 46 +47 47 +48 48 +49 49 +50 50 +51 51 +52 52 +53 53 +54 54 +55 55 +56 56 +57 57 +58 58 +59 59 +60 60 +61 61 +62 62 +63 63 +64 64 +65 65 +66 66 +67 67 +68 68 +69 69 +70 70 +71 71 +72 72 +73 73 +74 74 +75 75 +76 76 +77 77 +78 78 +79 79 +80 80 +81 81 +82 82 +83 83 +84 84 +85 85 +86 86 +87 87 +88 88 +89 89 +90 90 +91 91 +92 92 +93 93 +94 94 +95 95 +96 96 +97 97 +98 98 +99 99 +100 100 \ No newline at end of file diff --git a/tests/data/general_uni100_dataloader/general_uni100_dataloader.inter b/tests/data/general_uni100_dataloader/general_uni100_dataloader.inter new file mode 100644 index 000000000..038b59c3d --- /dev/null +++ b/tests/data/general_uni100_dataloader/general_uni100_dataloader.inter @@ -0,0 +1,41 @@ +user_id:token item_id:token timestamp:float +1 1 1 +1 2 2 +1 3 3 +1 4 4 +1 5 5 +1 6 6 +1 7 7 +1 8 8 +1 9 9 +1 10 10 +2 1 1 +2 1 2 +2 1 3 +2 1 4 +2 1 5 +2 1 6 +2 1 7 +2 1 8 +2 1 9 +2 1 10 +3 1 1 +3 2 2 +3 3 3 +3 4 4 +3 5 5 +3 6 6 +3 7 7 +3 8 8 +3 9 9 +3 10 10 +3 11 11 +3 12 12 +3 13 13 +3 14 14 +3 15 15 +3 16 16 +3 17 17 +3 18 18 +3 19 19 +3 20 20 \ No newline at end of file diff --git a/tests/data/general_uni100_dataloader/general_uni100_dataloader.item b/tests/data/general_uni100_dataloader/general_uni100_dataloader.item new file mode 100644 index 
000000000..e3239d7df --- /dev/null +++ b/tests/data/general_uni100_dataloader/general_uni100_dataloader.item @@ -0,0 +1,101 @@ +item_id:token price:float +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 +10 10 +11 11 +12 12 +13 13 +14 14 +15 15 +16 16 +17 17 +18 18 +19 19 +20 20 +21 21 +22 22 +23 23 +24 24 +25 25 +26 26 +27 27 +28 28 +29 29 +30 30 +31 31 +32 32 +33 33 +34 34 +35 35 +36 36 +37 37 +38 38 +39 39 +40 40 +41 41 +42 42 +43 43 +44 44 +45 45 +46 46 +47 47 +48 48 +49 49 +50 50 +51 51 +52 52 +53 53 +54 54 +55 55 +56 56 +57 57 +58 58 +59 59 +60 60 +61 61 +62 62 +63 63 +64 64 +65 65 +66 66 +67 67 +68 68 +69 69 +70 70 +71 71 +72 72 +73 73 +74 74 +75 75 +76 76 +77 77 +78 78 +79 79 +80 80 +81 81 +82 82 +83 83 +84 84 +85 85 +86 86 +87 87 +88 88 +89 89 +90 90 +91 91 +92 92 +93 93 +94 94 +95 95 +96 96 +97 97 +98 98 +99 99 +100 100 \ No newline at end of file diff --git a/tests/data/normalize/normalize.inter b/tests/data/normalize/normalize.inter new file mode 100644 index 000000000..a3ede2f78 --- /dev/null +++ b/tests/data/normalize/normalize.inter @@ -0,0 +1,6 @@ +user_id:token item_id:token rating:float star:float +1 1 0 4 +2 2 1 2 +3 3 4 0 +4 4 3 1 +5 5 2 3 \ No newline at end of file diff --git a/tests/data/remap_id/remap_id.inter b/tests/data/remap_id/remap_id.inter new file mode 100644 index 000000000..7ce9e9b8b --- /dev/null +++ b/tests/data/remap_id/remap_id.inter @@ -0,0 +1,5 @@ +user_id:token item_id:token add_user:token add_item:token user_list:token_seq +ua ia ub ie uc ue +ub ib ue ic +uc ic ud if ua ub uc +ud id uf ia uf \ No newline at end of file diff --git a/tests/data/remove_duplication/remove_duplication.inter b/tests/data/remove_duplication/remove_duplication.inter new file mode 100644 index 000000000..c7356667d --- /dev/null +++ b/tests/data/remove_duplication/remove_duplication.inter @@ -0,0 +1,4 @@ +user_id:token item_id:token timestamp:float +1 1 1 +1 1 0 +1 1 2 \ No newline at end of file diff --git a/tests/data/rm_dup_and_filter_by_inter_num/rm_dup_and_filter_by_inter_num.inter b/tests/data/rm_dup_and_filter_by_inter_num/rm_dup_and_filter_by_inter_num.inter new file mode 100644 index 000000000..0415e73eb --- /dev/null +++ b/tests/data/rm_dup_and_filter_by_inter_num/rm_dup_and_filter_by_inter_num.inter @@ -0,0 +1,9 @@ +user_id:token item_id:token +1 1 +1 2 +2 1 +2 2 +3 3 +3 3 +3 4 +4 4 \ No newline at end of file diff --git a/tests/data/rm_dup_and_filter_value/rm_dup_and_filter_value.inter b/tests/data/rm_dup_and_filter_value/rm_dup_and_filter_value.inter new file mode 100644 index 000000000..ce83c964f --- /dev/null +++ b/tests/data/rm_dup_and_filter_value/rm_dup_and_filter_value.inter @@ -0,0 +1,5 @@ +user_id:token item_id:token timestamp:float rating:float +1 1 1 1 +1 1 0 5 +1 1 2 3 +2 2 0 3 \ No newline at end of file diff --git a/tests/data/set_label_by_threshold/set_label_by_threshold.inter b/tests/data/set_label_by_threshold/set_label_by_threshold.inter new file mode 100644 index 000000000..e1e20b2a5 --- /dev/null +++ b/tests/data/set_label_by_threshold/set_label_by_threshold.inter @@ -0,0 +1,5 @@ +user_id:token item_id:token rating:float +1 1 5 +2 2 3 +3 3 4 +4 4 2 \ No newline at end of file diff --git a/tests/data/test_dataloader.py b/tests/data/test_dataloader.py new file mode 100644 index 000000000..e79e86030 --- /dev/null +++ b/tests/data/test_dataloader.py @@ -0,0 +1,373 @@ +# -*- coding: utf-8 -*- +# @Time : 2021/1/5 +# @Author : Yushuo Chen +# @Email : chenyushuo@ruc.edu.cn + +# UPDATE +# @Time : 2020/1/5 +# @Author : Yushuo Chen +# @email : 
chenyushuo@ruc.edu.cn + +import logging +import os + +import pytest + +from recbole.config import Config +from recbole.data import create_dataset, data_preparation +from recbole.utils import init_seed + +current_path = os.path.dirname(os.path.realpath(__file__)) + + +def new_dataloader(config_dict=None, config_file_list=None): + config = Config(config_dict=config_dict, config_file_list=config_file_list) + init_seed(config['seed'], config['reproducibility']) + logging.basicConfig(level=logging.ERROR) + dataset = create_dataset(config) + return data_preparation(config, dataset) + + +class TestGeneralDataloader: + def test_general_dataloader(self): + train_batch_size = 6 + eval_batch_size = 2 + config_dict = { + 'model': 'BPR', + 'dataset': 'general_dataloader', + 'data_path': current_path, + 'load_col': None, + 'eval_setting': 'TO_RS', + 'training_neg_sample_num': 0, + 'split_ratio': [0.8, 0.1, 0.1], + 'train_batch_size': train_batch_size, + 'eval_batch_size': eval_batch_size, + } + train_data, valid_data, test_data = new_dataloader(config_dict=config_dict) + + def check_dataloader(data, item_list, batch_size): + data.shuffle = False + pr = 0 + for batch_data in data: + batch_item_list = item_list[pr: pr + batch_size] + assert (batch_data['item_id'].numpy() == batch_item_list).all() + pr += batch_size + + check_dataloader(train_data, list(range(1, 41)), train_batch_size) + check_dataloader(valid_data, list(range(41, 46)), eval_batch_size) + check_dataloader(test_data, list(range(46, 51)), eval_batch_size) + + def test_general_neg_sample_dataloader_in_pair_wise(self): + train_batch_size = 6 + eval_batch_size = 100 + config_dict = { + 'model': 'BPR', + 'dataset': 'general_dataloader', + 'data_path': current_path, + 'load_col': None, + 'eval_setting': 'TO_RS,full', + 'training_neg_sample_num': 1, + 'split_ratio': [0.8, 0.1, 0.1], + 'train_batch_size': train_batch_size, + 'eval_batch_size': eval_batch_size, + } + train_data, valid_data, test_data = new_dataloader(config_dict=config_dict) + + train_data.shuffle = False + train_item_list = list(range(1, 41)) + pr = 0 + for batch_data in train_data: + batch_item_list = train_item_list[pr: pr + train_batch_size] + assert (batch_data['item_id'].numpy() == batch_item_list).all() + assert (batch_data['item_id'] == batch_data['price']).all() + assert (40 < batch_data['neg_item_id']).all() + assert (batch_data['neg_item_id'] <= 100).all() + assert (batch_data['neg_item_id'] == batch_data['neg_price']).all() + pr += train_batch_size + + def test_general_neg_sample_dataloader_in_point_wise(self): + train_batch_size = 6 + eval_batch_size = 100 + config_dict = { + 'model': 'DMF', + 'dataset': 'general_dataloader', + 'data_path': current_path, + 'load_col': None, + 'eval_setting': 'TO_RS,full', + 'training_neg_sample_num': 1, + 'split_ratio': [0.8, 0.1, 0.1], + 'train_batch_size': train_batch_size, + 'eval_batch_size': eval_batch_size, + } + train_data, valid_data, test_data = new_dataloader(config_dict=config_dict) + + train_data.shuffle = False + train_item_list = list(range(1, 41)) + pr = 0 + for batch_data in train_data: + step = len(batch_data) // 2 + batch_item_list = train_item_list[pr: pr + step] + assert (batch_data['item_id'][: step].numpy() == batch_item_list).all() + assert (40 < batch_data['item_id'][step:]).all() + assert (batch_data['item_id'][step:] <= 100).all() + assert (batch_data['item_id'] == batch_data['price']).all() + pr += step + + def test_general_full_dataloader(self): + train_batch_size = 6 + eval_batch_size = 100 + config_dict = { 
+ 'model': 'BPR', + 'dataset': 'general_full_dataloader', + 'data_path': current_path, + 'load_col': None, + 'eval_setting': 'TO_RS,full', + 'training_neg_sample_num': 1, + 'split_ratio': [0.8, 0.1, 0.1], + 'train_batch_size': train_batch_size, + 'eval_batch_size': eval_batch_size, + } + train_data, valid_data, test_data = new_dataloader(config_dict=config_dict) + + def check_result(data, result): + assert len(data) == len(result) + for i, batch_data in enumerate(data): + user_df, history_index, swap_row, swap_col_after, swap_col_before = batch_data + history_row, history_col = history_index + assert len(user_df) == result[i]['len_user_df'] + assert (user_df['user_id'].numpy() == result[i]['user_df_user_id']).all() + assert (user_df.pos_len_list == result[i]['pos_len_list']).all() + assert (user_df.user_len_list == result[i]['user_len_list']).all() + assert len(history_row) == len(history_col) == result[i]['history_len'] + assert (history_row.numpy() == result[i]['history_row']).all() + assert (history_col.numpy() == result[i]['history_col']).all() + assert len(swap_row) == len(swap_col_after) == len(swap_col_before) == result[i]['swap_len'] + assert (swap_row.numpy() == result[i]['swap_row']).all() + assert (swap_col_after.numpy() == result[i]['swap_col_after']).all() + assert (swap_col_before.numpy() == result[i]['swap_col_before']).all() + + valid_result = [ + { + 'len_user_df': 1, + 'user_df_user_id': [1], + 'pos_len_list': [5], + 'user_len_list': [101], + 'history_len': 40, + 'history_row': 0, + 'history_col': list(range(1, 41)), + 'swap_len': 10, + 'swap_row': 0, + 'swap_col_after': [0, 1, 2, 3, 4, 41, 42, 43, 44, 45], + 'swap_col_before': [45, 44, 43, 42, 41, 4, 3, 2, 1, 0], + }, + { + 'len_user_df': 1, + 'user_df_user_id': [2], + 'pos_len_list': [5], + 'user_len_list': [101], + 'history_len': 37, + 'history_row': 0, + 'history_col': list(range(1, 38)), + 'swap_len': 10, + 'swap_row': 0, + 'swap_col_after': [0, 1, 2, 3, 4, 38, 39, 40, 41, 42], + 'swap_col_before': [42, 41, 40, 39, 38, 4, 3, 2, 1, 0], + }, + { + 'len_user_df': 1, + 'user_df_user_id': [3], + 'pos_len_list': [1], + 'user_len_list': [101], + 'history_len': 0, + 'history_row': [], + 'history_col': [], + 'swap_len': 2, + 'swap_row': 0, + 'swap_col_after': [0, 1], + 'swap_col_before': [1, 0], + }, + ] + check_result(valid_data, valid_result) + + test_result = [ + { + 'len_user_df': 1, + 'user_df_user_id': [1], + 'pos_len_list': [5], + 'user_len_list': [101], + 'history_len': 45, + 'history_row': 0, + 'history_col': list(range(1, 46)), + 'swap_len': 10, + 'swap_row': 0, + 'swap_col_after': [0, 1, 2, 3, 4, 46, 47, 48, 49, 50], + 'swap_col_before': [50, 49, 48, 47, 46, 4, 3, 2, 1, 0], + }, + { + 'len_user_df': 1, + 'user_df_user_id': [2], + 'pos_len_list': [5], + 'user_len_list': [101], + 'history_len': 37, + 'history_row': 0, + 'history_col': list(range(1, 36)) + [41, 42], + 'swap_len': 10, + 'swap_row': 0, + 'swap_col_after': [0, 1, 2, 3, 4, 36, 37, 38, 39, 40], + 'swap_col_before': [40, 39, 38, 37, 36, 4, 3, 2, 1, 0], + }, + { + 'len_user_df': 1, + 'user_df_user_id': [3], + 'pos_len_list': [1], + 'user_len_list': [101], + 'history_len': 0, + 'history_row': [], + 'history_col': [], + 'swap_len': 2, + 'swap_row': 0, + 'swap_col_after': [0, 1], + 'swap_col_before': [1, 0], + }, + ] + check_result(test_data, test_result) + + def test_general_uni100_dataloader_with_batch_size_in_101(self): + train_batch_size = 6 + eval_batch_size = 101 + config_dict = { + 'model': 'BPR', + 'dataset': 'general_uni100_dataloader', + 'data_path': 
current_path, + 'load_col': None, + 'eval_setting': 'TO_RS,uni100', + 'training_neg_sample_num': 1, + 'split_ratio': [0.8, 0.1, 0.1], + 'train_batch_size': train_batch_size, + 'eval_batch_size': eval_batch_size, + } + train_data, valid_data, test_data = new_dataloader(config_dict=config_dict) + + def check_result(data, result): + assert data.batch_size == 202 + assert len(data) == len(result) + for i, batch_data in enumerate(data): + assert result[i]['item_id_check'](batch_data['item_id']) + assert batch_data.pos_len_list == result[i]['pos_len_list'] + assert batch_data.user_len_list == result[i]['user_len_list'] + + valid_result = [ + { + 'item_id_check': lambda data: data[0] == 9 + and (8 < data[1:]).all() + and (data[1:] <= 100).all(), + 'pos_len_list': [1], + 'user_len_list': [101], + }, + { + 'item_id_check': lambda data: data[0] == 1 + and (data[1:] != 1).all(), + 'pos_len_list': [1], + 'user_len_list': [101], + }, + { + 'item_id_check': lambda data: (data[0: 2].numpy() == [17, 18]).all() + and (16 < data[2:]).all() + and (data[2:] <= 100).all(), + 'pos_len_list': [2], + 'user_len_list': [202], + }, + ] + check_result(valid_data, valid_result) + + test_result = [ + { + 'item_id_check': lambda data: data[0] == 10 + and (9 < data[1:]).all() + and (data[1:] <= 100).all(), + 'pos_len_list': [1], + 'user_len_list': [101], + }, + { + 'item_id_check': lambda data: data[0] == 1 + and (data[1:] != 1).all(), + 'pos_len_list': [1], + 'user_len_list': [101], + }, + { + 'item_id_check': lambda data: (data[0: 2].numpy() == [19, 20]).all() + and (18 < data[2:]).all() + and (data[2:] <= 100).all(), + 'pos_len_list': [2], + 'user_len_list': [202], + }, + ] + check_result(test_data, test_result) + + def test_general_uni100_dataloader_with_batch_size_in_303(self): + train_batch_size = 6 + eval_batch_size = 303 + config_dict = { + 'model': 'BPR', + 'dataset': 'general_uni100_dataloader', + 'data_path': current_path, + 'load_col': None, + 'eval_setting': 'TO_RS,uni100', + 'training_neg_sample_num': 1, + 'split_ratio': [0.8, 0.1, 0.1], + 'train_batch_size': train_batch_size, + 'eval_batch_size': eval_batch_size, + } + train_data, valid_data, test_data = new_dataloader(config_dict=config_dict) + + def check_result(data, result): + assert data.batch_size == 303 + assert len(data) == len(result) + for i, batch_data in enumerate(data): + assert result[i]['item_id_check'](batch_data['item_id']) + assert batch_data.pos_len_list == result[i]['pos_len_list'] + assert batch_data.user_len_list == result[i]['user_len_list'] + + valid_result = [ + { + 'item_id_check': lambda data: data[0] == 9 + and (8 < data[1: 101]).all() + and (data[1: 101] <= 100).all() + and data[101] == 1 + and (data[102:202] != 1).all(), + 'pos_len_list': [1, 1], + 'user_len_list': [101, 101], + }, + { + 'item_id_check': lambda data: (data[0: 2].numpy() == [17, 18]).all() + and (16 < data[2:]).all() + and (data[2:] <= 100).all(), + 'pos_len_list': [2], + 'user_len_list': [202], + }, + ] + check_result(valid_data, valid_result) + + test_result = [ + { + 'item_id_check': lambda data: data[0] == 10 + and (9 < data[1:101]).all() + and (data[1:101] <= 100).all() + and data[101] == 1 + and (data[102:202] != 1).all(), + 'pos_len_list': [1, 1], + 'user_len_list': [101, 101], + }, + { + 'item_id_check': lambda data: (data[0: 2].numpy() == [19, 20]).all() + and (18 < data[2:]).all() + and (data[2:] <= 100).all(), + 'pos_len_list': [2], + 'user_len_list': [202], + }, + ] + check_result(test_data, test_result) + + +if __name__ == '__main__': + 
pytest.main() diff --git a/tests/data/test_dataset.py b/tests/data/test_dataset.py new file mode 100644 index 000000000..9b78cbd9d --- /dev/null +++ b/tests/data/test_dataset.py @@ -0,0 +1,570 @@ +# -*- coding: utf-8 -*- +# @Time : 2021/1/3 +# @Author : Yushuo Chen +# @Email : chenyushuo@ruc.edu.cn + +# UPDATE +# @Time : 2020/1/3 +# @Author : Yushuo Chen +# @email : chenyushuo@ruc.edu.cn + +import logging +import os + +import pytest + +from recbole.config import Config, EvalSetting +from recbole.data import create_dataset +from recbole.utils import init_seed + +current_path = os.path.dirname(os.path.realpath(__file__)) + + +def new_dataset(config_dict=None, config_file_list=None): + config = Config(config_dict=config_dict, config_file_list=config_file_list) + init_seed(config['seed'], config['reproducibility']) + logging.basicConfig(level=logging.ERROR) + return create_dataset(config) + + +def split_dataset(config_dict=None, config_file_list=None): + dataset = new_dataset(config_dict=config_dict, config_file_list=config_file_list) + config = dataset.config + es_str = [_.strip() for _ in config['eval_setting'].split(',')] + es = EvalSetting(config) + es.set_ordering_and_splitting(es_str[0]) + return dataset.build(es) + + +class TestDataset: + def test_filter_nan_user_or_item(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'filter_nan_user_or_item', + 'data_path': current_path, + 'load_col': None, + } + dataset = new_dataset(config_dict=config_dict) + assert len(dataset.inter_feat) == 1 + assert len(dataset.user_feat) == 3 + assert len(dataset.item_feat) == 3 + + def test_remove_duplication_by_first(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'remove_duplication', + 'data_path': current_path, + 'load_col': None, + 'rm_dup_inter': 'first', + } + dataset = new_dataset(config_dict=config_dict) + assert dataset.inter_feat[dataset.time_field][0] == 0 + + def test_remove_duplication_by_last(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'remove_duplication', + 'data_path': current_path, + 'load_col': None, + 'rm_dup_inter': 'last', + } + dataset = new_dataset(config_dict=config_dict) + assert dataset.inter_feat[dataset.time_field][0] == 2 + + def test_filter_by_field_value_with_lowest_val(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'filter_by_field_value', + 'data_path': current_path, + 'load_col': None, + 'lowest_val': { + 'timestamp': 4, + }, + } + dataset = new_dataset(config_dict=config_dict) + assert len(dataset.inter_feat) == 6 + + def test_filter_by_field_value_with_highest_val(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'filter_by_field_value', + 'data_path': current_path, + 'load_col': None, + 'highest_val': { + 'timestamp': 4, + }, + } + dataset = new_dataset(config_dict=config_dict) + assert len(dataset.inter_feat) == 5 + + def test_filter_by_field_value_with_equal_val(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'filter_by_field_value', + 'data_path': current_path, + 'load_col': None, + 'equal_val': { + 'rating': 0, + }, + } + dataset = new_dataset(config_dict=config_dict) + assert len(dataset.inter_feat) == 3 + + def test_filter_by_field_value_with_not_equal_val(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'filter_by_field_value', + 'data_path': current_path, + 'load_col': None, + 'not_equal_val': { + 'rating': 4, + }, + } + dataset = new_dataset(config_dict=config_dict) + assert len(dataset.inter_feat) == 9 + + def test_filter_by_field_value_in_same_field(self): + config_dict = { + 'model': 'BPR', + 'dataset': 
'filter_by_field_value', + 'data_path': current_path, + 'load_col': None, + 'lowest_val': { + 'timestamp': 3, + }, + 'highest_val': { + 'timestamp': 8, + }, + } + dataset = new_dataset(config_dict=config_dict) + assert len(dataset.inter_feat) == 6 + + def test_filter_by_field_value_in_different_field(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'filter_by_field_value', + 'data_path': current_path, + 'load_col': None, + 'lowest_val': { + 'timestamp': 3, + }, + 'highest_val': { + 'timestamp': 8, + }, + 'not_equal_val': { + 'rating': 4, + } + } + dataset = new_dataset(config_dict=config_dict) + assert len(dataset.inter_feat) == 5 + + def test_filter_inter_by_user_or_item_is_true(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'filter_inter_by_user_or_item', + 'data_path': current_path, + 'load_col': None, + 'filter_inter_by_user_or_item': True, + } + dataset = new_dataset(config_dict=config_dict) + assert len(dataset.inter_feat) == 1 + + def test_filter_inter_by_user_or_item_is_false(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'filter_inter_by_user_or_item', + 'data_path': current_path, + 'load_col': None, + 'filter_inter_by_user_or_item': False, + } + dataset = new_dataset(config_dict=config_dict) + assert len(dataset.inter_feat) == 2 + + def test_filter_by_inter_num_in_min_user_inter_num(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'filter_by_inter_num', + 'data_path': current_path, + 'load_col': None, + 'min_user_inter_num': 2, + } + dataset = new_dataset(config_dict=config_dict) + assert dataset.user_num == 6 + assert dataset.item_num == 7 + + def test_filter_by_inter_num_in_min_item_inter_num(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'filter_by_inter_num', + 'data_path': current_path, + 'load_col': None, + 'min_item_inter_num': 2, + } + dataset = new_dataset(config_dict=config_dict) + assert dataset.user_num == 7 + assert dataset.item_num == 6 + + def test_filter_by_inter_num_in_max_user_inter_num(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'filter_by_inter_num', + 'data_path': current_path, + 'load_col': None, + 'max_user_inter_num': 2, + } + dataset = new_dataset(config_dict=config_dict) + assert dataset.user_num == 6 + assert dataset.item_num == 7 + + def test_filter_by_inter_num_in_max_item_inter_num(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'filter_by_inter_num', + 'data_path': current_path, + 'load_col': None, + 'max_item_inter_num': 2, + } + dataset = new_dataset(config_dict=config_dict) + assert dataset.user_num == 5 + assert dataset.item_num == 5 + + def test_filter_by_inter_num_in_min_inter_num(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'filter_by_inter_num', + 'data_path': current_path, + 'load_col': None, + 'min_user_inter_num': 2, + 'min_item_inter_num': 2, + } + dataset = new_dataset(config_dict=config_dict) + assert dataset.user_num == 5 + assert dataset.item_num == 5 + + def test_filter_by_inter_num_in_complex_way(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'filter_by_inter_num', + 'data_path': current_path, + 'load_col': None, + 'max_user_inter_num': 3, + 'min_user_inter_num': 2, + 'min_item_inter_num': 2, + } + dataset = new_dataset(config_dict=config_dict) + assert dataset.user_num == 3 + assert dataset.item_num == 3 + + def test_rm_dup_by_first_and_filter_value(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'rm_dup_and_filter_value', + 'data_path': current_path, + 'load_col': None, + 'rm_dup_inter': 'first', + 'highest_val': { + 'rating': 4, + }, + } + 
dataset = new_dataset(config_dict=config_dict) + assert len(dataset.inter_feat) == 1 + + def test_rm_dup_by_last_and_filter_value(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'rm_dup_and_filter_value', + 'data_path': current_path, + 'load_col': None, + 'rm_dup_inter': 'last', + 'highest_val': { + 'rating': 4, + }, + } + dataset = new_dataset(config_dict=config_dict) + assert len(dataset.inter_feat) == 2 + + def test_rm_dup_and_filter_by_inter_num(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'rm_dup_and_filter_by_inter_num', + 'data_path': current_path, + 'load_col': None, + 'rm_dup_inter': 'first', + 'min_user_inter_num': 2, + 'min_item_inter_num': 2, + } + dataset = new_dataset(config_dict=config_dict) + assert len(dataset.inter_feat) == 4 + assert dataset.user_num == 3 + assert dataset.item_num == 3 + + def test_filter_value_and_filter_inter_by_ui(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'filter_value_and_filter_inter_by_ui', + 'data_path': current_path, + 'load_col': None, + 'highest_val': { + 'age': 2, + }, + 'not_equal_val': { + 'price': 2, + }, + 'filter_inter_by_user_or_item': True, + } + dataset = new_dataset(config_dict=config_dict) + assert len(dataset.inter_feat) == 2 + assert dataset.user_num == 3 + assert dataset.item_num == 3 + + def test_filter_value_and_inter_num(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'filter_value_and_inter_num', + 'data_path': current_path, + 'load_col': None, + 'highest_val': { + 'rating': 0, + 'age': 0, + 'price': 0, + }, + 'min_user_inter_num': 2, + 'min_item_inter_num': 2, + } + dataset = new_dataset(config_dict=config_dict) + assert len(dataset.inter_feat) == 4 + assert dataset.user_num == 3 + assert dataset.item_num == 3 + + def test_filter_inter_by_ui_and_inter_num(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'filter_inter_by_ui_and_inter_num', + 'data_path': current_path, + 'load_col': None, + 'filter_inter_by_user_or_item': True, + 'min_user_inter_num': 2, + 'min_item_inter_num': 2, + } + dataset = new_dataset(config_dict=config_dict) + assert len(dataset.inter_feat) == 4 + assert dataset.user_num == 3 + assert dataset.item_num == 3 + + def test_remap_id(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'remap_id', + 'data_path': current_path, + 'load_col': None, + 'fields_in_same_space': None, + } + dataset = new_dataset(config_dict=config_dict) + user_list = dataset.token2id('user_id', ['ua', 'ub', 'uc', 'ud']) + item_list = dataset.token2id('item_id', ['ia', 'ib', 'ic', 'id']) + assert (user_list == [1, 2, 3, 4]).all() + assert (item_list == [1, 2, 3, 4]).all() + assert (dataset.inter_feat['user_id'].numpy() == [1, 2, 3, 4]).all() + assert (dataset.inter_feat['item_id'].numpy() == [1, 2, 3, 4]).all() + assert (dataset.inter_feat['add_user'].numpy() == [1, 2, 3, 4]).all() + assert (dataset.inter_feat['add_item'].numpy() == [1, 2, 3, 4]).all() + assert (dataset.inter_feat['user_list'].numpy() == [[1, 2, 0], + [0, 0, 0], + [3, 4, 1], + [5, 0, 0]]).all() + + def test_remap_id_with_fields_in_same_space(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'remap_id', + 'data_path': current_path, + 'load_col': None, + 'fields_in_same_space': [ + ['user_id', 'add_user', 'user_list'], + ['item_id', 'add_item'], + ], + } + dataset = new_dataset(config_dict=config_dict) + user_list = dataset.token2id('user_id', ['ua', 'ub', 'uc', 'ud', 'ue', 'uf']) + item_list = dataset.token2id('item_id', ['ia', 'ib', 'ic', 'id', 'ie', 'if']) + assert (user_list == [1, 2, 3, 4, 5, 6]).all() + 
assert (item_list == [1, 2, 3, 4, 5, 6]).all() + assert (dataset.inter_feat['user_id'].numpy() == [1, 2, 3, 4]).all() + assert (dataset.inter_feat['item_id'].numpy() == [1, 2, 3, 4]).all() + assert (dataset.inter_feat['add_user'].numpy() == [2, 5, 4, 6]).all() + assert (dataset.inter_feat['add_item'].numpy() == [5, 3, 6, 1]).all() + assert (dataset.inter_feat['user_list'].numpy() == [[3, 5, 0], + [0, 0, 0], + [1, 2, 3], + [6, 0, 0]]).all() + + def test_ui_feat_preparation_and_fill_nan(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'ui_feat_preparation_and_fill_nan', + 'data_path': current_path, + 'load_col': None, + 'filter_inter_by_user_or_item': False, + 'normalize_field': None, + 'normalize_all': None, + } + dataset = new_dataset(config_dict=config_dict) + user_token_list = dataset.id2token('user_id', dataset.user_feat['user_id']) + item_token_list = dataset.id2token('item_id', dataset.item_feat['item_id']) + assert (user_token_list == ['[PAD]', 'ua', 'ub', 'uc', 'ud', 'ue']).all() + assert (item_token_list == ['[PAD]', 'ia', 'ib', 'ic', 'id', 'ie']).all() + assert dataset.inter_feat['rating'][3] == 1.0 + assert dataset.user_feat['age'][4] == 1.5 + assert dataset.item_feat['price'][4] == 1.5 + assert (dataset.inter_feat['time_list'].numpy() == [[1., 2., 3.], + [2., 0., 0.], + [0., 0., 0.], + [5., 4., 0.]]).all() + assert (dataset.user_feat['profile'].numpy() == [[0, 0, 0], + [1, 2, 3], + [0, 0, 0], + [3, 0, 0], + [0, 0, 0], + [3, 2, 0]]).all() + + def test_set_label_by_threshold(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'set_label_by_threshold', + 'data_path': current_path, + 'load_col': None, + 'threshold': { + 'rating': 4, + }, + 'normalize_field': None, + 'normalize_all': None, + } + dataset = new_dataset(config_dict=config_dict) + assert (dataset.inter_feat['label'].numpy() == [1., 0., 1., 0.]).all() + + def test_normalize_all(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'normalize', + 'data_path': current_path, + 'load_col': None, + 'normalize_all': True, + } + dataset = new_dataset(config_dict=config_dict) + assert (dataset.inter_feat['rating'].numpy() == [0., .25, 1., .75, .5]).all() + assert (dataset.inter_feat['star'].numpy() == [1., .5, 0., .25, 0.75]).all() + + def test_normalize_field(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'normalize', + 'data_path': current_path, + 'load_col': None, + 'normalize_field': ['rating'], + 'normalize_all': False, + } + dataset = new_dataset(config_dict=config_dict) + assert (dataset.inter_feat['rating'].numpy() == [0., .25, 1., .75, .5]).all() + assert (dataset.inter_feat['star'].numpy() == [4., 2., 0., 1., 3.]).all() + + def test_TO_RS_811(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'build_dataset', + 'data_path': current_path, + 'load_col': None, + 'eval_setting': 'TO_RS', + 'split_ratio': [0.8, 0.1, 0.1], + } + train_dataset, valid_dataset, test_dataset = split_dataset(config_dict=config_dict) + assert (train_dataset.inter_feat['item_id'].numpy() == list(range(1, 17))).all() + assert (valid_dataset.inter_feat['item_id'].numpy() == list(range(17, 19))).all() + assert (test_dataset.inter_feat['item_id'].numpy() == list(range(19, 21))).all() + + def test_TO_RS_820(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'build_dataset', + 'data_path': current_path, + 'load_col': None, + 'eval_setting': 'TO_RS', + 'split_ratio': [0.8, 0.2, 0.0], + } + train_dataset, valid_dataset, test_dataset = split_dataset(config_dict=config_dict) + assert (train_dataset.inter_feat['item_id'].numpy() 
== list(range(1, 17))).all() + assert (valid_dataset.inter_feat['item_id'].numpy() == list(range(17, 21))).all() + assert len(test_dataset.inter_feat) == 0 + + def test_TO_RS_802(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'build_dataset', + 'data_path': current_path, + 'load_col': None, + 'eval_setting': 'TO_RS', + 'split_ratio': [0.8, 0.0, 0.2], + } + train_dataset, valid_dataset, test_dataset = split_dataset(config_dict=config_dict) + assert (train_dataset.inter_feat['item_id'].numpy() == list(range(1, 17))).all() + assert len(valid_dataset.inter_feat) == 0 + assert (test_dataset.inter_feat['item_id'].numpy() == list(range(17, 21))).all() + + def test_TO_LS(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'build_dataset', + 'data_path': current_path, + 'load_col': None, + 'eval_setting': 'TO_LS', + 'leave_one_num': 2, + } + train_dataset, valid_dataset, test_dataset = split_dataset(config_dict=config_dict) + assert (train_dataset.inter_feat['item_id'].numpy() == list(range(1, 19))).all() + assert (valid_dataset.inter_feat['item_id'].numpy() == list(range(19, 20))).all() + assert (test_dataset.inter_feat['item_id'].numpy() == list(range(20, 21))).all() + + def test_RO_RS_811(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'build_dataset', + 'data_path': current_path, + 'load_col': None, + 'eval_setting': 'RO_RS', + 'split_ratio': [0.8, 0.1, 0.1], + } + train_dataset, valid_dataset, test_dataset = split_dataset(config_dict=config_dict) + assert len(train_dataset.inter_feat) == 16 + assert len(valid_dataset.inter_feat) == 2 + assert len(test_dataset.inter_feat) == 2 + + def test_RO_RS_820(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'build_dataset', + 'data_path': current_path, + 'load_col': None, + 'eval_setting': 'RO_RS', + 'split_ratio': [0.8, 0.2, 0.0], + } + train_dataset, valid_dataset, test_dataset = split_dataset(config_dict=config_dict) + assert len(train_dataset.inter_feat) == 16 + assert len(valid_dataset.inter_feat) == 4 + assert len(test_dataset.inter_feat) == 0 + + def test_RO_RS_802(self): + config_dict = { + 'model': 'BPR', + 'dataset': 'build_dataset', + 'data_path': current_path, + 'load_col': None, + 'eval_setting': 'RO_RS', + 'split_ratio': [0.8, 0.0, 0.2], + } + train_dataset, valid_dataset, test_dataset = split_dataset(config_dict=config_dict) + assert len(train_dataset.inter_feat) == 16 + assert len(valid_dataset.inter_feat) == 0 + assert len(test_dataset.inter_feat) == 4 + + +if __name__ == "__main__": + pytest.main() diff --git a/tests/data/ui_feat_preparation_and_fill_nan/ui_feat_preparation_and_fill_nan.inter b/tests/data/ui_feat_preparation_and_fill_nan/ui_feat_preparation_and_fill_nan.inter new file mode 100644 index 000000000..ea5f045a5 --- /dev/null +++ b/tests/data/ui_feat_preparation_and_fill_nan/ui_feat_preparation_and_fill_nan.inter @@ -0,0 +1,5 @@ +user_id:token item_id:token rating:float time_list:float_seq +ua ia 0 1 2 3 +ub ib 1 2 +uc ic 2 +ud id 5 4 \ No newline at end of file diff --git a/tests/data/ui_feat_preparation_and_fill_nan/ui_feat_preparation_and_fill_nan.item b/tests/data/ui_feat_preparation_and_fill_nan/ui_feat_preparation_and_fill_nan.item new file mode 100644 index 000000000..00c3fd7cd --- /dev/null +++ b/tests/data/ui_feat_preparation_and_fill_nan/ui_feat_preparation_and_fill_nan.item @@ -0,0 +1,5 @@ +item_id:token price:float +ia 0 +ib 1 +ic 2 +ie 3 \ No newline at end of file diff --git a/tests/data/ui_feat_preparation_and_fill_nan/ui_feat_preparation_and_fill_nan.user
b/tests/data/ui_feat_preparation_and_fill_nan/ui_feat_preparation_and_fill_nan.user new file mode 100644 index 000000000..6baab598f --- /dev/null +++ b/tests/data/ui_feat_preparation_and_fill_nan/ui_feat_preparation_and_fill_nan.user @@ -0,0 +1,5 @@ +user_id:token age:float profile:token_seq +ua 0 a b c +ub 1 +uc 2 c +ue 3 c b \ No newline at end of file diff --git a/tests/model/test_model_auto.py b/tests/model/test_model_auto.py index 7c2b3b308..b71f91897 100644 --- a/tests/model/test_model_auto.py +++ b/tests/model/test_model_auto.py @@ -6,7 +6,7 @@ # UPDATE # @Time : 2020/11/17 # @Author : Xingyu Pan -# @email : panxy@ruc.edu.cn +# @email : panxy@ruc.edu.cn import os import unittest @@ -140,6 +140,12 @@ def test_line(self): } quick_test(config_dict) + def test_ease(self): + config_dict = { + 'model': 'EASE', + } + quick_test(config_dict) + def test_MultiDAE(self): config_dict = { 'model': 'MultiDAE', @@ -167,13 +173,21 @@ def test_MacridVAE(self): 'training_neg_sample_num': 0 } quick_test(config_dict) - + def test_CDAE(self): config_dict = { 'model': 'CDAE', 'training_neg_sample_num': 0 } quick_test(config_dict) + + def test_NNCF(self): + config_dict = { + 'model': 'NNCF', + } + quick_test(config_dict) + + class TestContextRecommender(unittest.TestCase): # todo: more complex context information should be test, such as criteo dataset diff --git a/time_test_result/General_recommendation.md b/time_test_result/General_recommendation.md deleted file mode 100644 index 7e6c40b75..000000000 --- a/time_test_result/General_recommendation.md +++ /dev/null @@ -1,71 +0,0 @@ -## Training and testing time of general recommendation models - -### Datasets information: - -| Dataset | #User | #Item | #Interaction | Sparsity | -| ------- | ------- | ------ | ------------ | -------- | -| ml-1m | 6,041 | 3,707 | 1,000,209 | 0.9553 | -| Netflix | 80,476 | 16,821 | 1,977,844 | 0.9985 | -| Yelp | 102,046 | 98,408 | 2,903,648 | 0.9997 | - -### 1) ml-1m dataset: - -#### Time and memory cost on ml-1m dataset: - -| Method | Training Time (s) | Evaluate Time (s) | Memory (MB) | -| ---------- | ----------------- | ----------------- | ----------- | -| Popularity | 2.11 | 8.08 | 843 | -| ItemKNN | 2 | 11.76 | 843 | -| BPRMF | 1.93 | 7.43 | 931 | -| NeuMF | 4.94 | 13.12 | 965 | -| DMF | 4.47 | 12.63 | 1555 | -| NAIS | 59.27 | 24.41 | 22351 | -| NGCF | 12.09 | 7.12 | 1231 | -| GCMC | 9.04 | 54.15 | 1353 | -| LightGCN | 7.83 | 7.47 | 1177 | -| DGCF | 181.66 | 8.06 | 6745 | -| ConvNCF | 8.46 | 19.6 | 1341 | -| FISM | 19.3 | 10.92 | 7109 | -| SpectralCF | 13.87 | 6.97 | 1219 | - -#### Config file of ml-1m: - -``` -# dataset config -field_separator: "\t" -seq_separator: " " -USER_ID_FIELD: user_id -ITEM_ID_FIELD: item_id -RATING_FIELD: rating -TIME_FIELD: timestamp -LABEL_FIELD: label -NEG_PREFIX: neg_ -load_col: - inter: [user_id, item_id, rating, timestamp] -min_user_inter_num: 0 -min_item_inter_num: 0 - - -# training and evaluation -epochs: 500 -train_batch_size: 2048 -eval_batch_size: 2048 -valid_metric: MRR@10 -``` - - - -### Time and memory cost on Netflix dataset: - - - -### Time and memory cost on Yelp dataset: - - - - - - - - -