FIX: fix the format of docs #942

Merged 4 commits on Aug 24, 2021
Changes from all commits
22 changes: 13 additions & 9 deletions docs/source/user_guide/config/evaluation_settings.rst
@@ -12,19 +12,23 @@ Evaluation settings are designed to set parameters about model evaluation.

- ``order (str)``: decides how we sort the data in `.inter`. Now we support two kinds of ordering strategies: ``['RO', 'TO']``, which denote random ordering and temporal ordering. For ``RO``, we shuffle the data and then split it in this order. For ``TO``, we sort the data by the column of `TIME_FIELD` in ascending order and then split it in this order. The default value is ``RO``.

- ``split (dict)``: decides how we split the data in `.inter`. Now we support two kinds of splitting strategies: ``['RS', 'LS']``, which denote ratio-based data splitting and leave-one-out data splitting. If the key of ``split`` is ``RS``, you need to set the splitting ratio like ``[0.8,0.1,0.1]``, ``[7,2,1]`` or ``[8,0,2]``, which denote the ratios of the training set, validation set and testing set respectively. If the key of ``split`` is ``LS``, we support three ``LS`` modes: ``['valid_and_test', 'valid_only', 'test_only']``, and you should choose one of them as the value of ``LS``. The default value of ``split`` is ``{'RS': [0.8,0.1,0.1]}``.

- ``mode (str)``: decides the data range on which we evaluate the model. Now we support four kinds of evaluation modes: ``['full', 'unixxx', 'popxxx', 'labeled']``. ``full``, ``unixxx`` and ``popxxx`` are designed for evaluation on implicit feedback (data without labels). For implicit feedback, we regard items with observed interactions as positive items and those without observed interactions as negative items. ``full`` means evaluating the model on the set of all items. ``unixxx``, for example ``uni100``, means uniformly sampling 100 negative items for each positive item in the testing set and evaluating the model on these positive items together with their sampled negative items. ``popxxx``, for example ``pop100``, means sampling 100 negative items for each positive item in the testing set according to item popularity (:obj:`Counter(item)` in the `.inter` file) and evaluating the model in the same way. Here ``xxx`` must be an integer. For explicit feedback (data with labels), you should set the mode to ``labeled`` and we will evaluate the model based on your labels. The default value is ``full``.

- ``repeatable (bool)``: Whether to evaluate the result under a repeatable recommendation scenario. Note that it is disabled for sequential models, as their recommendation is already repeatable. For other models, defaults to ``False``.
- ``metrics (list or str)``: Evaluation metrics. Defaults to
``['Recall', 'MRR', 'NDCG', 'Hit', 'Precision']``. Range in the following table:

============== =================================================
Type           Metrics
============== =================================================
Ranking-based  Recall, MRR, NDCG, Hit, MAP, Precision, GAUC, ItemCoverage, AveragePopularity, GiniIndex, ShannonEntropy, TailPercentage
Value-based    AUC, MAE, RMSE, LogLoss
============== =================================================

Note that value-based metrics and ranking-based metrics cannot be used together.

- ``topk (list or int or None)``: The value of k for topk evaluation metrics.
Defaults to ``10``.
- ``valid_metric (str)``: The evaluation metrics for early stopping.
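To make these settings concrete, here is a minimal sketch of how they might be passed to RecBole as a config dict. It assumes the settings above live under the ``eval_args`` config key and uses RecBole's quick-start entry point ``run_recbole``; the model ``BPR`` and the dataset ``ml-100k`` are placeholder choices for illustration, not part of the documentation above.

.. code-block:: python

    # A minimal sketch, not a definitive recipe: the 'eval_args' key and the
    # run_recbole entry point are assumptions based on RecBole's quick-start
    # API; 'BPR' and 'ml-100k' are placeholder model/dataset choices.
    from recbole.quick_start import run_recbole

    config_dict = {
        'eval_args': {
            'order': 'TO',                      # sort by TIME_FIELD before splitting
            'split': {'RS': [0.8, 0.1, 0.1]},   # 8:1:1 train/valid/test ratio split
            'mode': 'uni100',                   # 100 uniformly sampled negatives per positive
        },
        'repeatable': False,
        'metrics': ['Recall', 'MRR', 'NDCG', 'Hit', 'Precision'],
        'topk': 10,
        'valid_metric': 'MRR@10',               # metric@k used for early stopping
    }

    run_recbole(model='BPR', dataset='ml-100k', config_dict=config_dict)

With ``'order': 'TO'`` the split respects time order, whereas the default ``RO`` applies the same ratios after shuffling.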
21 changes: 12 additions & 9 deletions docs/source/user_guide/train_eval_intro.rst
@@ -62,9 +62,7 @@ The parameters used to control the evaluation method are as follows:
- ``mode (str)``: Controls the candidate item set used for ranking.
Range in ``['labeled', 'full', 'unixxx', 'popxxx']`` and defaults to ``full``.

- ``repeatable (bool)``: Whether to evaluate the result under a repeatable recommendation scenario. Note that it is disabled for sequential models, as their recommendation is already repeatable. For other models, defaults to ``False``.

Evaluation metrics
>>>>>>>>>>>>>>>>>>>>>>>>>>
@@ -86,12 +84,17 @@ More details about metrics can be found in :doc:`/recbole/recbole.evaluator.metrics`
The parameters used to control the evaluation metrics are as follows:

- ``metrics (list or str)``: Evaluation metrics. Defaults to
``['Recall', 'MRR', 'NDCG', 'Hit', 'Precision']``. Range in the following table:

============== =================================================
Type           Metrics
============== =================================================
Ranking-based  Recall, MRR, NDCG, Hit, MAP, Precision, GAUC, ItemCoverage, AveragePopularity, GiniIndex, ShannonEntropy, TailPercentage
Value-based    AUC, MAE, RMSE, LogLoss
============== =================================================

Note that value-based metrics and ranking-based metrics cannot be used together.

- ``topk (list or int or None)``: The value of k for topk evaluation metrics.
Defaults to ``10``.
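Complementing the table above, here is a hedged sketch of a value-based configuration: since value-based metrics score predictions against explicit labels, ``mode`` is set to ``labeled`` and no ranking metrics or ``topk`` are configured. As before, the ``eval_args`` key and the ``run_recbole`` entry point are assumptions based on RecBole's quick-start API, and the model ``FM`` and dataset ``ml-100k`` are placeholders.

.. code-block:: python

    # Hedged sketch: value-based metrics (AUC, MAE, RMSE, LogLoss) require
    # label-based evaluation, so they are not mixed with ranking metrics.
    from recbole.quick_start import run_recbole

    config_dict = {
        'eval_args': {'mode': 'labeled'},   # evaluate against explicit labels
        'metrics': ['AUC', 'LogLoss'],      # value-based metrics only
        'valid_metric': 'AUC',              # early stopping on a value-based metric
    }

    # Placeholder model/dataset; labeled evaluation assumes the dataset
    # provides (or derives) an explicit label field.
    run_recbole(model='FM', dataset='ml-100k', config_dict=config_dict)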

For more details about evaluation settings, please read :doc:`config/evaluation_settings`.