RUCAIBox · chenyushuo · Aug 24, 2021 · Aug 23, 2021 · Aug 23, 2021 · Aug 24, 2021
diff --git a/conda/meta.yaml b/conda/meta.yaml
@@ -9,26 +9,26 @@ requirements:
   build:
     - python
   host:
-    - python
+    - python >=3.6
     - numpy >=1.17.2
     - scipy ==1.6.0
     - pandas >=1.0.5
     - tqdm >=4.48.2
     - pyyaml >=5.1.0
     - scikit-learn >=0.23.2
-    - pytorch
+    - pytorch >=1.7.0
     - colorlog==4.7.2
     - colorama==0.4.4
     - tensorboard >=2.5.0
   run:
-    - python
+    - python >=3.6
     - numpy >=1.17.2
     - scipy ==1.6.0
     - pandas >=1.0.5
     - tqdm >=4.48.2
     - pyyaml >=5.1.0
     - scikit-learn >=0.23.2
-    - pytorch
+    - pytorch >=1.7.0
     - colorlog==4.7.2
     - colorama==0.4.4
     - tensorboard >=2.5.0

diff --git a/docs/source/asset/framework.png b/docs/source/asset/framework.png
diff --git a/docs/source/asset/logo.png b/docs/source/asset/logo.png
diff --git a/docs/source/asset/tensorboard_1.png b/docs/source/asset/tensorboard_1.png
diff --git a/docs/source/asset/tensorboard_2.png b/docs/source/asset/tensorboard_2.png
diff --git a/docs/source/developer_guide/customize_samplers.rst b/docs/source/developer_guide/customize_samplers.rst
@@ -1,8 +1,22 @@
 Customize Samplers
 ======================
+In RecBole, the sampler module is designed to select negative items for training and evaluation.
+
 Here we present how to develop a new sampler, and apply it into RecBole.
 The new sampler is used when we need complex sampling method.
 
+RecBole currently supports only two sampling strategies: **random negative sampling (RNS)** and **popularity-biased negative sampling (PNS)**.
+RNS selects negative items from a uniform distribution, and PNS selects them from a popularity-biased distribution.
+For PNS, the popularity-biased distribution is based on the total number of interactions of each item.
+
+In our framework, to create a new sampler you need to inherit from :class:`~recbole.sampler.sampler.AbstractSampler`,
+implement :obj:`__init__()`
+(see :meth:`~recbole.sampler.sampler.KGSampler.__init__` for an example),
+override three functions: :obj:`_uni_sampling()`,
+:obj:`_get_candidates_list()` and :obj:`get_used_ids()`,
+and create a new sampling function.
+
+
 Here, we take the :class:`~recbole.sampler.sampler.KGSampler` as an example.
 
 
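Note on the RNS/PNS paragraph added above: the difference between the two strategies can be sketched in a few lines. This is a minimal, standard-library illustration and not RecBole's implementation; the function names `uniform_sample` and `popularity_sample` and the toy interaction list are hypothetical, and ID ``0`` is assumed to be reserved for padding, as in the `KGSampler` example.

```python
import random
from collections import Counter


def uniform_sample(item_num, sample_num, rng):
    """RNS: draw item IDs uniformly from [1, item_num); ID 0 is assumed to be padding."""
    return [rng.randrange(1, item_num) for _ in range(sample_num)]


def popularity_sample(candidates, sample_num, rng):
    """PNS: draw item IDs with probability proportional to their interaction count."""
    counts = Counter(candidates)           # item ID -> number of interactions
    items = list(counts)
    weights = [counts[item] for item in items]
    return rng.choices(items, weights=weights, k=sample_num)


rng = random.Random(0)
interactions = [1, 1, 1, 1, 2, 3]  # hypothetical data: one item ID per interaction
rns = uniform_sample(item_num=4, sample_num=1000, rng=rng)
pns = popularity_sample(interactions, sample_num=1000, rng=rng)

print("RNS counts:", sorted(Counter(rns).items()))
print("PNS counts:", sorted(Counter(pns).items()))
```

With the toy data, item ``1`` accounts for four of the six interactions, so it should dominate the PNS draws while the RNS draws stay roughly balanced.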
@@ -37,49 +51,57 @@ where we only need to invoke :obj:`super.__init__(distribution)`.
 
         super().__init__(distribution=distribution)
 
+Implement _uni_sampling()
+-------------------------------
+To implement RNS for KGSampler, we need to override :meth:`~recbole.sampler.sampler.AbstractSampler._uni_sampling`.
+Here we use :obj:`numpy.random.randint()` to randomly select ``entity_id`` values. This function returns the
+IDs of the selected samples (here, ``entity_id``).
 
-Implement get_random_list()
-------------------------------
-We do not use the random function in python or numpy due to their lower efficiency.
-Instead, we realize our own :meth:`~recbole.sampler.sampler.AbstractSampler.random` function, where the key method is to combine the random list with the pointer.
-The pointer point to some element in the random list. When one calls :meth:`self.random`, the element is returned, and moves the pointer backward by one element.
-If the pointer point to the last element, then it will return to the head of the element.
+Example code:
+
+.. code:: python
 
-In :class:`~recbole.sampler.sampler.AbstractSampler`, the :meth:`~recbole.sampler.sampler.AbstractSampler.__init__` will call :meth:`~recbole.sampler.sampler.AbstractSampler.get_random_list`, and shuffle the results.
-We only need to return a list including all the elements.
+    def _uni_sampling(self, sample_num):
+        return np.random.randint(1, self.entity_num, sample_num)
 
-It should be noted ``0`` can be the token used for padding, thus one should remain this value.
+Implement _get_candidates_list()
+-------------------------------------
+To implement PNS for KGSampler, we need to override :meth:`~recbole.sampler.sampler.AbstractSampler._get_candidates_list`.
+This function builds the candidate list for PNS, and the sampling distribution is set based on
+:obj:`Counter(candidate_list)`. The function returns a list of candidate IDs.
 
 Example code:
 
-.. code:: python
+.. code:: python
 
-    def get_random_list(self):
-        if self.distribution == 'uniform':
-            return list(range(1, self.entity_num))
-        elif self.distribution == 'popularity':
-            return list(self.hid_list) + list(self.tid_list)
-        else:
-            raise NotImplementedError('Distribution [{}] has not been implemented'.format(self.distribution))
+    def _get_candidates_list(self):
+        return list(self.hid_list) + list(self.tid_list)
 
 
 Implement get_used_ids()
 ----------------------------
-For negative sampling, we do not want to sample positive instance, this function is used to compute the positive sample.
+For negative sampling, we do not want to sample positive instances; this function is used to record the positive samples.
 The function will return numpy, and the index is the ID. The return value will be saved in :attr:`self.used_ids`.
 
 Example code:
 
 .. code:: python
 
     def get_used_ids(self):
-        used_tail_entity_id = np.array([set() for i in range(self.entity_num)])
+        used_tail_entity_id = np.array([set() for _ in range(self.entity_num)])
         for hid, tid in zip(self.hid_list, self.tid_list):
             used_tail_entity_id[hid].add(tid)
+
+        for used_tail_set in used_tail_entity_id:
+            if len(used_tail_set) + 1 == self.entity_num:  # [pad] is an entity.
+                raise ValueError(
+                    'Some head entities have relation with all entities, '
+                    'which we can not sample negative entities for them.'
+                )
         return used_tail_entity_id
 
 
-Implementing the sampling function
+Implement the sampling function
 -----------------------------------
 In :class:`~recbole.sampler.sampler.AbstractSampler`, we have implemented :meth:`~recbole.sampler.sampler.AbstractSampler.sample_by_key_ids` function,
 where we have three parameters: :attr:`key_ids`, :attr:`num` and :attr:`used_ids`.
@@ -109,12 +131,6 @@ Complete Code
 .. code:: python
 
     class KGSampler(AbstractSampler):
-        """:class:`KGSampler` is used to sample negative entities in a knowledge graph.
-
-        Args:
-            dataset (Dataset): The knowledge graph dataset, which contains triplets in a knowledge graph.
-            distribution (str, optional): Distribution of the negative entities. Defaults to 'uniform'.
-        """
         def __init__(self, dataset, distribution='uniform'):
             self.dataset = dataset
 
@@ -128,47 +144,31 @@ Complete Code
 
             super().__init__(distribution=distribution)
 
-        def get_random_list(self):
-            """
-            Returns:
-                np.ndarray or list: Random list of entity_id.
-            """
-            if self.distribution == 'uniform':
-                return list(range(1, self.entity_num))
-            elif self.distribution == 'popularity':
-                return list(self.hid_list) + list(self.tid_list)
-            else:
-                raise NotImplementedError('Distribution [{}] has not been implemented'.format(self.distribution))
+        def _uni_sampling(self, sample_num):
+            return np.random.randint(1, self.entity_num, sample_num)
+
+        def _get_candidates_list(self):
+            return list(self.hid_list) + list(self.tid_list)
 
         def get_used_ids(self):
-            """
-            Returns:
-                np.ndarray: Used entity_ids is the same as tail_entity_ids in knowledge graph.
-                Index is head_entity_id, and element is a set of tail_entity_ids.
-            """
-            used_tail_entity_id = np.array([set() for i in range(self.entity_num)])
+            used_tail_entity_id = np.array([set() for _ in range(self.entity_num)])
             for hid, tid in zip(self.hid_list, self.tid_list):
                 used_tail_entity_id[hid].add(tid)
+
+            for used_tail_set in used_tail_entity_id:
+                if len(used_tail_set) + 1 == self.entity_num:  # [pad] is an entity.
+                    raise ValueError(
+                        'Some head entities have relation with all entities, '
+                        'which we can not sample negative entities for them.'
+                    )
             return used_tail_entity_id
 
         def sample_by_entity_ids(self, head_entity_ids, num=1):
-            """Sampling by head_entity_ids.
-
-            Args:
-                head_entity_ids (np.ndarray or list): Input head_entity_ids.
-                num (int, optional): Number of sampled entity_ids for each head_entity_id. Defaults to ``1``.
-
-            Returns:
-                np.ndarray: Sampled entity_ids.
-                entity_ids[0], entity_ids[len(head_entity_ids)], entity_ids[len(head_entity_ids) * 2], ...,
-                entity_id[len(head_entity_ids) * (num - 1)] is sampled for head_entity_ids[0];
-                entity_ids[1], entity_ids[len(head_entity_ids) + 1], entity_ids[len(head_entity_ids) * 2 + 1], ...,
-                entity_id[len(head_entity_ids) * (num - 1) + 1] is sampled for head_entity_ids[1]; ...; and so on.
-            """
             try:
-                return self.sample_by_key_ids(head_entity_ids, num, self.used_ids[head_entity_ids])
+                return self.sample_by_key_ids(head_entity_ids, num)
             except IndexError:
                 for head_entity_id in head_entity_ids:
                     if head_entity_id not in self.head_entities:
-                        raise ValueError('head_entity_id [{}] not exist'.format(head_entity_id))
+                        raise ValueError(f'head_entity_id [{head_entity_id}] not exist.')
+
 
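Note on `get_used_ids()` and the sampling function above: the recorded positives are what let the sampler reject tail entities that already appear with a given head. The sketch below shows that idea with a plain per-head rejection loop. It is an assumption-laden illustration, not the logic of `sample_by_key_ids`: the helper names `build_used_ids` and `sample_negatives` and the toy triplets are hypothetical, and ID ``0`` is assumed to be the padding entity.

```python
import random


def build_used_ids(head_ids, tail_ids, entity_num):
    """Record, for every head entity, the set of tail entities it is linked to."""
    used = [set() for _ in range(entity_num)]
    for hid, tid in zip(head_ids, tail_ids):
        used[hid].add(tid)
    for used_tails in used:
        # The padding entity (ID 0) is never a valid sample, hence the "+ 1".
        if len(used_tails) + 1 == entity_num:
            raise ValueError("A head entity is linked to every entity; "
                             "no negative entity can be sampled for it.")
    return used


def sample_negatives(head_ids, used, entity_num, rng):
    """Draw one negative tail per head, retrying until it is not a known positive."""
    negatives = []
    for hid in head_ids:
        candidate = rng.randrange(1, entity_num)
        while candidate in used[hid]:
            candidate = rng.randrange(1, entity_num)
        negatives.append(candidate)
    return negatives


rng = random.Random(0)
heads, tails = [1, 1, 2], [2, 3, 1]  # hypothetical (head, tail) pairs
used = build_used_ids(heads, tails, entity_num=5)
query_heads = [1, 2, 1]
negatives = sample_negatives(query_heads, used, entity_num=5, rng=rng)
print(negatives)
```

The check inside `build_used_ids` mirrors the `ValueError` added in this diff: without it, the retry loop would never terminate for a head entity that is linked to every non-padding entity.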
diff --git a/docs/source/get_started/install.rst b/docs/source/get_started/install.rst
@@ -11,7 +11,7 @@ RecBole is compatible with the following operating systems:
 * Windows 10
 * macOS X
 
-Python 3.6 (or later), torch 1.6.0 (or later) are required to install our library. If you want to use RecBole with GPU,
+Python 3.6 (or later), torch 1.7.0 (or later) are required to install our library. If you want to use RecBole with GPU,
 please ensure that CUDA or CUDAToolkit version is 9.2 or later.
 This requires NVIDIA driver version >= 396.26 (for Linux) or >= 397.44 (for Windows10).
 
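Note on the raised minimums (Python 3.6, torch 1.7.0): a quick pre-install check could look like the sketch below. The helpers `parse_version` and `meets_minimum` are hypothetical names for this illustration, and the version strings passed to them are example inputs rather than values read from a real installation.

```python
import re
import sys


def parse_version(text):
    """Turn a string such as '1.7.0+cu110' into (1, 7, 0).

    Local build suffixes after '+' are dropped, and missing parts are padded with 0.
    """
    numbers = [int(n) for n in re.findall(r"\d+", text.split("+")[0])]
    return tuple((numbers + [0, 0, 0])[:3])


def meets_minimum(installed, minimum):
    """Compare versions numerically, so that '1.10.0' counts as newer than '1.7.0'."""
    return parse_version(installed) >= parse_version(minimum)


python_ok = sys.version_info >= (3, 6)
print("Python >= 3.6:", python_ok)
print("1.7.0+cu110 ok:", meets_minimum("1.7.0+cu110", "1.7.0"))
print("1.6.0 ok:", meets_minimum("1.6.0", "1.7.0"))
```

In an environment where PyTorch is already installed, `torch.__version__` can be passed as the `installed` argument. Comparing tuples of integers avoids the string-comparison trap where ``'1.10.0' < '1.7.0'``.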
@@ -54,3 +54,62 @@ Run the following command to install:
 .. code:: bash
 
     pip install -e . --verbose
+
+Try to run:
+-------------------------
+To check whether RecBole has been installed successfully, create a new Python file (e.g., ``run.py``)
+and write the following code:
+
+.. code:: python
+
+    from recbole.quick_start import run_recbole
+
+    run_recbole(model='BPR', dataset='ml-100k')
+
+
+Then run the following command:
+
+.. code:: bash
+
+    python run.py
+
+This trains and tests the BPR model on the ml-100k dataset, and produces output like:
+
+.. code:: none
+
+    05 Aug 02:16    INFO  ml-100k
+    The number of users: 944
+    Average actions of users: 106.04453870625663
+    The number of items: 1683
+    Average actions of items: 59.45303210463734
+    The number of inters: 100000
+    The sparsity of the dataset: 93.70575143257098%
+    Remain Fields: ['user_id', 'item_id', 'rating', 'timestamp']
+    05 Aug 02:16    INFO  [Training]: train_batch_size = [2048] negative sampling: [{'uniform': 1}]
+    05 Aug 02:16    INFO  [Evaluation]: eval_batch_size = [4096] eval_args: [{'split': {'RS': [0.8, 0.1, 0.1]}, 'group_by': 'user', 'order': 'RO', 'mode': 'full'}]
+    05 Aug 02:16    INFO  BPR(
+    (user_embedding): Embedding(944, 64)
+    (item_embedding): Embedding(1683, 64)
+    (loss): BPRLoss()
+    )
+    Trainable parameters: 168128
+    Train     0: 100%|████████████████████████| 40/40 [00:00<00:00, 219.54it/s, GPU RAM: 0.01 G/11.91 G]
+    05 Aug 02:16    INFO  epoch 0 training [time: 0.19s, train loss: 27.7228]
+    Evaluate   : 100%|██████████████████████| 472/472 [00:00<00:00, 506.11it/s, GPU RAM: 0.01 G/11.91 G]
+    05 Aug 02:16    INFO  epoch 0 evaluating [time: 0.94s, valid_score: 0.020500]
+    05 Aug 02:16    INFO  valid result: 
+    recall@10 : 0.0067    mrr@10 : 0.0205    ndcg@10 : 0.0086    hit@10 : 0.0732    precision@10 : 0.0081    
+
+    ...
+
+    Train    96: 100%|████████████████████████| 40/40 [00:00<00:00, 230.65it/s, GPU RAM: 0.01 G/11.91 G]
+    05 Aug 02:19    INFO  epoch 96 training [time: 0.18s, train loss: 3.7170]
+    Evaluate   : 100%|██████████████████████| 472/472 [00:00<00:00, 800.46it/s, GPU RAM: 0.01 G/11.91 G]
+    05 Aug 02:19    INFO  epoch 96 evaluating [time: 0.60s, valid_score: 0.375200]
+    05 Aug 02:19    INFO  valid result: 
+    recall@10 : 0.2162    mrr@10 : 0.3752    ndcg@10 : 0.2284    hit@10 : 0.7508    precision@10 : 0.1602    
+    05 Aug 02:19    INFO  Finished training, best eval result in epoch 85
+    05 Aug 02:19    INFO  Loading model structure and parameters from saved/BPR-Aug-05-2021_02-17-51.pth
+    Evaluate   : 100%|██████████████████████| 472/472 [00:00<00:00, 832.85it/s, GPU RAM: 0.01 G/11.91 G]
+    05 Aug 02:19    INFO  best valid : {'recall@10': 0.2195, 'mrr@10': 0.3871, 'ndcg@10': 0.2344, 'hit@10': 0.7582, 'precision@10': 0.1627}
+    05 Aug 02:19    INFO  test result: {'recall@10': 0.2523, 'mrr@10': 0.4855, 'ndcg@10': 0.292, 'hit@10': 0.7953, 'precision@10': 0.1962}
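Note on the sample log above: the dataset statistics and the parameter count can be reproduced from the logged totals with a little arithmetic. The formulas below are inferred from the numbers in the log, not taken from RecBole's source: they assume the user and item counts each include one padding ID, that the averages exclude it while the sparsity does not, and that the model holds one 64-dimensional embedding per user and per item.

```python
user_num = 944       # "The number of users" in the log (assumed to include padding)
item_num = 1683      # "The number of items" in the log (assumed to include padding)
inter_num = 100000   # "The number of inters" in the log
embedding_size = 64  # from "Embedding(944, 64)" in the log

avg_user_actions = inter_num / (user_num - 1)
avg_item_actions = inter_num / (item_num - 1)
sparsity_percent = (1 - inter_num / (user_num * item_num)) * 100
trainable_params = (user_num + item_num) * embedding_size

print(f"Average actions of users: {avg_user_actions:.4f}")
print(f"Average actions of items: {avg_item_actions:.4f}")
print(f"Sparsity: {sparsity_percent:.4f}%")
print(f"Trainable parameters: {trainable_params}")
```

Under these assumptions the results agree with the logged averages, sparsity, and ``Trainable parameters: 168128``, which makes the log a handy sanity check for a fresh installation.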
diff --git a/docs/source/get_started/introduction.rst b/docs/source/get_started/introduction.rst