FEA: update docs #940

Merged
merged 9 commits into from
Aug 24, 2021
8 changes: 4 additions & 4 deletions conda/meta.yaml
@@ -9,26 +9,26 @@ requirements:
  build:
    - python
  host:
    - python
    - python >=3.6
    - numpy >=1.17.2
    - scipy ==1.6.0
    - pandas >=1.0.5
    - tqdm >=4.48.2
    - pyyaml >=5.1.0
    - scikit-learn >=0.23.2
    - pytorch
    - pytorch >=1.7.0
    - colorlog==4.7.2
    - colorama==0.4.4
    - tensorboard >=2.5.0
  run:
    - python
    - python >=3.6
    - numpy >=1.17.2
    - scipy ==1.6.0
    - pandas >=1.0.5
    - tqdm >=4.48.2
    - pyyaml >=5.1.0
    - scikit-learn >=0.23.2
    - pytorch
    - pytorch >=1.7.0
    - colorlog==4.7.2
    - colorama==0.4.4
    - tensorboard >=2.5.0
Binary file added docs/source/asset/framework.png
Binary file added docs/source/asset/logo.png
Binary file added docs/source/asset/tensorboard_1.png
Binary file added docs/source/asset/tensorboard_2.png
116 changes: 58 additions & 58 deletions docs/source/developer_guide/customize_samplers.rst
@@ -1,8 +1,22 @@
Customize Samplers
======================
In RecBole, the sampler module is designed to select negative items for training and evaluation.

Here we present how to develop a new sampler and apply it in RecBole.
A new sampler is needed when a more complex sampling method is required.

RecBole currently supports only two sampling strategies: **random negative sampling (RNS)** and **popularity-biased negative sampling (PNS)**.
RNS selects negative items from a uniform distribution, and PNS selects negative items from a popularity-biased distribution.
For PNS, the popularity-biased distribution is based on each item's total number of interactions.
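
To make the difference between the two strategies concrete, here is a minimal sketch using only the Python standard library. It is an illustration of the idea, not RecBole's implementation: the function names ``uniform_negatives`` and ``popularity_negatives`` are hypothetical, and id ``0`` is assumed to be reserved for padding.

```python
import random
from collections import Counter


def uniform_negatives(item_num, sample_num, rng):
    """RNS: every item id in [1, item_num) is equally likely (0 is assumed to be padding)."""
    return [rng.randrange(1, item_num) for _ in range(sample_num)]


def popularity_negatives(interacted_items, sample_num, rng):
    """PNS: an item's sampling probability is proportional to its interaction count."""
    counts = Counter(interacted_items)
    items = list(counts.keys())
    weights = list(counts.values())
    return rng.choices(items, weights=weights, k=sample_num)


rng = random.Random(0)
interactions = [1, 1, 1, 2, 3, 3]  # toy data: one item id per recorded interaction
uni = uniform_negatives(item_num=4, sample_num=5, rng=rng)
pop = popularity_negatives(interactions, sample_num=1000, rng=rng)
```

In this toy data item ``1`` accounts for half of the interactions, so it should appear in ``pop`` far more often than item ``2``, whereas ``uni`` treats all three items alike.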

In our framework, to create a new sampler, you need to inherit from :class:`~recbole.sampler.sampler.AbstractSampler`, implement
:obj:`__init__()` (for example, :meth:`~recbole.sampler.sampler.KGSampler.__init__()`),
rewrite three functions: :obj:`_uni_sampling()`,
:obj:`_get_candidates_list()`, and :obj:`get_used_ids()`,
and create a new sampling function.


Here, we take the :class:`~recbole.sampler.sampler.KGSampler` as an example.


@@ -37,49 +51,57 @@ where we only need to invoke :obj:`super().__init__(distribution)`.

        super().__init__(distribution=distribution)

Implement _uni_sampling()
-------------------------------
To implement RNS for KGSampler, we need to rewrite :meth:`~recbole.sampler.sampler.AbstractSampler._uni_sampling`.
Here we use :obj:`numpy.random.randint()` to randomly select the ``entity_id``. This function returns the
ids of the selected samples (here, ``entity_id``).

Implement get_random_list()
------------------------------
We do not use the random functions in Python or NumPy because of their lower efficiency.
Instead, we implement our own :meth:`~recbole.sampler.sampler.AbstractSampler.random` function, whose key idea is to combine a random list with a pointer.
The pointer points to an element in the random list. When one calls :meth:`self.random`, that element is returned and the pointer advances by one element.
If the pointer points to the last element, it then wraps around to the head of the list.
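
The pointer-over-a-shuffled-list idea described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in :class:`~recbole.sampler.sampler.AbstractSampler`: the class name ``CyclicRandomList`` and the ``seed`` parameter are hypothetical.

```python
import random


class CyclicRandomList:
    """Serve pre-shuffled values one at a time, wrapping around at the end."""

    def __init__(self, candidates, seed=0):
        self._values = list(candidates)
        random.Random(seed).shuffle(self._values)  # shuffle once, up front
        self._pointer = 0

    def random(self):
        value = self._values[self._pointer]
        # Advance the pointer; after the last element it returns to the head.
        self._pointer = (self._pointer + 1) % len(self._values)
        return value


source = CyclicRandomList(range(1, 6))
first_pass = [source.random() for _ in range(5)]
second_pass = [source.random() for _ in range(5)]
```

Each call costs one list lookup and one increment, which is where the efficiency gain comes from. The trade-off is that the sequence repeats once the list has been exhausted, so ``second_pass`` equals ``first_pass`` here.
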
Example code:

.. code:: python

In :class:`~recbole.sampler.sampler.AbstractSampler`, the :meth:`~recbole.sampler.sampler.AbstractSampler.__init__` will call :meth:`~recbole.sampler.sampler.AbstractSampler.get_random_list`, and shuffle the results.
We only need to return a list including all the elements.
    def _uni_sampling(self, sample_num):
        return np.random.randint(1, self.entity_num, sample_num)

Note that ``0`` may be the token used for padding, so this value should be reserved.
Implement _get_candidates_list()
-------------------------------------
To implement PNS for KGSampler, we need to rewrite :meth:`~recbole.sampler.sampler.AbstractSampler._get_candidates_list`.
This function is used to get a candidate list for PNS, and the sampling distribution is set based on
:obj:`Counter(candidate_list)`. This function returns a list of candidate ids.

Example code:

.. code:: python

    def get_random_list(self):
        if self.distribution == 'uniform':
            return list(range(1, self.entity_num))
        elif self.distribution == 'popularity':
            return list(self.hid_list) + list(self.tid_list)
        else:
            raise NotImplementedError('Distribution [{}] has not been implemented'.format(self.distribution))
    def _get_candidates_list(self):
        return list(self.hid_list) + list(self.tid_list)


Implement get_used_ids()
----------------------------
For negative sampling, we do not want to sample positive instances; this function is used to compute the positive samples.
For negative sampling, we do not want to sample positive instances; this function is used to record the positive samples.
The function returns a NumPy array whose index is the ID. The return value will be saved in :attr:`self.used_ids`.

Example code:

.. code:: python

    def get_used_ids(self):
        used_tail_entity_id = np.array([set() for i in range(self.entity_num)])
        used_tail_entity_id = np.array([set() for _ in range(self.entity_num)])
        for hid, tid in zip(self.hid_list, self.tid_list):
            used_tail_entity_id[hid].add(tid)

        for used_tail_set in used_tail_entity_id:
            if len(used_tail_set) + 1 == self.entity_num: # [pad] is a entity.
                raise ValueError(
                    'Some head entities have relation with all entities, '
                    'which we can not sample negative entities for them.'
                )
        return used_tail_entity_id


Implementing the sampling function
Implement the sampling function
-----------------------------------
In :class:`~recbole.sampler.sampler.AbstractSampler`, we have implemented :meth:`~recbole.sampler.sampler.AbstractSampler.sample_by_key_ids` function,
where we have three parameters: :attr:`key_ids`, :attr:`num` and :attr:`used_ids`.
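
As a rough illustration of how the three parameters fit together, the sketch below draws ``num`` negatives for every key id and redraws whenever a sample falls into that key's used set. It is a simplified stand-in, not the actual :meth:`~recbole.sampler.sampler.AbstractSampler.sample_by_key_ids`: the standalone function, the ``candidate_num`` and ``rng`` parameters, and the toy data are all assumptions made for the example.

```python
import random


def sample_by_key_ids(key_ids, num, used_ids, candidate_num, rng):
    """Return `num` negatives per key id, never drawing from used_ids[key].

    The result is laid out round by round: position i + j * len(key_ids)
    holds the j-th negative for key_ids[i].
    """
    sampled = []
    for _ in range(num):
        for key in key_ids:
            value = rng.randrange(1, candidate_num)  # 0 is assumed to be padding
            while value in used_ids[key]:            # reject positives and redraw
                value = rng.randrange(1, candidate_num)
            sampled.append(value)
    return sampled


rng = random.Random(0)
used_ids = [set(), {2, 3}, {1}, {1, 2}]  # index: key id, element: its positive ids
key_ids = [1, 2, 3]
negatives = sample_by_key_ids(key_ids, num=2, used_ids=used_ids, candidate_num=5, rng=rng)
```

The redraw loop only terminates if every key has at least one id that is not in its used set, which is why ``get_used_ids()`` in the KGSampler example raises a ``ValueError`` when a head entity is related to all entities.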
@@ -109,12 +131,6 @@ Complete Code
.. code:: python

    class KGSampler(AbstractSampler):
        """:class:`KGSampler` is used to sample negative entities in a knowledge graph.

        Args:
            dataset (Dataset): The knowledge graph dataset, which contains triplets in a knowledge graph.
            distribution (str, optional): Distribution of the negative entities. Defaults to 'uniform'.
        """
        def __init__(self, dataset, distribution='uniform'):
            self.dataset = dataset

@@ -128,47 +144,31 @@ Complete Code

            super().__init__(distribution=distribution)

        def get_random_list(self):
            """
            Returns:
                np.ndarray or list: Random list of entity_id.
            """
            if self.distribution == 'uniform':
                return list(range(1, self.entity_num))
            elif self.distribution == 'popularity':
                return list(self.hid_list) + list(self.tid_list)
            else:
                raise NotImplementedError('Distribution [{}] has not been implemented'.format(self.distribution))
        def _uni_sampling(self, sample_num):
            return np.random.randint(1, self.entity_num, sample_num)

        def _get_candidates_list(self):
            return list(self.hid_list) + list(self.tid_list)

        def get_used_ids(self):
            """
            Returns:
                np.ndarray: Used entity_ids is the same as tail_entity_ids in knowledge graph.
                Index is head_entity_id, and element is a set of tail_entity_ids.
            """
            used_tail_entity_id = np.array([set() for i in range(self.entity_num)])
            used_tail_entity_id = np.array([set() for _ in range(self.entity_num)])
            for hid, tid in zip(self.hid_list, self.tid_list):
                used_tail_entity_id[hid].add(tid)

            for used_tail_set in used_tail_entity_id:
                if len(used_tail_set) + 1 == self.entity_num: # [pad] is a entity.
                    raise ValueError(
                        'Some head entities have relation with all entities, '
                        'which we can not sample negative entities for them.'
                    )
            return used_tail_entity_id

        def sample_by_entity_ids(self, head_entity_ids, num=1):
            """Sampling by head_entity_ids.

            Args:
                head_entity_ids (np.ndarray or list): Input head_entity_ids.
                num (int, optional): Number of sampled entity_ids for each head_entity_id. Defaults to ``1``.

            Returns:
                np.ndarray: Sampled entity_ids.
                entity_ids[0], entity_ids[len(head_entity_ids)], entity_ids[len(head_entity_ids) * 2], ...,
                entity_id[len(head_entity_ids) * (num - 1)] is sampled for head_entity_ids[0];
                entity_ids[1], entity_ids[len(head_entity_ids) + 1], entity_ids[len(head_entity_ids) * 2 + 1], ...,
                entity_id[len(head_entity_ids) * (num - 1) + 1] is sampled for head_entity_ids[1]; ...; and so on.
            """
            try:
                return self.sample_by_key_ids(head_entity_ids, num, self.used_ids[head_entity_ids])
                return self.sample_by_key_ids(head_entity_ids, num)
            except IndexError:
                for head_entity_id in head_entity_ids:
                    if head_entity_id not in self.head_entities:
                        raise ValueError('head_entity_id [{}] not exist'.format(head_entity_id))
                        raise ValueError(f'head_entity_id [{head_entity_id}] not exist.')


61 changes: 60 additions & 1 deletion docs/source/get_started/install.rst
@@ -11,7 +11,7 @@ RecBole is compatible with the following operating systems:
* Windows 10
* macOS X

Python 3.6 (or later) and torch 1.6.0 (or later) are required to install our library. If you want to use RecBole with a GPU,
Python 3.6 (or later) and torch 1.7.0 (or later) are required to install our library. If you want to use RecBole with a GPU,
please ensure that the CUDA or CUDAToolkit version is 9.2 or later.
This requires NVIDIA driver version >= 396.26 (for Linux) or >= 397.44 (for Windows 10).
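
Before installing, the Python and torch requirements can be checked with a short script such as the sketch below. It is not part of RecBole: the helpers ``parse_version`` and ``meets_minimum`` are hypothetical, and the script assumes the updated minimum of torch 1.7.0. It only reports whether the installed versions satisfy the minimums and does not check CUDA or driver versions.

```python
import re
import sys


def parse_version(text):
    """Turn a version string such as '1.7.0+cu110' into a tuple of ints."""
    numbers = []
    for piece in text.split('+')[0].split('.')[:3]:
        match = re.match(r'\d+', piece)  # keep only the leading digits of each part
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def meets_minimum(version, minimum):
    """Compare versions numerically, so that '1.10.0' counts as newer than '1.7.0'."""
    return parse_version(version) >= parse_version(minimum)


python_ok = sys.version_info >= (3, 6)

try:
    import torch
    torch_ok = meets_minimum(str(torch.__version__), '1.7.0')
except ImportError:
    torch_ok = False  # torch is not installed in this environment

print(f'Python >= 3.6: {python_ok}; torch >= 1.7.0: {torch_ok}')
```

Comparing tuples of integers rather than raw strings avoids the pitfall where ``'1.10.0' < '1.7.0'`` under plain string comparison.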

@@ -54,3 +54,62 @@ Run the following command to install:
.. code:: bash

    pip install -e . --verbose

Try to run:
-------------------------
To check whether RecBole has been installed successfully, create a new Python file (e.g., ``run.py``)
and write the following code:

.. code:: python

    from recbole.quick_start import run_recbole

    run_recbole(model='BPR', dataset='ml-100k')


Then run the following command:

.. code:: bash

    python run.py

This trains and tests the BPR model on the ml-100k dataset and produces output like:

.. code:: none

05 Aug 02:16 INFO ml-100k
The number of users: 944
Average actions of users: 106.04453870625663
The number of items: 1683
Average actions of items: 59.45303210463734
The number of inters: 100000
The sparsity of the dataset: 93.70575143257098%
Remain Fields: ['user_id', 'item_id', 'rating', 'timestamp']
05 Aug 02:16 INFO [Training]: train_batch_size = [2048] negative sampling: [{'uniform': 1}]
05 Aug 02:16 INFO [Evaluation]: eval_batch_size = [4096] eval_args: [{'split': {'RS': [0.8, 0.1, 0.1]}, 'group_by': 'user', 'order': 'RO', 'mode': 'full'}]
05 Aug 02:16 INFO BPR(
(user_embedding): Embedding(944, 64)
(item_embedding): Embedding(1683, 64)
(loss): BPRLoss()
)
Trainable parameters: 168128
Train 0: 100%|████████████████████████| 40/40 [00:00<00:00, 219.54it/s, GPU RAM: 0.01 G/11.91 G]
05 Aug 02:16 INFO epoch 0 training [time: 0.19s, train loss: 27.7228]
Evaluate : 100%|██████████████████████| 472/472 [00:00<00:00, 506.11it/s, GPU RAM: 0.01 G/11.91 G]
05 Aug 02:16 INFO epoch 0 evaluating [time: 0.94s, valid_score: 0.020500]
05 Aug 02:16 INFO valid result:
recall@10 : 0.0067 mrr@10 : 0.0205 ndcg@10 : 0.0086 hit@10 : 0.0732 precision@10 : 0.0081

...

Train 96: 100%|████████████████████████| 40/40 [00:00<00:00, 230.65it/s, GPU RAM: 0.01 G/11.91 G]
05 Aug 02:19 INFO epoch 96 training [time: 0.18s, train loss: 3.7170]
Evaluate : 100%|██████████████████████| 472/472 [00:00<00:00, 800.46it/s, GPU RAM: 0.01 G/11.91 G]
05 Aug 02:19 INFO epoch 96 evaluating [time: 0.60s, valid_score: 0.375200]
05 Aug 02:19 INFO valid result:
recall@10 : 0.2162 mrr@10 : 0.3752 ndcg@10 : 0.2284 hit@10 : 0.7508 precision@10 : 0.1602
05 Aug 02:19 INFO Finished training, best eval result in epoch 85
05 Aug 02:19 INFO Loading model structure and parameters from saved/BPR-Aug-05-2021_02-17-51.pth
Evaluate : 100%|██████████████████████| 472/472 [00:00<00:00, 832.85it/s, GPU RAM: 0.01 G/11.91 G]
05 Aug 02:19 INFO best valid : {'recall@10': 0.2195, 'mrr@10': 0.3871, 'ndcg@10': 0.2344, 'hit@10': 0.7582, 'precision@10': 0.1627}
05 Aug 02:19 INFO test result: {'recall@10': 0.2523, 'mrr@10': 0.4855, 'ndcg@10': 0.292, 'hit@10': 0.7953, 'precision@10': 0.1962}
31 changes: 0 additions & 31 deletions docs/source/get_started/introduction.rst

This file was deleted.
