Merge pull request #1 from RUCAIBox/0.2.x
0.2.x
Guan-JW authored Feb 27, 2021
2 parents 0a0f36c + 27c41f6 commit c1337a1
Showing 280 changed files with 9,452 additions and 15 deletions.
5 changes: 4 additions & 1 deletion .github/workflows/python-package.yml
@@ -24,6 +24,9 @@ jobs:
pip install pytest
pip install dgl
pip install xgboost
pip install community
pip install networkx
pip install python-louvain
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
# Use "python -m pytest" instead of "pytest" to fix imports
@@ -44,4 +47,4 @@ jobs:
python -m pytest -v tests/config/test_config.py
export PYTHONPATH=.
python tests/config/test_command_line.py --use_gpu=False --valid_metric=Recall@10 --split_ratio=[0.7,0.2,0.1] --metrics=['Recall@10'] --epochs=200 --eval_setting='LO_RS' --learning_rate=0.3
1 change: 1 addition & 0 deletions .gitignore
@@ -9,3 +9,4 @@
saved/
*.lprof
*.egg-info/
docs/build/
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
Binary file added docs/source/asset/afm.jpg
Binary file added docs/source/asset/autoint.png
Binary file added docs/source/asset/bert4rec.png
Binary file added docs/source/asset/bpr.png
Binary file added docs/source/asset/caser.png
Binary file added docs/source/asset/cdae.png
Binary file added docs/source/asset/cke.png
Binary file added docs/source/asset/convncf.png
Binary file added docs/source/asset/data_flow_en.png
Binary file added docs/source/asset/dcn.png
Binary file added docs/source/asset/deepfm.png
Binary file added docs/source/asset/dgcf.jpg
Binary file added docs/source/asset/din.png
Binary file added docs/source/asset/dmf.jpg
Binary file added docs/source/asset/dssm.png
Binary file added docs/source/asset/enmf.jpg
Binary file added docs/source/asset/evaluation.png
Binary file added docs/source/asset/fdsa.png
Binary file added docs/source/asset/ffm.png
Binary file added docs/source/asset/fm.png
Binary file added docs/source/asset/fnn.png
Binary file added docs/source/asset/fossil.jpg
Binary file added docs/source/asset/fpmc.png
Binary file added docs/source/asset/fwfm.png
Binary file added docs/source/asset/gcmc.png
Binary file added docs/source/asset/gcsan.png
Binary file added docs/source/asset/gru4rec.png
Binary file added docs/source/asset/gru4recf.png
Binary file added docs/source/asset/hgn.jpg
Binary file added docs/source/asset/hrm.jpg
Binary file added docs/source/asset/kgat.png
Binary file added docs/source/asset/kgcn.png
Binary file added docs/source/asset/kgnnls.png
Binary file added docs/source/asset/ksr.jpg
Binary file added docs/source/asset/ktup.png
Binary file added docs/source/asset/lightgcn.png
Binary file added docs/source/asset/line.png
Binary file added docs/source/asset/lr.png
Binary file added docs/source/asset/macridvae.png
Binary file added docs/source/asset/mkr.png
Binary file added docs/source/asset/multidae.png
Binary file added docs/source/asset/multivae.png
Binary file added docs/source/asset/nais.png
Binary file added docs/source/asset/narm.png
Binary file added docs/source/asset/neumf.png
Binary file added docs/source/asset/nextitnet.png
Binary file added docs/source/asset/nfm.jpg
Binary file added docs/source/asset/ngcf.jpg
Binary file added docs/source/asset/nncf.png
Binary file added docs/source/asset/npe.jpg
Binary file added docs/source/asset/pnn.jpg
Binary file added docs/source/asset/repeatnet.jpg
Binary file added docs/source/asset/ripplenet.jpg
Binary file added docs/source/asset/s3rec.png
Binary file added docs/source/asset/sasrec.png
Binary file added docs/source/asset/shan.jpg
Binary file added docs/source/asset/spectralcf.png
Binary file added docs/source/asset/srgnn.png
Binary file added docs/source/asset/stamp.png
Binary file added docs/source/asset/transrec.png
Binary file added docs/source/asset/widedeep.png
Binary file added docs/source/asset/xdeepfm.png
74 changes: 74 additions & 0 deletions docs/source/conf.py
@@ -0,0 +1,74 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import sphinx_rtd_theme
import os
import sys
sys.path.insert(0, os.path.abspath('../..'))


# -- Project information -----------------------------------------------------

project = 'RecBole'
copyright = '2020, RecBole Contributors'
author = 'AIBox RecBole group'

# The full version, including alpha/beta/rc tags
release = '0.2.0'


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.napoleon',
    'sphinx.ext.viewcode',
    'sphinx_copybutton',
]

autodoc_mock_imports = ["pandas", "pyecharts"]
# autoclass_content = 'both'

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = 'en'

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
# html_theme = 'alabaster'


html_theme = 'sphinx_rtd_theme'
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
201 changes: 201 additions & 0 deletions docs/source/developer_guide/customize_dataloaders.rst
@@ -0,0 +1,201 @@
Customize DataLoaders
======================
Here, we present how to develop a new DataLoader and apply it in our tool. If we have a new model
that has special requirements for loading the data, then we need to design a new DataLoader.


Abstract DataLoader
--------------------------
In this project, there are three abstract classes: :class:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader`,
:class:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin` and :class:`~recbole.data.dataloader.neg_sample_mixin.NegSampleByMixin`.

In general, a new dataloader should inherit from one of the above three abstract classes.
If you only need to modify an existing DataLoader, you can also inherit from it.
For the API documentation of the dataloaders, see :doc:`../../recbole/recbole.data.dataloader`.


AbstractDataLoader
^^^^^^^^^^^^^^^^^^^^^^^^^^
:class:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader` is the most basic abstract class,
which includes three functions: :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.pr_end`,
:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._shuffle`
and :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._next_batch_data`.
:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.pr_end` is the maximum value of
:attr:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.pr` plus 1.
:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._shuffle` is used to permute the dataset;
it is invoked by :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.__iter__`
if the parameter :attr:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.shuffle` is ``True``.
:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._next_batch_data` loads
the next batch of data and returns it as an :class:`~recbole.data.interaction.Interaction`;
it is invoked by :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.__next__`.

:class:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader` provides
two helper functions for the conversion in :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._next_batch_data`:
:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._dataframe_to_interaction`
and :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader._dict_to_interaction`.
Both call the function with the same name in :attr:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.dataset`,
which converts a :class:`pandas.DataFrame` or a :class:`dict` into an :class:`~recbole.data.interaction.Interaction`.

In addition to the above three functions, two other functions can also be overridden,
namely :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.setup`
and :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.data_preprocess`.

:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.setup` handles any preparation beyond plain parameter initialization,
for example resetting the :attr:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.batch_size`
or checking the :attr:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.shuffle` setting.
All of this can be overridden in the subclass.
:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.data_preprocess` is used to process the data,
e.g., negative sampling.
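
As a rough illustration of the ``setup`` hook, the following stand-alone sketch shows a subclass
adjusting ``batch_size`` without overriding ``__init__``. ``BaseLoader`` and ``CappedLoader`` are
hypothetical names, and the capping rule is an assumption chosen only for the example;
``data_preprocess`` is left out.

```python
class BaseLoader:
    """Hypothetical base class: assigns its parameters, then calls the setup() hook."""

    def __init__(self, data, batch_size=1, shuffle=False):
        self.data = list(data)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.setup()  # runs after all attributes have been assigned

    def setup(self):
        """Default hook: nothing to adjust."""


class CappedLoader(BaseLoader):
    def setup(self):
        # Never request more rows per batch than the data contains.
        self.batch_size = min(self.batch_size, len(self.data))


loader = CappedLoader(range(3), batch_size=8)
print(loader.batch_size)  # 3
```

Because the hook runs at the end of ``__init__``, the subclass can rely on every attribute already being set.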

At the end of :meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.__init__`,
:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.setup` is invoked,
and then, if :attr:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.real_time` is ``True``,
:meth:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader.data_preprocess` is invoked as well.

NegSampleMixin
^^^^^^^^^^^^^^^^^^^^^^^^^^
:class:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin` inherits from
:class:`~recbole.data.dataloader.abstract_dataloader.AbstractDataLoader` and is used for negative sampling.
It adds three functions to its parent class:
:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin._batch_size_adaptation`,
:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin._neg_sampling`
and :meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin.get_pos_len_list`.

Since the positive and negative samples should be placed in the same batch,
the original batch size may not be appropriate.
:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin._batch_size_adaptation` is used to reset the batch size
so that the positive and negative samples fit in the same batch.
:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin._neg_sampling` is used for negative sampling
and should be implemented by the subclass.
:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin.get_pos_len_list` returns the number of positive samples for each user.
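
To see why the batch size may need adjusting, consider a minimal sketch under one assumption:
every positive sample is expanded into ``1 + neg_ratio`` rows (itself plus its negatives).
The function ``adapt_batch_size`` and the parameter ``neg_ratio`` are hypothetical names,
and the rounding rule is one plausible choice, not necessarily the one RecBole uses.

```python
def adapt_batch_size(batch_size, neg_ratio):
    """Return (positives_per_batch, effective_batch_size).

    Assumes each positive sample expands into ``1 + neg_ratio`` rows, and that a
    positive sample and all of its negatives must land in the same batch.
    """
    rows_per_positive = 1 + neg_ratio
    # Fit as many whole groups as possible, but always at least one.
    positives_per_batch = max(batch_size // rows_per_positive, 1)
    return positives_per_batch, positives_per_batch * rows_per_positive


print(adapt_batch_size(10, 3))  # (2, 8)
print(adapt_batch_size(2, 3))   # (1, 4)
```

In the second call the requested batch size is smaller than one group, so the effective batch size grows to hold a single positive sample with its negatives.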

In addition, :meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin.setup`
and :meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin.data_preprocess` are also changed.
:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin.setup`
calls :meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin._batch_size_adaptation`, while
:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin.data_preprocess` is used for negative sampling
and should be implemented in the subclass.

NegSampleByMixin
^^^^^^^^^^^^^^^^^^^^^^^^^^
:class:`~recbole.data.dataloader.neg_sample_mixin.NegSampleByMixin` inherits
from :class:`~recbole.data.dataloader.neg_sample_mixin.NegSampleMixin`
and is used for negative sampling by ratio.
It supports two strategies: ``pair-wise sampling`` and ``point-wise sampling``.
On top of the parent class, two functions are added:
:meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleByMixin._neg_sample_by_pair_wise_sampling`
and :meth:`~recbole.data.dataloader.neg_sample_mixin.NegSampleByMixin._neg_sample_by_point_wise_sampling`.


Example
--------------------------
Here, we take :class:`~recbole.data.dataloader.user_dataloader.UserDataLoader` as an example.
This dataloader returns user ids, which are used to train the user representations.


Implement __init__()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:meth:`__init__` can be used to initialize some of the necessary parameters.
Here, we just need to record :attr:`uid_field`.

.. code:: python

    def __init__(self, config, dataset,
                 batch_size=1, dl_format=InputType.POINTWISE, shuffle=False):
        self.uid_field = dataset.uid_field

        super().__init__(config=config, dataset=dataset,
                         batch_size=batch_size, dl_format=dl_format, shuffle=shuffle)

Implement setup()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Because of the training requirements, :attr:`self.shuffle` must be ``True``.
We can check and correct :attr:`self.shuffle` in :meth:`~recbole.data.dataloader.user_dataloader.UserDataLoader.setup`.


.. code:: python

    def setup(self):
        if self.shuffle is False:
            self.shuffle = True
            self.logger.warning('UserDataLoader must shuffle the data')

Implement pr_end() and _shuffle()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Since this dataloader only returns user ids, these functions are straightforward to implement.

.. code:: python

    @property
    def pr_end(self):
        return len(self.dataset.user_feat)

    def _shuffle(self):
        self.dataset.user_feat = self.dataset.user_feat.sample(frac=1).reset_index(drop=True)

Implement _next_batch_data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This function only needs to return the user ids from :attr:`user_feat`,
so we just select that one column and use :meth:`_dataframe_to_interaction` to convert
the :class:`pandas.DataFrame` into an :class:`~recbole.data.interaction.Interaction`.


.. code:: python

    def _next_batch_data(self):
        cur_data = self.dataset.user_feat[[self.uid_field]][self.pr: self.pr + self.step]
        self.pr += self.step
        return self._dataframe_to_interaction(cur_data)

Complete Code
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: python

    # Import paths follow the module names referenced in this guide.
    from recbole.data.dataloader.abstract_dataloader import AbstractDataLoader
    from recbole.utils.enum_type import DataLoaderType, InputType


    class UserDataLoader(AbstractDataLoader):
        """:class:`UserDataLoader` will return a batch of data which only contains user-id when it is iterated.

        Args:
            config (Config): The config of dataloader.
            dataset (Dataset): The dataset of dataloader.
            batch_size (int, optional): The batch_size of dataloader. Defaults to ``1``.
            dl_format (InputType, optional): The input type of dataloader. Defaults to
                :obj:`~recbole.utils.enum_type.InputType.POINTWISE`.
            shuffle (bool, optional): Whether the dataloader will be shuffled after a round. Defaults to ``False``.

        Attributes:
            shuffle (bool): Whether the dataloader will be shuffled after a round.
                However, in :class:`UserDataLoader`, it's guaranteed to be ``True``.
        """
        dl_type = DataLoaderType.ORIGIN

        def __init__(self, config, dataset,
                     batch_size=1, dl_format=InputType.POINTWISE, shuffle=False):
            self.uid_field = dataset.uid_field

            super().__init__(config=config, dataset=dataset,
                             batch_size=batch_size, dl_format=dl_format, shuffle=shuffle)

        def setup(self):
            """Make sure that the :attr:`shuffle` is True. If :attr:`shuffle` is False, it will be changed to True
            and give a warning to user.
            """
            if self.shuffle is False:
                self.shuffle = True
                self.logger.warning('UserDataLoader must shuffle the data')

        @property
        def pr_end(self):
            return len(self.dataset.user_feat)

        def _shuffle(self):
            # Permute the rows and rebuild a clean 0..n-1 index.
            self.dataset.user_feat = self.dataset.user_feat.sample(frac=1).reset_index(drop=True)

        def _next_batch_data(self):
            # Keep only the user-id column, then slice out the next batch.
            cur_data = self.dataset.user_feat[[self.uid_field]][self.pr: self.pr + self.step]
            self.pr += self.step
            return self._dataframe_to_interaction(cur_data)

For the development of more complex DataLoaders, please refer to the source code.