Feature/notebook tests #108

Merged
merged 23 commits into from
Jul 20, 2022
45 changes: 45 additions & 0 deletions .github/workflows/notebook_tests.yml
@@ -0,0 +1,45 @@
name: notebook tests

on:
push:
branches: [main]
pull_request:
branches: [main]

jobs:
build-and-test:
name: Python ${{ matrix.python-version }} on ${{ matrix.os }}
runs-on: ${{ matrix.os }}
strategy:
matrix:
python-version: ['3.8']
os: [ubuntu-latest]

steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}

- name: Build
run: |
set -xe
python -VV
pip install --upgrade pip
pip install -e '.[test]'

- name: Print versions
run: |
python -VV
python -c "import jax; print('jax', jax.__version__)"
python -c "import jaxlib; print('jaxlib', jaxlib.__version__)"

- name: Install Jupyter kernel
run: |
pip install ipykernel
python -m ipykernel install --user --name=ott

- name: Run notebook tests
timeout-minutes: 60
run: |
python -m pytest -m notebook --kernel-name=ott --notebook-cell-timeout=3600
6 changes: 3 additions & 3 deletions .github/workflows/tests.yml
@@ -26,7 +26,7 @@ jobs:
run: |
set -xe
python -VV
pip install --upgrade pip setuptools
pip install --upgrade pip
pip install pytest-memray
pip install -e '.[test]'

@@ -39,12 +39,12 @@ jobs:
- name: Run fast tests
if: ${{ matrix.test_mark == 'fast' }}
run: |
pytest --cov=ott --cov-append --cov-report=xml --cov-report=term-missing --cov-config=setup.cfg --memray -m fast -n auto
python -m pytest --cov=ott --cov-append --cov-report=xml --cov-report=term-missing --cov-config=setup.cfg --memray -m fast -n auto

- name: Run all tests
if: ${{ matrix.test_mark == 'all' }}
run: |
pytest --cov=ott --cov-append --cov-report=xml --cov-report=term-missing --cov-config=setup.cfg --memray
python -m pytest --cov=ott --cov-append --cov-report=xml --cov-report=term-missing --cov-config=setup.cfg --memray

- name: Upload coverage
uses: codecov/codecov-action@v3
10 changes: 7 additions & 3 deletions .pre-commit-config.yaml
@@ -11,6 +11,10 @@ repos:
hooks:
- id: yapf
additional_dependencies: [toml]
- repo: https://github.com/tomcatling/black-nb
rev: '0.7'
hooks:
- id: black-nb
- repo: https://github.com/PyCQA/isort
rev: 5.10.1
hooks:
@@ -26,12 +30,12 @@ repos:
- flake8-bugbear
- flake8-blind-except
- repo: https://github.com/macisamuele/language-formatters-pre-commit-hooks
rev: v2.3.0
rev: v2.4.0
hooks:
- id: pretty-format-yaml
args: [--autofix, --indent, '2']
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.2.0
rev: v4.3.0
hooks:
- id: detect-private-key
- id: check-ast
@@ -61,7 +65,7 @@ repos:
- flake8-blind-except
args: [--docstring-convention, google]
- repo: https://github.com/asottile/pyupgrade
rev: v2.32.1
rev: v2.37.1
hooks:
- id: pyupgrade
args: [--py3-plus, --py37-plus, --keep-runtime-typing]
7 changes: 7 additions & 0 deletions CONTRIBUTING.md
@@ -30,6 +30,13 @@ pytest tests/core/sinkhorn_test.py # only test within a specific file
pytest -k "test_euclidean_point_cloud" # only tests which contain the expression
```

In order to run memory-related tests (used for low-rank solvers/geometries and online point clouds), we use
[pytest-memray](https://github.com/bloomberg/pytest-memray) (currently available only on Linux).
Whenever running the ``pytest`` commands mentioned above, the ``--memray`` option needs to be specified as well.

Lastly, to run the notebook regression tests, use ``pytest -m notebook``. The cell execution limit can be adjusted
using ``--notebook-cell-timeout=...`` (in seconds), and the Jupyter kernel name can be set using ``--kernel-name=...``.
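
The ``--kernel-name`` and ``--notebook-cell-timeout`` flags are not built into pytest, so they have to be registered somewhere, typically in a ``conftest.py``. The sketch below shows one way this could be done with the standard ``pytest_addoption`` and ``pytest_configure`` hooks. It is an illustration only and is not taken from this PR; the default values and help strings are assumptions.

```python
# conftest.py (hypothetical sketch; not the code from this PR)


def pytest_addoption(parser):
    """Register the two custom command-line flags."""
    parser.addoption(
        "--kernel-name",
        action="store",
        default="python3",  # assumed default kernel name
        help="Jupyter kernel used to execute the notebooks.",
    )
    parser.addoption(
        "--notebook-cell-timeout",
        action="store",
        type=int,
        default=600,  # assumed default, in seconds
        help="Per-cell execution limit in seconds.",
    )


def pytest_configure(config):
    """Declare the `notebook` marker so `pytest -m notebook` can select it."""
    config.addinivalue_line("markers", "notebook: notebook regression tests")
```

With hooks like these in place, a test could read the values through ``request.config.getoption("--kernel-name")``.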

## Building documentation
From the root of the repository, run:
```bash
19 changes: 15 additions & 4 deletions docs/conf.py
@@ -40,7 +40,7 @@

# The full version, including alpha/beta/rc tags
release = ott.__version__
version = release
version = ott.__version__

# -- General configuration ---------------------------------------------------

@@ -73,8 +73,6 @@
autosummary_generate = True

autodoc_typehints = 'description'
pygments_lexer = 'ipython3'
nbsphinx_execute = 'never'

# bibliography
bibtex_bibfiles = ["references.bib"]
@@ -93,7 +91,6 @@

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#

html_theme = 'sphinx_book_theme'
html_logo = '_static/logoOTT.png'
@@ -103,3 +100,17 @@
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

nbsphinx_codecell_lexer = "ipython3"
nbsphinx_execute = 'never'
nbsphinx_prolog = r"""
{% set docname = 'docs/' + env.doc2path(env.docname, base=None) %}
.. raw:: html

<div class="docutils container">
<a class="reference external"
href="https://colab.research.google.com/github/ott-jax/ott/blob/main/{{ docname|e }}">
<img alt="Colab badge" src="https://colab.research.google.com/assets/colab-badge.svg" width="125px">
</a>
</div>
"""
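
To make the prolog template easier to follow, here is a small sketch of the URL it produces for a given notebook. It assumes that `env.doc2path(env.docname, base=None)` yields the document path relative to the docs source directory, and it uses `html.escape` as a stand-in for Jinja's `e` filter (the two differ slightly in how they encode single quotes). The function name and the notebook path are hypothetical.

```python
from html import escape

# Repository location copied from the template above.
COLAB_BASE = "https://colab.research.google.com/github/ott-jax/ott/blob/main/"


def colab_url(doc_path: str) -> str:
    """Mirror the template: prefix 'docs/' and HTML-escape the result."""
    docname = "docs/" + doc_path
    return COLAB_BASE + escape(docname, quote=True)


# Hypothetical notebook path, used only for illustration.
print(colab_url("notebooks/example.ipynb"))
# prints https://colab.research.google.com/github/ott-jax/ott/blob/main/docs/notebooks/example.ipynb
```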
26 changes: 8 additions & 18 deletions docs/core.rst
@@ -1,3 +1,5 @@
.. _core:

ott.core package
================
.. currentmodule:: ott.core
@@ -6,12 +8,12 @@ ott.core package
The core package contains definitions of various OT problems, starting
from the most simple, the linear OT problem, to more advanced problems
such as quadratic, or involving multiple measures, the barycenter problem.
We follow with the classic :class:`ott.core.sinkhorn.sinkhorn` routine (essentially a
wrapper for the :class:`ott.core.sinkhorn.Sinkhorn` solver class) [#]_, [#]_. We also provide an analogous
low-rank Sinkhorn solver [#]_ to handle very large instances. Both are used
within our Wasserstein barycenter solvers [#]_, [#]_ as well as our
Gromov-Wasserstein solver [#]_, [#]_. We also provide an implementation of
input convex neural networks [#]_, a NN that can be used to estimate OT [#]_.
We follow with the classic :class:`~ott.core.sinkhorn.sinkhorn` routine (essentially a
wrapper for the :class:`~ott.core.sinkhorn.Sinkhorn` solver class) :cite:`cuturi:13,sejourne:19`.
We also provide an analogous low-rank Sinkhorn solver :cite:`scetbon:21` to handle very large instances.
Both are used within our Wasserstein barycenter solvers :cite:`benamou:15,janati:20a`, as well as our
Gromov-Wasserstein solver :cite:`memoli:11,scetbon:22`. We also provide an implementation of
input convex neural networks :cite:`amos:17`, a NN that can be used to estimate OT :cite:`makkuva:20`.

OT Problems
-----------
@@ -63,15 +65,3 @@ Neural Potentials
icnn.ICNN
neuraldual.NeuralDualSolver
neuraldual.NeuralDual

References
----------
.. [#] M. Cuturi, `Sinkhorn Distances: Lightspeed Computation of Optimal Transport <https://papers.nips.cc/paper/2013/hash/af21d0c97db2e27e13572cbf59eb343d-Abstract.html>`_ , NIPS 2013.
.. [#] T. Séjourné, `Sinkhorn Divergences for Unbalanced Optimal Transport <https://arxiv.org/abs/1910.12958>`_ , NeurIPS 2019.
.. [#] M. Scetbon et al., `Low-Rank Sinkhorn Factorization <http://proceedings.mlr.press/v139/scetbon21a/scetbon21a.pdf>`_ , ICML 2021.
.. [#] J.D. Benamou et al., `Iterative Bregman Projections for Regularized Transportation Problems <https://epubs.siam.org/doi/abs/10.1137/141000439>`_ , SIAM J. Sci. Comput. 37(2), A1111-A1138.
.. [#] H. Janati et al., `Debiased Sinkhorn Barycenters <http://proceedings.mlr.press/v119/janati20a.html>`_ , ICML 2020.
.. [#] F. Memoli, `Gromov–Wasserstein distances and the metric approach to object matching <https://link.springer.com/article/10.1007/s10208-011-9093-5>`_ , FOCM 2011.
.. [#] M. Scetbon et al., `Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs <https://arxiv.org/abs/2106.01128>`_, Arxiv.
.. [#] B. Amos, L. Xu, J. Z. Kolter, `Input Convex Neural Networks <https://proceedings.mlr.press/v70/amos17b/amos17b.pdf>`_, ICML 2017.
.. [#] Ashok Vardhan Makkuva, Amirhossein Taghvaei, Sewoong Oh, Jason D. Lee, `Optimal transport mapping via input convex neural networks <https://arxiv.org/abs/1908.10962>`_ , ICML 2020
22 changes: 9 additions & 13 deletions docs/geometry.rst
@@ -1,10 +1,12 @@
.. _geometry:

ott.geometry package
====================
.. currentmodule:: ott.geometry
.. automodule:: ott.geometry

This package implements several classes to define a geometry, arguably the most influential
ingredient of optimal transport problem. In its full generality, a :class:`ott.geometry.geometry.Geometry`
ingredient of an optimal transport problem. In its full generality, a :class:`~ott.geometry.geometry.Geometry`
defines source points (input measure), target points (target measure) and a ground cost function
(resp. a positive kernel function) that quantifies how expensive (resp. easy) it is to displace
a unit of mass from any of the input points to the target points.
@@ -13,22 +15,22 @@ The geometry package proposes a few simple geometries. The simplest of all would
be that for which input and target points coincide, and the geometry between them
simplifies to a symmetric cost or kernel matrix. In the very particular case
where these points happen to lie on a grid (a cartesian product in full generality,
e.g. 2 or 3D grids), the :class:`ott.geometry.grid.Grid` geometry will prove useful.
e.g. 2 or 3D grids), the :class:`~ott.geometry.grid.Grid` geometry will prove useful.

For more general settings where input/target points do not coincide, one can
alternatively instantiate a :class:`ott.geometry.geometry.Geometry` through a rectangular cost matrix.
alternatively instantiate a :class:`~ott.geometry.geometry.Geometry` through a rectangular cost matrix.

However, it is often preferable in applications to define ground costs "symbolically",
by listing instead points in the input/target point clouds, to specify directly
a cost *function* between them. Such functions should follow the :class:`ott.geometry.costs.CostFn`
a cost *function* between them. Such functions should follow the :class:`~ott.geometry.costs.CostFn`
class description. We provide a few standard cost functions that are meaningful in an
OT context, notably the (unbalanced, regularized) Bures distances between
Gaussians [#]_. That cost can be used for instance to compute a distance between
Gaussian mixtures, as proposed in [#]_ and revisited in [#]_.
Gaussians :cite:`janati:20`. That cost can be used for instance to compute a distance between
Gaussian mixtures, as proposed in :cite:`chen:19a` and revisited in :cite:`delon:20`.
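
As an illustration of the "symbolic" description above, the following sketch builds a rectangular cost matrix from two point clouds and a cost function. It is plain Python and does not use the OTT API; the names `sq_euclidean` and `cost_matrix` are hypothetical helpers chosen for this example.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]


def sq_euclidean(x: Point, y: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def cost_matrix(
    xs: Sequence[Point],
    ys: Sequence[Point],
    cost_fn: Callable[[Point, Point], float],
) -> List[List[float]]:
    """Evaluate cost_fn on every (source, target) pair."""
    return [[cost_fn(x, y) for y in ys] for x in xs]


source = [(0.0, 0.0), (1.0, 0.0)]
target = [(0.0, 1.0), (1.0, 1.0), (2.0, 0.0)]
print(cost_matrix(source, target, sq_euclidean))
# prints [[1.0, 2.0, 4.0], [2.0, 1.0, 1.0]]
```

Passing the function rather than the matrix lets a solver evaluate costs on demand, which matters when the full matrix would not fit in memory.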

To be useful with Sinkhorn solvers, ``Geometries`` typically need to provide an
``epsilon`` regularization parameter. We propose either to set that value once for
all, or implement an annealing scheduler :class:`ott.geometry.epsilon_scheduler.Epsilon`.
all, or implement an annealing :class:`~ott.geometry.epsilon_scheduler.Epsilon` scheduler.
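
To give a rough idea of what annealing ``epsilon`` means, the sketch below implements a generic geometric decay towards a target value. It is an assumption-laden illustration and is not the formula used by ``ott.geometry.epsilon_scheduler.Epsilon``; the function name and the parameters `target`, `init` and `decay` are hypothetical here.

```python
def annealed_epsilon(
    iteration: int,
    target: float = 0.01,
    init: float = 1.0,
    decay: float = 0.5,
) -> float:
    """Start at `init`, shrink by `decay` each iteration, never drop below `target`."""
    return max(init * decay**iteration, target)


print([annealed_epsilon(t) for t in range(4)])
# prints [1.0, 0.5, 0.25, 0.125]
```

Starting with a large value and decreasing it gradually tends to make the early Sinkhorn iterations cheaper, at the cost of an extra schedule to tune.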

Geometries
----------
@@ -51,9 +53,3 @@ Cost Functions
costs.Cosine
costs.Bures
costs.UnbalancedBures

References
----------
.. [#] H. Janati et al., `Entropic Optimal Transport between Unbalanced Gaussian Measures has a Closed Form <https://proceedings.neurips.cc//paper_files/paper/2020/hash/766e428d1e232bbdd58664b41346196c-Abstract.html>`_ , NeurIPS 2020.
.. [#] Y. Chen et al., `Optimal Transport for Gaussian Mixture Models <https://ieeexplore.ieee.org/document/8590715>`_ , IEEE Access (7)
.. [#] J. Delon and A. Desolneux, `A Wasserstein-Type Distance in the Space of Gaussian Mixture Models <https://epubs.siam.org/doi/pdf/10.1137/19M1301047>`_ , SIIMS (13)-2, 936--970
38 changes: 13 additions & 25 deletions docs/index.rst
@@ -9,29 +9,29 @@ OTT is a `JAX <https://jax.readthedocs.io/en/latest/index.html>`_ package that b
differentiate the solution to optimal transport problems. OTT can help you compute Wasserstein distances between
weighted clouds of points (or histograms), using a cost (e.g. a distance) between individual points.

To that end OTT uses various implementation of the Sinkhorn algorithm [#]_ [#]_ [#]_.
To that end OTT uses various implementations of the Sinkhorn algorithm :cite:`cuturi:13,peyre:19,scetbon:21`.
These implementations take advantage of several JAX features, such as `Just-in-time (JIT) compilation`_,
`auto-vectorization (VMAP)`_ and both `automatic`_ and/or `implicit`_ differentiation.
A few tutorials are provided below, along with different use-cases,
notably for single-cell genomics data [#]_.
notably for single-cell genomics data :cite:`schiebinger:19`.

Packages
--------
There are currently three packages, ``geometry``, ``core`` and ``tools``, playing the following roles:

- `<geometry>`_ defines classes that describe *two point clouds* paired with a *cost* function (simpler geometries
- :ref:`geometry` defines classes that describe *two point clouds* paired with a *cost* function (simpler geometries
are also implemented, such as that defined by points supported on a multi-dimensional grid with a separable
cost [#]_). The design choice in OTT is to state that cost functions and algorithms should operate independently:
if a particular cost function allows for faster computations
cost :cite:`solomon:15`). The design choice in OTT is to state that cost functions and algorithms should operate
independently: if a particular cost function allows for faster computations
(e.g. squared-Euclidean distance when comparing grids), this should not be taken advantage of at the level of
optimizers, but at the level of the problems description. Geometry objects are therefore only considered as
arguments to describe OT problems handled in ``core``, using subroutines provided by geometries;
- `<core>`_ help define first an OT problem (linear, quadratic, barycenters). These problems are then solved using
- :ref:`core` first helps define an OT problem (linear, quadratic, barycenters). These problems are then solved using
the Sinkhorn algorithm and its variants, the main workhorse to solve OT in this package, as well as variants that
can compute Gromov-Wasserstein distances or barycenters of several measures;
- `<tools>`_ provides an interface to exploit OT solutions, as produced by ``core`` functions. Such tasks include
instantiating OT matrices, computing approximations to Wasserstein distances [#]_ [#]_,
or computing differentiable sort and quantile operations [#]_.
- :ref:`tools` provides an interface to exploit OT solutions, as produced by ``core`` functions. Such tasks include
instantiating OT matrices, computing approximations to Wasserstein distances :cite:`genevay:18,sejourne:19`,
or computing differentiable sort and quantile operations :cite:`cuturi:19`.

.. toctree::
:maxdepth: 1
@@ -71,24 +71,12 @@ There are currently three packages, ``geometry``, ``core`` and ``tools``, playin
geometry
core
tools
references

Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
.. toctree::
:maxdepth: 1
:caption: References:

References
==========
.. [#] M. Cuturi, `Sinkhorn Distances: Lightspeed Computation of Optimal Transport <https://papers.nips.cc/paper/2013/hash/af21d0c97db2e27e13572cbf59eb343d-Abstract.html>`_, NIPS'13.
.. [#] G. Peyré, M. Cuturi, `Computational Optimal Transport <https://www.nowpublishers.com/article/Details/MAL-073>`_, FNT in ML, 2019.
.. [#] M. Scetbon et al., `Low-Rank Sinkhorn Factorization <http://proceedings.mlr.press/v139/scetbon21a/scetbon21a.pdf>`_ , ICML 2021.
.. [#] G. Schiebinger et al., `Optimal-Transport Analysis of Single-Cell Gene Expression Identifies Developmental Trajectories in Reprogramming <https://www.cell.com/cell/pdf/S0092-8674(19)30039-X.pdf>`_, Cell 176, 928--943.
.. [#] J. Solomon et al, `Convolutional Wasserstein distances: efficient optimal transportation on geometric domains <https://dl.acm.org/doi/10.1145/2766963>`_, ACM ToG, SIGGRAPH'15.
.. [#] A. Genevay et al., `Learning Generative Models with Sinkhorn Divergences <http://proceedings.mlr.press/v84/genevay18a.html>`_, AISTATS'18.
.. [#] T. Séjourné et al., `Sinkhorn Divergences for Unbalanced Optimal Transport <https://arxiv.org/abs/1910.12958>`_, arXiv:1910.12958.
.. [#] M. Cuturi et al. `Differentiable Ranking and Sorting using Optimal Transport <https://papers.nips.cc/paper/2019/hash/d8c24ca8f23c562a5600876ca2a550ce-Abstract.html>`_, NeurIPS'19.
references

.. _Just-in-time (JIT) compilation: https://jax.readthedocs.io/en/latest/jax.html#just-in-time-compilation-jit
.. _auto-vectorization (VMAP): https://jax.readthedocs.io/en/latest/jax.html#vectorization-vmap