Torchmetrics paper (#669)
* paper
* manifest
* move to docs
* fix brackets
* fix example
* formatting
* add orcid
* add owners
* ci build paper
* Apply suggestions from code review

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Luca Di Liello <[email protected]>
Co-authored-by: Daniel Stancl <[email protected]>
Co-authored-by: Justus Schock <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: quancs <[email protected]>
Co-authored-by: William Falcon <[email protected]>
10 people authored Dec 16, 2021
1 parent ad90ba2 commit 4f8015f
Showing 5 changed files with 217 additions and 4 deletions.
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -28,3 +28,4 @@
/.github/*.md @edenlightning @SkafteNicki @borda
/.github/ISSUE_TEMPLATE/*.md @edenlightning @borda @SkafteNicki
/docs/source/conf.py @borda @awaelchli @ethanwharris
/docs/paper_JOSS/ @SkafteNicki @borda @justusschock @williamFalcon
25 changes: 22 additions & 3 deletions .github/workflows/docs-check.yml
@@ -34,10 +34,10 @@ jobs:
- name: Test Documentation
env:
SPHINX_MOCK_REQUIREMENTS: 0
working-directory: ./docs
run: |
# First run the same pipeline as Read-The-Docs
apt-get update && sudo apt-get install -y cmake
cd docs
make doctest
make coverage
@@ -72,9 +72,9 @@ jobs:
shell: bash

- name: Make Documentation
# First run the same pipeline as Read-The-Docs
working-directory: ./docs
run: |
# First run the same pipeline as Read-The-Docs
cd docs
make clean
make html --debug --jobs 2 SPHINXOPTS="-W --keep-going" -b linkcheck
@@ -84,3 +84,22 @@
name: docs-build
path: docs/build/
if: always()

  paper-JOSS:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Build draft PDF
        uses: openjournals/openjournals-draft-action@master
        with:
          journal: joss
          # This should be the path to the paper within your repo.
          paper-path: ./docs/paper_JOSS/paper.md
      - name: Upload
        uses: actions/upload-artifact@v1
        with:
          name: JOSS paper
          # This is the output path where Pandoc will write the compiled
          # PDF. Note, this should be the same directory as the input paper.md
          path: ./docs/paper_JOSS/paper.pdf
6 changes: 5 additions & 1 deletion .pre-commit-config.yaml
@@ -71,7 +71,11 @@ repos:
- mdformat-gfm
- mdformat-black
- mdformat_frontmatter
exclude: CHANGELOG.md
exclude: |
(?x)^(
CHANGELOG.md|
docs/paper_JOSS/paper.md
)$
- repo: https://github.com/asottile/yesqa
rev: v1.2.3
85 changes: 85 additions & 0 deletions docs/paper_JOSS/paper.bib
@@ -0,0 +1,85 @@
@misc{reproducibility,
title={Enabling End-To-End Machine Learning Replicability: A Case Study in Educational Data Mining},
author={Josh Gardner and Yuming Yang and Ryan Baker and Christopher Brooks},
year={2018},
eprint={1806.05208},
archivePrefix={arXiv},
primaryClass={cs.CY}
}
@inproceedings{fid,
title={GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium},
volume = {30},
url = {https://proceedings.neurips.cc/paper/2017/hash/8a1d694707eb0fefe65871369074926d-Abstract.html},
booktitle = {Advances in Neural Information Processing Systems},
publisher = {Curran Associates, Inc.},
author = {Heusel, Martin and Ramsauer, Hubert and Unterthiner, Thomas and Nessler, Bernhard and Hochreiter, Sepp},
pages={6629--6640},
year = {2017},
}
@misc{papers_with_code,
title = {Papers With Code},
howpublished = {\url{https://paperswithcode.com/}},
note = {Accessed: 2021-12-01}
}
@misc{arxiv,
title = {arXiv},
howpublished = {\url{https://arxiv.org/}},
note = {Accessed: 2021-12-16}
}
@incollection{pytorch,
title = {PyTorch: An Imperative Style, High-Performance Deep Learning Library},
author = {Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and Kopf, Andreas and Yang, Edward and DeVito, Zachary and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu and Bai, Junjie and Chintala, Soumith},
booktitle = {Advances in Neural Information Processing Systems 32},
editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. Garnett},
pages = {8024--8035},
year = {2019},
publisher = {Curran Associates, Inc.},
url={http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf}
}

@inproceedings{transformers,
title = "Transformers: State-of-the-Art Natural Language Processing",
author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
month = oct,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6",
pages = "38--45"
}

@article{scikit_learn,
title={Scikit-learn: Machine Learning in {P}ython},
author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V. and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P. and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
journal={Journal of Machine Learning Research},
volume={12},
pages={2825--2830},
year={2011}
}

@misc{keras,
title={Keras},
author={Chollet, Fran\c{c}ois and others},
year={2015},
publisher={GitHub},
howpublished={\url{https://github.com/fchollet/keras}},
}

@misc{large_example1,
title={Scaling Vision Transformers},
author={Xiaohua Zhai and Alexander Kolesnikov and Neil Houlsby and Lucas Beyer},
year={2021},
eprint={2106.04560},
archivePrefix={arXiv},
primaryClass={cs.CV}
}

@misc{large_example2,
title={RoBERTa: A Robustly Optimized BERT Pretraining Approach},
author={Yinhan Liu and Myle Ott and Naman Goyal and Jingfei Du and Mandar Joshi and Danqi Chen and Omer Levy and Mike Lewis and Luke Zettlemoyer and Veselin Stoyanov},
year={2019},
eprint={1907.11692},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
104 changes: 104 additions & 0 deletions docs/paper_JOSS/paper.md
@@ -0,0 +1,104 @@
---
title: TorchMetrics - Measuring Reproducibility in PyTorch
tags:
  - python
  - deep learning
  - pytorch
authors:
  - name: Nicki Skafte Detlefsen
    affiliation: '1,2'
    orcid: 0000-0002-8133-682X
  - name: Jiri Borovec
    affiliation: '1'
    orcid: 0000-0001-7437-824X
  - name: Justus Schock
    affiliation: '1,3'
    orcid: 0000-0003-0512-3053
  - name: Ananya Harsh Jha
    affiliation: '1'
  - name: Teddy Koker
    affiliation: '1'
  - name: Luca Di Liello
    affiliation: '4'
  - name: Daniel Stancl
    affiliation: '5'
  - name: Changsheng Quan
    affiliation: '6'
  - name: William Falcon
    affiliation: '1,7'
affiliations:
  - name: Grid AI Labs
    index: 1
  - name: Technical University of Denmark
    index: 2
  - name: University Hospital Düsseldorf
    index: 3
  - name: University of Trento
    index: 4
  - name: Charles University
    index: 5
  - name: Zhejiang University
    index: 6
  - name: New York University
    index: 7
date: 08 Dec 2021
bibliography: paper.bib
---

# Summary

A main problem with reproducing machine learning publications is the variance of metric implementations across papers. A lack of standardization leads to different behavior in mechanisms such as checkpointing, learning rate scheduling, or early stopping, which in turn influences the reported results. For example, a complex metric such as the Fréchet inception distance (FID) for synthetic image quality evaluation [@fid] will differ based on the specific interpolation method used.

There have been a few attempts at tackling these reproducibility issues. Papers With Code [@papers_with_code] links research code with its corresponding paper. Similarly, arXiv [@arxiv] recently added a code and data section that links both official and community code to papers. However, these approaches rely on the paper's code being made publicly accessible, which is not always possible. Our approach is to provide the de-facto reference implementation for metrics. This enables non-open-source work to remain comparable as long as it uses our reference implementations.

We introduce TorchMetrics, a general-purpose metrics package that covers a wide variety of tasks and domains used in the machine learning community. TorchMetrics provides standard classification and regression metrics, as well as domain-specific metrics for audio, computer vision, natural language processing, and information retrieval. Our process for adding a new metric is as follows: first, we integrate a well-tested and established third-party library. Once we have verified the implementations and written tests for them, we re-implement them in native PyTorch to enable hardware acceleration and remove any bottlenecks in inter-device transfer.

# Statement of need

Currently, there is no standard, widely-adopted metrics library for native PyTorch. Some native PyTorch libraries, such as Transformers [@transformers], support domain-specific metrics (in this case, NLP metrics), but no library exists that covers multiple domains. PyTorch users, therefore, often rely on non-PyTorch packages such as Scikit-learn [@scikit_learn] for computing even simple metrics such as accuracy, F1, or AUROC.

However, while Scikit-learn is considered the gold standard for computing metrics in regression and classification, it relies on the core assumption that all predictions and targets are available simultaneously. This contradicts the typical workflow in a modern deep learning training/evaluation loop, where data arrives in batches and the metric therefore needs to be calculated in an online fashion. It is important to note that, in general, a global metric cannot be obtained by averaging or summing the metric values calculated per batch.
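
To make this concrete, here is a minimal sketch (with illustrative numbers, not from the paper) showing how naively averaging per-batch accuracies diverges from the accuracy computed over the full dataset when batch sizes are unequal:

```python
import torch

# Illustrative predictions and targets for 10 samples; 7 of 10 are correct
preds = torch.tensor([1, 1, 0, 1, 0, 0, 1, 1, 1, 1])
target = torch.tensor([1, 0, 0, 1, 1, 0, 1, 1, 0, 1])

# Global accuracy over all samples: 7 / 10 = 0.7
global_acc = (preds == target).float().mean()

# The same data split into two unequal batches of 2 and 8 samples
batches = [(preds[:2], target[:2]), (preds[2:], target[2:])]
per_batch_acc = [(p == t).float().mean() for p, t in batches]

# Averaging the per-batch values gives (0.5 + 0.75) / 2 = 0.625 != 0.7
naive_acc = torch.stack(per_batch_acc).mean()
print(global_acc.item(), naive_acc.item())
```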

TorchMetrics solves this problem by introducing stateful metrics that can calculate metric values on a stream of data, alongside the classical functional and stateless metrics provided by other packages like Scikit-learn. We do this with an effortless `update` and `compute` interface, well known from packages such as Keras [@keras]. The `update` function takes in a batch of predictions and targets and updates the internal state. For example, for a metric such as accuracy, the internal states are simply the number of correctly classified samples and the total number of observed samples. When all batches have been passed to the `update` method, the `compute` method returns the accumulated accuracy over all the batches. In addition to `update` and `compute`, each metric also has a `forward` method (like any other `torch.nn.Module`) that both returns the metric on the current batch of data and accumulates the global state. This enables the user to obtain fine-grained information about the metric on an individual batch as well as the global metric of how well their model is doing.

```python
# Minimal example showcasing the TorchMetrics interface
import torch
from torch import tensor, Tensor

# base class all modular metrics inherit from
from torchmetrics import Metric


class Accuracy(Metric):
    def __init__(self):
        super().__init__()
        # `self.add_state` defines the states of the metric
        # that should be accumulated and will automatically
        # be synchronized between devices
        self.add_state("correct", default=tensor(0), dist_reduce_fx="sum")
        self.add_state("total", default=tensor(0), dist_reduce_fx="sum")

    def update(self, preds: Tensor, target: Tensor) -> None:
        # update takes `preds` and `target` and accumulates the current
        # stream of data into the global states for later use
        self.correct += torch.sum(preds == target)
        self.total += target.numel()

    def compute(self) -> Tensor:
        # compute takes the accumulated states
        # and returns the final metric value
        return self.correct / self.total
```
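
For completeness, a minimal usage sketch of the `Accuracy` metric defined above (the batches shown are illustrative):

```python
metric = Accuracy()
for preds, target in [
    (torch.tensor([0, 1, 1]), torch.tensor([0, 1, 0])),
    (torch.tensor([1, 0]), torch.tensor([1, 0])),
]:
    batch_acc = metric(preds, target)  # forward: metric on this batch, state is accumulated
global_acc = metric.compute()  # metric accumulated over all batches seen so far
```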

Another core feature of TorchMetrics is its ability to scale to multiple devices seamlessly. Modern deep learning models are often trained on hundreds of devices such as GPUs or TPUs (see [@large_example1; @large_example2] for examples). This scale introduces the need to synchronize metrics across machines to get the correct value during training and evaluation. In distributed environments, TorchMetrics automatically accumulates across devices before reporting the calculated metric to the user.

In addition to stateful metrics (called modular metrics in TorchMetrics), we also support a functional interface that works similarly to Scikit-learn. These are simple Python functions that take PyTorch tensors as input and return the corresponding metric as a PyTorch tensor. They can be used when metrics are evaluated on a single device and no accumulation is needed, making them very fast to compute.
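
As an illustration, a stateless functional metric is just a plain function over the full prediction and target tensors; the sketch below uses a hypothetical helper name rather than the actual TorchMetrics API:

```python
import torch
from torch import Tensor


def functional_accuracy(preds: Tensor, target: Tensor) -> Tensor:
    # no internal state: all predictions and targets are passed in at once
    return (preds == target).float().mean()


acc = functional_accuracy(torch.tensor([0, 1, 1, 0]), torch.tensor([0, 1, 0, 0]))  # tensor(0.7500)
```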

TorchMetrics exhibits high test coverage across various configurations, including all three major OS platforms (Linux, macOS, and Windows) and various Python, CUDA, and PyTorch versions. We test both the minimum and latest package requirements for all combinations of OS and Python versions and include additional tests for each PyTorch version from 1.3 up to future development versions. On every pull request and merge to master, we run a full test suite. All standard tests run on CPU; in addition, we run all tests in a multi-GPU setting, which reflects realistic deep learning workloads. For usability, we provide auto-generated HTML documentation (hosted at [readthedocs](https://torchmetrics.readthedocs.io/en/stable/)) built from the source code, which updates in real time as new pull requests are merged.

TorchMetrics is released under the Apache 2.0 license. The source code is available at https://github.com/PytorchLightning/metrics.

# Acknowledgement

The TorchMetrics team thanks Thomas Chaton, Ethan Harris, Carlos Mocholí, Sean Narenthiran, Adrian Wälchli, Maxim Grechkin, and Ananth Subramaniam for contributing ideas, participating in discussions on API design, and completing pull request reviews. We also thank all of our open-source contributors for reporting and resolving issues with this package. We are grateful to the PyTorch Lightning team for their ongoing and dedicated support of this project, and to Grid.ai for providing the computing resources and cloud credits needed to run our Continuous Integration.

# References
