Torchmetrics paper (#669)

* paper
* manifest
* move to docs
* fix brackets
* fix example
* formatting
* add orcid
* add owners
* ci build paper
* Apply suggestions from code review

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Luca Di Liello <[email protected]>
Co-authored-by: Daniel Stancl <[email protected]>
Co-authored-by: Justus Schock <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: quancs <[email protected]>
Co-authored-by: William Falcon <[email protected]>
Lightning-AI · Dec 16, 2021 · 4f8015f
1 parent ad90ba2
commit 4f8015f

Showing 5 changed files with 217 additions and 4 deletions.
diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
@@ -28,3 +28,4 @@
 /.github/*.md                   @edenlightning @SkafteNicki @borda
 /.github/ISSUE_TEMPLATE/*.md    @edenlightning @borda @SkafteNicki
 /docs/source/conf.py            @borda @awaelchli @ethanwharris
+/docs/paper_JOSS/               @SkafteNicki @borda @justusschock @williamFalcon
diff --git a/.github/workflows/docs-check.yml b/.github/workflows/docs-check.yml
@@ -34,10 +34,10 @@ jobs:
       - name: Test Documentation
         env:
           SPHINX_MOCK_REQUIREMENTS: 0
+        working-directory: ./docs
         run: |
           # First run the same pipeline as Read-The-Docs
           apt-get update && sudo apt-get install -y cmake
-          cd docs
           make doctest
           make coverage
 
@@ -72,9 +72,9 @@ jobs:
         shell: bash
 
       - name: Make Documentation
+        # First run the same pipeline as Read-The-Docs
+        working-directory: ./docs
         run: |
-          # First run the same pipeline as Read-The-Docs
-          cd docs
           make clean
           make html --debug --jobs 2 SPHINXOPTS="-W --keep-going" -b linkcheck
 
@@ -84,3 +84,22 @@ jobs:
           name: docs-build
           path: docs/build/
         if: always()
+
+  paper-JOSS:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v2
+      - name: Build draft PDF
+        uses: openjournals/openjournals-draft-action@master
+        with:
+          journal: joss
+          # This should be the path to the paper within your repo.
+          paper-path: ./docs/paper_JOSS/paper.md
+      - name: Upload
+        uses: actions/upload-artifact@v1
+        with:
+          name: JOSS paper
+          # This is the output path where Pandoc will write the compiled
+          # PDF. Note, this should be the same directory as the input paper.md
+          path: ./docs/paper_JOSS/paper.pdf
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -71,7 +71,11 @@ repos:
           - mdformat-gfm
           - mdformat-black
           - mdformat_frontmatter
-        exclude: CHANGELOG.md
+        exclude: |
+            (?x)^(
+                CHANGELOG.md|
+                docs/paper_JOSS/paper.md
+            )$
 
   - repo: https://github.com/asottile/yesqa
     rev: v1.2.3

diff --git a/docs/paper_JOSS/paper.bib b/docs/paper_JOSS/paper.bib
@@ -0,0 +1,85 @@
+@misc{reproducibility,
+    title={Enabling End-To-End Machine Learning Replicability: A Case Study in Educational Data Mining},
+    author={Josh Gardner and Yuming Yang and Ryan Baker and Christopher Brooks},
+    year={2018},
+    eprint={1806.05208},
+    archivePrefix={arXiv},
+    primaryClass={cs.CY}
+}
+@inproceedings{fid,
+    title={GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium},
+    volume = {30},
+    url = {https://proceedings.neurips.cc/paper/2017/hash/8a1d694707eb0fefe65871369074926d-Abstract.html},
+    booktitle = {Advances in Neural Information Processing Systems},
+    publisher = {Curran Associates, Inc.},
+    author = {Heusel, Martin and Ramsauer, Hubert and Unterthiner, Thomas and Nessler, Bernhard and Hochreiter, Sepp},
+    pages={6629--6640},
+    year = {2017},
+}
+@misc{papers_with_code,
+    title = {Papers With Code},
+    howpublished = {\url{https://paperswithcode.com/}},
+    note = {Accessed: 2021-12-01}
+}
+@misc{arxiv,
+    title = {arXiv},
+    howpublished = {\url{https://arxiv.org/}},
+    note = {Accessed: 2021-12-16}
+}
+@incollection{pytorch,
+    title = {PyTorch: An Imperative Style, High-Performance Deep Learning Library},
+    author = {Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and Kopf, Andreas and Yang, Edward and DeVito, Zachary and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu and Bai, Junjie and Chintala, Soumith},
+    booktitle = {Advances in Neural Information Processing Systems 32},
+    editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. Garnett},
+    pages = {8024--8035},
+    year = {2019},
+    publisher = {Curran Associates, Inc.},
+    url={http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf}
+}
+
+@inproceedings{transformers,
+    title = "Transformers: State-of-the-Art Natural Language Processing",
+    author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
+    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
+    month = oct,
+    year = "2020",
+    address = "Online",
+    publisher = "Association for Computational Linguistics",
+    url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6",
+    pages = "38--45"
+}
+
+@article{scikit_learn,
+    title={Scikit-learn: Machine Learning in {P}ython},
+    author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V. and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P. and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
+    journal={Journal of Machine Learning Research},
+    volume={12},
+    pages={2825--2830},
+    year={2011}
+}
+
+@misc{keras,
+    title={Keras},
+    author={Chollet, Fran\c{c}ois and others},
+    year={2015},
+    publisher={GitHub},
+    howpublished={\url{https://github.com/fchollet/keras}},
+}
+
+@misc{large_example1,
+    title={Scaling Vision Transformers},
+    author={Xiaohua Zhai and Alexander Kolesnikov and Neil Houlsby and Lucas Beyer},
+    year={2021},
+    eprint={2106.04560},
+    archivePrefix={arXiv},
+    primaryClass={cs.CV}
+}
+
+@misc{large_example2,
+    title={RoBERTa: A Robustly Optimized BERT Pretraining Approach},
+    author={Yinhan Liu and Myle Ott and Naman Goyal and Jingfei Du and Mandar Joshi and Danqi Chen and Omer Levy and Mike Lewis and Luke Zettlemoyer and Veselin Stoyanov},
+    year={2019},
+    eprint={1907.11692},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}
+}
diff --git a/docs/paper_JOSS/paper.md b/docs/paper_JOSS/paper.md
@@ -0,0 +1,104 @@
+---
+title: TorchMetrics - Measuring Reproducibility in PyTorch
+tags:
+  - python
+  - deep learning
+  - pytorch
+authors:
+  - name: Nicki Skafte Detlefsen
+    affiliation: '1,2'
+    orcid: 0000-0002-8133-682X
+  - name: Jiri Borovec
+    affiliation: '1'
+    orcid: 0000-0001-7437-824X
+  - name: Justus Schock
+    affiliation: '1,3'
+    orcid: 0000-0003-0512-3053
+  - name: Ananya Harsh Jha
+    affiliation: '1'
+  - name: Teddy Koker
+    affiliation: '1'
+  - name: Luca Di Liello
+    affiliation: '4'
+  - name: Daniel Stancl
+    affiliation: '5'
+  - name: Changsheng Quan
+    affiliation: '6'
+  - name: William Falcon
+    affiliation: '1,7'
+affiliations:
+  - name: Grid AI Labs
+    index: 1
+  - name: Technical University of Denmark
+    index: 2
+  - name: University Hospital Düsseldorf
+    index: 3
+  - name: University of Trento
+    index: 4
+  - name: Charles University
+    index: 5
+  - name: Zhejiang University
+    index: 6
+  - name: New York University
+    index: 7
+date: 08 Dec 2021
+bibliography: paper.bib
+---
+
+# Summary
+
+A major obstacle to reproducing machine learning publications is the variation in metric implementations across papers. A lack of standardization leads to different behavior in mechanisms such as checkpointing, learning rate schedulers, or early stopping, which in turn influences the reported results. For example, a complex metric such as Fréchet inception distance (FID) for synthetic image quality evaluation [@fid] will differ based on the specific interpolation method used.
+
+There have been a few attempts at tackling these reproducibility issues. Papers With Code [@papers_with_code] links research code with its corresponding paper. Similarly, arXiv [@arxiv] recently added a code and data section that links both official and community code to papers. However, these methods rely on the paper's code being made publicly accessible, which is not always possible. Our approach is to provide the de facto reference implementation for metrics. This approach enables non-open-source work to remain comparable, as long as it uses our reference implementations.
+
+We introduce TorchMetrics, a general-purpose metrics package that covers a wide variety of tasks and domains used in the machine learning community. TorchMetrics provides standard classification and regression metrics, as well as domain-specific metrics for audio, computer vision, natural language processing, and information retrieval. Our process for adding a new metric is as follows: first, we integrate a well-tested and established third-party library. Once we have verified the implementations and written tests for them, we re-implement them in native PyTorch to enable hardware acceleration and remove any bottlenecks in inter-device transfer.
+
+# Statement of need
+
+Currently, there is no standard, widely adopted metrics library for native PyTorch. Some native PyTorch libraries support domain-specific metrics, such as Transformers [@transformers] for calculating NLP-specific metrics. However, no library exists that covers multiple domains. PyTorch users, therefore, often rely on non-PyTorch packages such as Scikit-learn [@scikit_learn] for computing even simple metrics such as accuracy, F1, or AUROC.
+
+However, while Scikit-learn is considered the gold standard for computing metrics in regression and classification, it relies on the core assumption that all predictions and targets are available simultaneously. This contradicts the typical workflow in a modern deep learning training/evaluation loop, where data comes in batches. Therefore, the metric needs to be calculated in an online fashion. It is important to note that, in general, it is not possible to calculate a global metric as the average or sum of the metric calculated per batch.
+
+TorchMetrics solves this problem by introducing stateful metrics that can calculate metric values on a stream of data, alongside the classical functional and stateless metrics provided by other packages like Scikit-learn. We do this with a simple `update` and `compute` interface, well known from packages such as Keras [@keras]. The `update` function takes in a batch of predictions and targets and updates the internal state. For example, for a metric such as accuracy, the internal states are simply the number of correctly classified samples and the total number of observed samples. When all batches have been passed to the `update` method, the `compute` method returns the accuracy accumulated over all the batches. In addition to `update` and `compute`, each metric also has a `forward` method (like any other `torch.nn.Module`) that can be used both to get the metric on the current batch of data and to accumulate global state. This gives the user fine-grained information about the metric on the individual batch as well as the global metric of how well their model is doing.
+
+```python
+# Minimal example showcasing the TorchMetrics interface
+import torch
+from torch import tensor, Tensor
+# base class all modular metrics inherit from
+from torchmetrics import Metric
+
+class Accuracy(Metric):
+    def __init__(self):
+        super().__init__()
+        # `self.add_state` defines the states of the metric
+        #  that should be accumulated and will automatically
+        #  be synchronized between devices
+        self.add_state("correct", default=tensor(0), dist_reduce_fx="sum")
+        self.add_state("total", default=tensor(0), dist_reduce_fx="sum")
+
+    def update(self, preds: Tensor, target: Tensor) -> None:
+        # update takes `preds` and `target` and accumulates the current
+        # batch of data into the global states for later use
+        self.correct += torch.sum(preds == target)
+        self.total += target.numel()
+
+    def compute(self) -> Tensor:
+        # compute takes the accumulated states
+        # and returns the final metric value
+        return self.correct / self.total
+```
+
+Another core feature of TorchMetrics is its ability to scale to multiple devices seamlessly. Modern deep learning models are often trained on hundreds of devices such as GPUs or TPUs (see [@large_example1; @large_example2] for examples). This scale introduces the need to synchronize metrics across machines to get the correct value during training and evaluation. In distributed environments, TorchMetrics automatically accumulates across devices before reporting the calculated metric to the user.
+
+In addition to stateful metrics (called modular metrics in TorchMetrics), we also support a functional interface that works similarly to Scikit-learn. These are simple Python functions that take PyTorch Tensors as input and return the corresponding metric as a PyTorch Tensor. They can be used when metrics are evaluated on a single device and no accumulation is needed, making them very fast to compute.
+
+TorchMetrics exhibits high test coverage across various configurations, including all three major OS platforms (Linux, macOS, and Windows), and various Python, CUDA, and PyTorch versions. We test both minimum and latest package requirements for all combinations of OS and Python versions and include additional tests for each PyTorch version from 1.3 up to future development versions. On every pull request and merge to master, we run a full test suite. All standard tests run on CPU. In addition, we run all tests in a multi-GPU setting, which reflects realistic deep learning workloads. For usability, we have auto-generated HTML documentation (hosted at [readthedocs](https://torchmetrics.readthedocs.io/en/stable/)) from the source code, which updates in real time with newly merged pull requests.
+
+TorchMetrics is released under the Apache 2.0 license. The source code is available at https://github.com/PytorchLightning/metrics.
+
+# Acknowledgement
+
+The TorchMetrics team thanks Thomas Chaton, Ethan Harris, Carlos Mocholí, Sean Narenthiran, Adrian Wälchli, Maxim Grechkin, and Ananth Subramaniam for contributing ideas, participating in discussions on API design, and completing pull request reviews. We also thank all of our open-source contributors for reporting and resolving issues with this package. We are grateful to the PyTorch Lightning team for their ongoing and dedicated support of this project, and Grid.ai for providing computing resources and cloud credits needed to run our Continuous Integration.
+
+# References
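
Note on the added `paper.md`: its "Statement of need" says that a global metric cannot, in general, be computed as the average of per-batch values, and that this motivates the `update`/`compute` interface. The sketch below illustrates that claim in plain Python, without PyTorch. `StreamingAccuracy` and `batch_accuracy` are hypothetical stand-ins written for this note, not TorchMetrics classes or functions, and the two batches are made-up toy data.

```python
from typing import List, Sequence, Tuple


class StreamingAccuracy:
    """Hypothetical pure-Python stand-in for a stateful metric (not a TorchMetrics class)."""

    def __init__(self) -> None:
        # Internal states: number of correct predictions and number of samples seen
        self.correct = 0
        self.total = 0

    def update(self, preds: Sequence[int], target: Sequence[int]) -> None:
        # Accumulate the current batch into the running states
        self.correct += sum(int(p == t) for p, t in zip(preds, target))
        self.total += len(target)

    def compute(self) -> float:
        # Final value over everything passed to `update` so far
        return self.correct / self.total


def batch_accuracy(preds: Sequence[int], target: Sequence[int]) -> float:
    # Stateless accuracy on a single batch
    return sum(int(p == t) for p, t in zip(preds, target)) / len(target)


# Toy data: two batches of different sizes
batches: List[Tuple[List[int], List[int]]] = [
    ([1, 0, 1, 1], [1, 0, 1, 1]),  # 4 of 4 correct
    ([0], [1]),                    # 0 of 1 correct
]

metric = StreamingAccuracy()
per_batch: List[float] = []
for preds, target in batches:
    metric.update(preds, target)
    per_batch.append(batch_accuracy(preds, target))

mean_of_batches = sum(per_batch) / len(per_batch)  # averages 1.0 and 0.0
global_accuracy = metric.compute()                 # 4 correct out of 5 samples

print(f"mean of per-batch accuracies: {mean_of_batches}")
print(f"accuracy from accumulated state: {global_accuracy}")
```

With unequal batch sizes, the mean of per-batch accuracies weights the single-sample batch as heavily as the four-sample batch, so it differs from the accuracy over all five samples. Accumulating counts and dividing once at the end avoids this.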