title: "Towards Scalable Bayesian Transformers: Investigating stochastic subset selection for NLP"
abstract: "Bayesian deep learning provides a framework for quantifying uncertainty. However, the scale of modern neural networks applied in Natural Language Processing (NLP) limits the usability of Bayesian methods. Subnetwork inference aims to approximate the posterior by selecting a stochastic parameter subset for inference, thereby allowing scalable posterior approximations. Determining the optimal parameter space for subnetwork inference is far from trivial. In this paper, we study partially stochastic Bayesian neural networks for transformer models on NLP tasks, using the Laplace approximation (LA) and Stochastic Weight Averaging-Gaussian (SWAG). We propose heuristics for selecting which layers to include in the stochastic subset. We show that norm-based selection is promising for small subsets, and random selection is superior for larger subsets. Moreover, we propose Sparse-KFAC (S-KFAC), an extension of KFAC LA, which selects dense stochastic substructures of linear layers based on parameter magnitudes. S-KFAC retains performance while requiring substantially fewer stochastic parameters and therefore drastically reduces the memory footprint."
openreview: ba3McobvmG
software: https://github.com/GustavAls/PartialNLP
section: Papers
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: kampen24a
month: 0
tex_title: "Towards Scalable Bayesian Transformers: Investigating stochastic subset selection for NLP"
firstpage: 1842
lastpage: 1862
page: 1842-1862
order: 1842
cycles: false
bibtex_author: Kampen, Peter Johannes Tejlgaard and Als, Gustav Ragnar Stoettrup and Andersen, Michael Riis
author:
- given: Peter Johannes Tejlgaard
  family: Kampen
- given: Gustav Ragnar Stoettrup
  family: Als
- given: Michael Riis
  family: Andersen
date: 2024-09-12
address:
container-title: Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence
volume: 244
genre: inproceedings
issued:
  date-parts:
  - 2024
  - 9
  - 12
pdf: https://raw.githubusercontent.com/mlresearch/v244/main/assets/kampen24a/kampen24a.pdf
extras:
