2024-09-12-kampen24a.md

title: 'Towards Scalable Bayesian Transformers: Investigating stochastic subset selection for NLP'
abstract: Bayesian deep learning provides a framework for quantifying uncertainty. However, the scale of modern neural networks applied in Natural Language Processing (NLP) limits the usability of Bayesian methods. Subnetwork inference aims to approximate the posterior by selecting a stochastic parameter subset for inference, thereby allowing scalable posterior approximations. Determining the optimal parameter space for subnetwork inference is far from trivial. In this paper, we study partially stochastic Bayesian neural networks in the context of transformer models for NLP tasks for the Laplace approximation (LA) and Stochastic weight averaging - Gaussian (SWAG). We propose heuristics for selecting which layers to include in the stochastic subset. We show that norm-based selection is promising for small subsets, and random selection is superior for larger subsets. Moreover, we propose Sparse-KFAC (S-KFAC), an extension of KFAC LA, which selects dense stochastic substructures of linear layers based on parameter magnitudes. S-KFAC retains performance while requiring substantially fewer stochastic parameters and, therefore, drastically limits memory footprint.
openreview: ba3McobvmG
software:
section: Papers
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: kampen24a
month: 0
tex_title: 'Towards Scalable Bayesian Transformers: Investigating stochastic subset selection for NLP'
firstpage: 1842
lastpage: 1862
page: 1842-1862
order: 1842
cycles: false
bibtex_author: Kampen, Peter Johannes Tejlgaard and Als, Gustav Ragnar Stoettrup and Andersen, Michael Riis
author:
- given: Peter Johannes Tejlgaard
  family: Kampen
- given: Gustav Ragnar Stoettrup
  family: Als
- given: Michael Riis
  family: Andersen
date: 2024-09-12
address:
container-title: Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence
volume: 244
genre: inproceedings
issued:
  date-parts:
  - 2024
  - 9
  - 12
pdf:
extras: