In memory of Professor Naftali Tishby.
Last updated in October 2022.
To learn, you must forget. This is perhaps one of the most intuitive lessons from Naftali Tishby's Information Bottleneck (IB) methods, which grew out of the fundamental tradeoff (rate vs. distortion) in Claude Shannon's information theory, and later creatively explained the learning behaviors of deep neural networks through the fitting & compression framework.
It has been four years since the dazzling talk on Opening the Black Box of Deep Neural Networks, and more than twenty years since the first paper on the Information Bottleneck method. It is time for us to take a look back, to celebrate what has been established, and to prepare for the future.
This repository is organized as follows:
- Classics
- Reviews
- Theories
- Models
- Applications (General)
- Applications (RL)
- Methods for Mutual Information Estimation (😣 MI is notoriously hard to estimate! )
- Other Information Theory Driven Work (verbose)
- Citation
All papers are selected and sorted by topic/conference/year/importance. Please send a pull request if you would like to add any paper.
We also made slides on theory, applications and controversy for the initial Information Bottleneck principle in deep learning (p.s., some controversy has been addressed by recent publications, e.g., Lorenzen et al., 2021).
Agglomerative Information Bottleneck [link]
Noam Slonim, Naftali Tishby
NIPS, 1999
🐤 The Information Bottleneck Method [link]
Naftali Tishby, Fernando C. Pereira, William Bialek
Preprint, 2000
Predictability, complexity and learning [link]
William Bialek, Ilya Nemenman, Naftali Tishby
Neural Computation, 2001
Sufficient Dimensionality Reduction: A novel analysis principle [link]
Amir Globerson, Naftali Tishby
ICML, 2002
The information bottleneck: Theory and applications [link]
Noam Slonim
PhD Thesis, 2002
An Information Theoretic Tradeoff between Complexity and Accuracy [link]
Ran Gilad-Bachrach, Amir Navot, Naftali Tishby
COLT, 2003
Information Bottleneck for Gaussian Variables [link]
Gal Chechik, Amir Globerson, Naftali Tishby, Yair Weiss
NIPS, 2003
Information and Fitness [link]
Samuel F. Taylor, Naftali Tishby and William Bialek
Preprint, 2007
Efficient representation as a design principle for neural coding and computation [link]
William Bialek, Rob R. de Ruyter van Steveninck, and Naftali Tishby
Preprint, 2007
The Information Bottleneck Revisited or How to Choose a Good Distortion Measure [link]
Peter Harremoes and Naftali Tishby
ISIT, 2007
🐤 Learning and Generalization with the Information Bottleneck [link]
Ohad Shamir, Sivan Sabato, Naftali Tishby
Journal of Theoretical Computer Science, 2009
🐤 Information-Theoretic Bounded Rationality [link]
Pedro A. Ortega, Daniel A. Braun, Justin Dyer, Kee-Eung Kim, Naftali Tishby
Preprint, 2015
🐤 Opening the Black Box of Deep Neural Networks via Information [link]
Ravid Shwartz-Ziv, Naftali Tishby
ICRI, 2017
Information Bottleneck and its Applications in Deep Learning [link]
Hassan Hafez-Kolahi, Shohreh Kasaei
Preprint, 2019
The Information Bottleneck Problem and Its Applications in Machine Learning [link]
Ziv Goldfeld, Yury Polyanskiy
Preprint, 2020
On the Information Bottleneck Problems: Models, Connections, Applications and Information Theoretic Views [link]
Abdellatif Zaidi, Iñaki Estella-Aguerri, Shlomo Shamai
Entropy, 2020
Information Bottleneck: Theory and Applications in Deep Learning [link]
Bernhard C. Geiger, Gernot Kubin
Entropy, 2020
On Information Plane Analyses of Neural Network Classifiers – A Review [link]
Bernhard C. Geiger
Preprint, 2021
Table 1 (p. 2) gives a nice summary of how different architectures and MI estimators affect the existence of compression phases and of causal links between compression and generalization.
A Critical Review of Information Bottleneck Theory and its Applications to Deep Learning [link]
Mohammad Ali Alomrani
Preprint, 2021
Information Flow in Deep Neural Networks [link]
Ravid Shwartz-Ziv
PhD Thesis, 2022
Gaussian Lower Bound for the Information Bottleneck Limit [link]
Amichai Painsky, Naftali Tishby
JMLR, 2017
Information-theoretic analysis of generalization capability of learning algorithms [link]
Aolin Xu, Maxim Raginsky
NeurIPS, 2017
Caveats for information bottleneck in deterministic scenarios [link] [ICLR version]
Artemy Kolchinsky, Brendan D. Tracey, Steven Van Kuyk
UAI, 2018
🐤🔥 Emergence of Invariance and Disentanglement in Deep Representations [link]
Alessandro Achille, Stefano Soatto
JMLR, 2018
- This paper is a gem. At a high level, it shows the relationship between generalization and the information bottleneck in weights (IIW).
- Be aware of how this differs from Tishby's original definition of the information bottleneck in representations.
- Specifically, if we approximate SGD by stochastic differential equations, we can see that SGD naturally leads to minimization in IIW.
- The authors argue that an optimal representation should have 4 properties: sufficiency, minimality, invariance, and disentanglement. Notably, the last two properties can naturally emerge with the minimization in mutual information between the datasets and network weights, or IIW.
On the Information Bottleneck Theory of Deep Learning [link]
Andrew Michael Saxe, Yamini Bansal, Joel Dapello, Madhu Advani, Artemy Kolchinsky, Brendan Daniel Tracey, David Daniel Cox
ICLR, 2018
The Dual Information Bottleneck [link]
Zoe Piran, Ravid Shwartz-Ziv, Naftali Tishby
Preprint, 2019
🐤 Learnability for the Information Bottleneck [link] [slides] [poster] [journal version] [workshop version]
Tailin Wu, Ian Fischer, Isaac L. Chuang, Max Tegmark
UAI, 2019
🐤 Phase Transitions for the Information Bottleneck in Representation Learning [link] [video]
Tailin Wu, Ian Fischer
ICLR, 2020
Bottleneck Problems: Information and Estimation-Theoretic View [link]
Shahab Asoodeh, Flavio Calmon
Preprint, 2020
Information Bottleneck: Exact Analysis of (Quantized) Neural Networks [link]
Stephan Sloth Lorenzen, Christian Igel, Mads Nielsen
Preprint, 2021
- This paper shows that different ways of binning when computing the mutual information lead to qualitatively different results.
- It then confirms the original IB paper's results on the fitting & compression phases, using quantized nets with exact computation of mutual information.
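The binning issue can be seen in a toy setting. The sketch below is not the paper's experiment: it assumes a single deterministic `tanh` unit with a made-up weight of 2.0 and 1000 distinct Gaussian inputs. For a deterministic layer and distinct inputs, the binned estimate of $I(X;T)$ reduces to the entropy of the binned activations, so the value is driven by the number of bins.

```python
import math
import random
from collections import Counter

def binned_entropy(values, n_bins, lo=-1.0, hi=1.0):
    """Entropy (in nats) of values after splitting [lo, hi] into n_bins equal-width bins."""
    width = (hi - lo) / n_bins
    # min(...) keeps a value equal to hi inside the last bin
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in counts.values())

random.seed(0)
inputs = [random.gauss(0.0, 1.0) for _ in range(1000)]
# Hypothetical one-unit "layer": T = tanh(2 * X), a deterministic function of X
activations = [math.tanh(2.0 * x) for x in inputs]

# With distinct inputs and a deterministic layer, I(X; bin(T)) = H(bin(T))
estimates = {b: binned_entropy(activations, b) for b in (2, 10, 100, 10000)}
for n_bins, value in estimates.items():
    print(f"{n_bins:>6} bins -> estimated I(X;T) = {value:.3f} nats")
```

The estimate can never exceed the log of the number of bins or the log of the number of samples, so coarse and fine binning can tell very different stories about the same layer.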
Perturbation Theory for the Information Bottleneck [link]
Vudtiwat Ngampruetikorn, David J. Schwab
Preprint, 2021
PAC-Bayes Information Bottleneck [link]
Zifeng Wang, Shao-Lun Huang, Ercan Engin Kuruoglu, Jimeng Sun, Xi Chen, Yefeng Zheng
ICLR, 2022
- This paper discusses using $I(w, S)$ instead of $I(T, X)$ as the information bottleneck.
- However, activations should in effect play a crucial role in a network's generalization, but they are not explicitly captured by $I(w, S)$.
Deep Variational Information Bottleneck [link]
Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy
ICLR, 2017
The Deterministic Information Bottleneck [link] [UAI Version]
DJ Strouse, David J. Schwab
Neural Computation, 2017
This replaces the mutual information term with entropy in the original IB objective.
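For reference, the two objectives can be written side by side. The notation is assumed here rather than taken from this list: $X$ is the input, $Y$ the target, $T$ the representation, $q(t|x)$ the encoder, and $\beta$ the trade-off parameter.

$$
\mathcal{L}_{\text{IB}}[q(t|x)] = I(X;T) - \beta\, I(T;Y), \qquad
\mathcal{L}_{\text{DIB}}[q(t|x)] = H(T) - \beta\, I(T;Y).
$$

Since $I(X;T) = H(T) - H(T|X)$, the DIB objective equals the IB objective plus an extra $H(T|X)$ penalty on encoder noise, which is why its solutions are deterministic encodings.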
Learning Sparse Latent Representations with the Deep Copula Information Bottleneck [link]
Aleksander Wieczorek, Mario Wieser, Damian Murezzan, Volker Roth
ICLR, 2018
Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck [link]
Maximilian Igl, Kamil Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin, Katja Hofmann
NeurIPS, 2019
Information bottleneck through variational glasses [link]
Slava Voloshynovskiy, Mouad Kondah, Shideh Rezaeifar, Olga Taran, Taras Holotyak, Danilo Jimenez Rezende
NeurIPS Bayesian Deep Learning Workshop, 2019
🐤 Variational Discriminator Bottleneck [link]
Xue Bin Peng, Angjoo Kanazawa, Sam Toyer, Pieter Abbeel, Sergey Levine
ICLR, 2019
Nonlinear Information Bottleneck [link]
Artemy Kolchinsky, Brendan Tracey, David Wolpert
Entropy, 2019
This formulation shows better performance than VIB.
General Information Bottleneck Objectives and their Applications to Machine Learning [link]
Sayandev Mukherjee
Preprint, 2019
This paper synthesizes IB and Predictive IB, and provides a new variational bound.
🐤 Graph Information Bottleneck [link] [code] [slides]
Tailin Wu, Hongyu Ren, Pan Li, Jure Leskovec
NeurIPS, 2020
🐤 Learning Optimal Representations with the Decodable Information Bottleneck [link]
Yann Dubois, Douwe Kiela, David J. Schwab, Ramakrishna Vedantam
NeurIPS, 2020
🐤 Concept Bottleneck Models [link]
Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang
ICML, 2020
Disentangled Representations for Sequence Data using Information Bottleneck Principle [link] [talk]
Masanori Yamada, Heecheol Kim, Kosuke Miyoshi, Tomoharu Iwata, Hiroshi Yamakawa
ICML, 2020
🐤 IBA: Restricting the Flow: Information Bottlenecks for Attribution [link] [code]
Karl Schulz, Leon Sixt, Federico Tombari, Tim Landgraf
ICLR, 2020
On the Difference between the Information Bottleneck and the Deep Information Bottleneck [link]
Aleksander Wieczorek, Volker Roth
Entropy, 2020
The Convex Information Bottleneck Lagrangian [link]
Borja Rodríguez Gálvez, Ragnar Thobaben, Mikael Skoglund
Preprint, 2020
The HSIC Bottleneck: Deep Learning without Back-Propagation [link] [code]
Wan-Duo Kurt Ma, J.P. Lewis, W. Bastiaan Kleijn
AAAI, 2020
- This paper uses the Hilbert-Schmidt independence criterion (HSIC) as a surrogate for mutual information in the IB objective.
- It shows an alternative way to learn a neural network without backpropagation, inspired by the IB principle.
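To get a feel for the surrogate, here is a minimal sketch of the standard biased empirical HSIC estimator, $\mathrm{tr}(KHLH)/(n-1)^2$, for scalar samples. It is not the paper's training procedure; the Gaussian kernel, the bandwidth `sigma=1.0`, and the toy data are all assumptions made for illustration.

```python
import math
import random

def gaussian_gram(xs, sigma):
    """Gram matrix of a Gaussian kernel over scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in xs] for a in xs]

def center(gram):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + total_mean
             for j in range(n)] for i in range(n)]

def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = len(xs)
    k = center(gaussian_gram(xs, sigma))
    l = center(gaussian_gram(ys, sigma))
    # For symmetric matrices, tr(A B) is the sum of elementwise products
    return sum(k[i][j] * l[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2

rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
ys_dependent = [x + 0.1 * rng.gauss(0.0, 1.0) for x in xs]
ys_independent = [rng.gauss(0.0, 1.0) for _ in xs]

hsic_dependent = hsic(xs, ys_dependent)
hsic_independent = hsic(xs, ys_independent)
print(f"dependent: {hsic_dependent:.4f}, independent: {hsic_independent:.4f}")
```

Unlike mutual information, this quantity needs no density estimation or binning, only kernel evaluations on samples.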
Disentangled Information Bottleneck [link] [code]
Ziqi Pan, Li Niu, Jianfu Zhang, Liqing Zhang
AAAI, 2021
🐤 IB-GAN: Disentangled Representation Learning [link] [code] [talk]
Insu Jeon, Wonkwang Lee, Myeongjang Pyeon, Gunhee Kim
AAAI, 2021
This model adds an additional IB constraint on top of InfoGAN.
Deciding What to Learn: A Rate-Distortion Approach [link]
Dilip Arumugam, Benjamin Van Roy
ICML, 2021
🐤 Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization [link]
Kartik Ahuja, Ethan Caballero, Dinghuai Zhang, Yoshua Bengio, Ioannis Mitliagkas, Irina Rish
Preprint, 2021
Multi-Task Variational Information Bottleneck [link]
Weizhu Qian, Bowei Chen, Yichao Zhang, Guanghui Wen, Franck Gechter
Preprint, 2021
🐤 Analyzing neural codes using the information bottleneck method [link]
Elad Schneidman, Noam Slonim, Naftali Tishby, Rob R. de Ruyter van Steveninck, William Bialek
NIPS, 2001
Past-future information bottleneck in dynamical systems [link]
Felix Creutzig, Amir Globerson, Naftali Tishby
Physical Review, 2009
Compressing Neural Networks using the Variational Information Bottleneck [link]
Bin Dai, Chen Zhu, Baining Guo, David Wipf
ICML, 2018
🐤 InfoMask: Masked Variational Latent Representation to Localize Chest Disease [link]
Saeid Asgari Taghanaki, Mohammad Havaei, Tess Berthier, Francis Dutil, Lisa Di Jorio, Ghassan Hamarneh, Yoshua Bengio
MICCAI, 2019
Be aware how this differs from the IBA paper.
Past–future information bottleneck for sampling molecular reaction coordinate simultaneously with thermodynamics and kinetics [link]
Yihang Wang, João Marcelo Lamim Ribeiro, Pratyush Tiwary
Nature Communications, 2019
Kernelized information bottleneck leads to biologically plausible 3-factor Hebbian learning in deep networks [link]
Roman Pogodin, Peter Latham
NeurIPS, 2020
Training Normalizing Flows with the Information Bottleneck for Competitive Generative Classification [link]
Lynton Ardizzone, Radek Mackowiak, Carsten Rother, Ullrich Köthe
NeurIPS, 2020
Unsupervised Speech Decomposition via Triple Information Bottleneck [link] [code]
Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson, David Cox
ICML, 2020
Learning Efficient Multi-agent Communication: An Information Bottleneck Approach [link]
Rundong Wang, Xu He, Runsheng Yu, Wei Qiu, Bo An, Zinovi Rabinovich
ICML, 2020
🐤 Inserting Information Bottlenecks for Attribution in Transformers [link]
Zhiying Jiang, Raphael Tang, Ji Xin, Jimmy Lin
EMNLP, 2020
Information Bottleneck for Estimating Treatment Effects with Systematically Missing Covariates [link]
Sonali Parbhoo, Mario Wieser, Aleksander Wieczorek, and Volker Roth
Entropy, 2020
Variational Information Bottleneck for Unsupervised Clustering: Deep Gaussian Mixture Embedding [link]
Yigit Ugur, George Arvanitakis, Abdellatif Zaidi
Entropy, 2020
Learning to Learn with Variational Information Bottleneck for Domain Generalization [link]
Yingjun Du, Jun Xu, Huan Xiong, Qiang Qiu, Xiantong Zhen, Cees G. M. Snoek, Ling Shao
ECCV, 2020
The information bottleneck and geometric clustering [link]
DJ Strouse, David J Schwab
Preprint, 2020
Causal learning with sufficient statistics: an information bottleneck approach [link]
Daniel Chicharro, Michel Besserve, Stefano Panzeri
Preprint, 2020
Learning Robust Representations via Multi-View Information Bottleneck [link]
Marco Federici, Anjan Dutta, Patrick Forré, Nate Kushman, Zeynep Akata
Preprint, 2020
🐤 Information Bottleneck Disentanglement for Identity Swapping [link]
Gege Gao, Huaibo Huang, Chaoyou Fu, Zhaoyang Li, Ran He
CVPR, 2021
A Variational Information Bottleneck Based Method to Compress Sequential Networks for Human Action Recognition [link]
Ayush Srivastava, Oshin Dutta, Jigyasa Gupta, Sumeet Agarwal, Prathosh AP
WACV, 2021
The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget [link]
Anirudh Goyal, Yoshua Bengio, Matthew Botvinick, Sergey Levine
ICLR, 2020
Variational Information Bottleneck for Effective Low-Resource Fine-Tuning [link]
Rabeeh Karimi Mahabadi, Yonatan Belinkov, James Henderson
ICLR, 2021
Dynamic Bottleneck for Robust Self-Supervised Exploration [link]
Chenjia Bai, Lingxiao Wang, Lei Han, Animesh Garg, Jianye Hao, Peng Liu, Zhaoran Wang
NeurIPS, 2021
Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck [link] [talk]
Junho Kim, Byung-Kwan Lee, Yong Man Ro
NeurIPS, 2021
Revisiting Hilbert-Schmidt Information Bottleneck for Adversarial Robustness [link] [talk]
Zifeng Wang, Tong Jian, Aria Masoomi, Stratis Ioannidis, Jennifer Dy
NeurIPS, 2021
A Variational Information Bottleneck Approach to Multi-Omics Data Integration [link]
Changhee Lee, Mihaela van der Schaar
AISTATS, 2021
Information Bottleneck Approach to Spatial Attention Learning [link]
Qiuxia Lai, Yu Li, Ailing Zeng, Minhao Liu, Hanqiu Sun, Qiang Xu
IJCAI, 2021
Unsupervised Hashing with Contrastive Information Bottleneck [link]
Zexuan Qiu, Qinliang Su, Zijing Ou, Jianxing Yu, Changyou Chen
IJCAI, 2021
Neuron Campaign for Initialization Guided by Information Bottleneck Theory [link]
Haitao Mao, Xu Chen, Qiang Fu, Lun Du, Shi Han, Dongmei Zhang
CIKM, 2021
Information Theoretic Meta Learning with Gaussian Processes [link]
Michalis K. Titsias, Francisco J. R. Ruiz, Sotirios Nikoloutsopoulos, Alexandre Galashov
UAI, 2021
A Closer Look at the Adversarial Robustness of Information Bottleneck Models [link]
Iryna Korshunova, David Stutz, Alexander A. Alemi, Olivia Wiles, Sven Gowal
ICML Workshop on A Blessing in Disguise, 2021
Information Bottleneck Attribution for Visual Explanations of Diagnosis and Prognosis [link]
Ugur Demir, Ismail Irmakci, Elif Keles, Ahmet Topcu, Ziyue Xu, Concetto Spampinato, Sachin Jambawalikar, Evrim Turkbey, Baris Turkbey, Ulas Bagci
Preprint, 2021
State Predictive Information Bottleneck [link] [code]
Dedi Wang, Pratyush Tiwary
Preprint, 2021
Disentangled Variational Information Bottleneck for Multiview Representation Learning [link] [code]
Feng Bao
Preprint, 2021
Invariant Information Bottleneck for Domain Generalization [link]
Bo Li, Yifei Shen, Yezhen Wang, Wenzhen Zhu, Colorado J. Reed, Jun Zhang, Dongsheng Li, Kurt Keutzer, Han Zhao
Preprint, 2021
Information-Bottleneck-Based Behavior Representation Learning for Multi-agent Reinforcement learning [link]
Yue Jin, Shuangqing Wei, Jian Yuan, Xudong Zhang
Preprint, 2021
Generalization in Quantum Machine Learning: a Quantum Information Perspective [link]
Leonardo Banchi, Jason Pereira, Stefano Pirandola
Preprint, 2021
Causal Effect Estimation using Variational Information Bottleneck [link]
Zhenyu Lu, Yurong Cheng, Mingjun Zhong, George Stoian, Ye Yuan, Guoren Wang
Preprint, 2021
Improving Subgraph Recognition with Variational Graph Information Bottleneck [link]
Junchi Yu, Jie Cao, Ran He
CVPR, 2022
Graph Structure Learning with Variational Information Bottleneck [link]
Qingyun Sun, Jianxin Li, Hao Peng, Jia Wu, Xingcheng Fu, Cheng Ji, Philip S. Yu
AAAI, 2022
Renyi Fair Information Bottleneck for Image Classification [link]
Adam Gronowski, William Paul, Fady Alajaji, Bahman Gharesifard, Philippe Burlina
Preprint, 2022
The Distributed Information Bottleneck reveals the explanatory structure of complex systems [link]
Kieran A. Murphy, Dani S. Bassett
Preprint, 2021
Sparsity-Inducing Categorical Prior Improves Robustness of the Information Bottleneck [link]
Anirban Samaddar, Sandeep Madireddy, Prasanna Balaprakash
Preprint, 2022
Pareto-optimal clustering with the primal deterministic information bottleneck [link]
Andrew K. Tan, Max Tegmark, Isaac L. Chuang
Preprint, 2022
Information-Theoretic Odometry Learning [link]
Sen Zhang, Jing Zhang, Dacheng Tao
Preprint, 2022
InfoBot: Transfer and Exploration via the Information Bottleneck [paper] [code]
Anirudh Goyal, Riashat Islam, DJ Strouse, Zafarali Ahmed, Hugo Larochelle, Matthew Botvinick, Yoshua Bengio, Sergey Levine
ICLR, 2019
The idea is simply to constrain the dependence on a certain goal, so that the agent can learn a default behavior.
Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck [link] [code] [talk]
Maximilian Igl, Kamil Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin, Katja Hofmann
NeurIPS, 2019
Learning Task-Driven Control Policies via Information Bottlenecks [link] [spotlight talk]
Vincent Pacelli, Anirudha Majumdar
RSS, 2020
🐤 The Bottleneck Simulator: A Model-based Deep Reinforcement Learning Approach [journal '20] [arxiv '18]
Iulian Vlad Serban, Chinnadhurai Sankar, Michael Pieper, Joelle Pineau, Yoshua Bengio
Journal of Artificial Intelligence Research (JAIR), 2020
Learning Robust Representations via Multi-View Information Bottleneck [link] [code] [talk]
Marco Federici, Anjan Dutta, Patrick Forré, Nate Kushman, Zeynep Akata
ICLR, 2020
DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck [paper] [code]
Jiameng Fan, Wenchao Li
ICML, 2022
Learning Representations in Reinforcement Learning: an Information Bottleneck Approach [link] [code]
Yingjun Pei, Xinwen Hou
Rejected by ICLR, 2020
Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [link]
Xingyu Lu, Kimin Lee, Pieter Abbeel, Stas Tiomkin
ArXiv, 2020
Dynamic Bottleneck for Robust Self-Supervised Exploration [paper] [code]
Chenjia Bai, Lingxiao Wang, Lei Han, Animesh Garg, Jianye Hao, Peng Liu, Zhaoran Wang
NeurIPS, 2021
Regret Bounds for Information-Directed Reinforcement Learning [paper]
Botao Hao, Tor Lattimore
ArXiv, 2022
😣😣😣 Mutual information is notoriously hard to estimate!
🐤 Benchmarking Mutual Information [link] [code] [doc]
Paweł Czyż, Frederic Grabowski, Julia E. Vogt, Niko Beerenwinkel, Alexander Marx
NeurIPS, 2023
Variational f-Divergence and Derangements for Discriminative Mutual Information Estimation [link] [code]
Nunzio A. Letizia, Nicola Novello, Andrea M. Tonello
ArXiv, 2023
Estimating Mutual Information [link] [code]
Alexander Kraskov, Harald Stoegbauer, Peter Grassberger
Physical Review, 2004
Efficient Estimation of Mutual Information for Strongly Dependent Variables [link] [code]
Shuyang Gao, Greg Ver Steeg, Aram Galstyan
AISTATS, 2015
- This shows that KNN-based estimators require a number of samples that scales exponentially with the true MI; that is, they become inaccurate as MI gets large.
- Thus, as the variables become more strongly dependent, the MI estimate becomes less accurate. In other words, KNN-based estimators are only good at detecting independence of variables.
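The effect is easy to reproduce. The sketch below is a plain-Python version of the KSG estimator (algorithm 1 from the Kraskov et al. entry above) for scalar samples, run on correlated Gaussians whose true MI is known in closed form. The sample size, `k=3`, and the correlation values are arbitrary choices for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{k=1}^{n-1} 1/k."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))

def ksg_mi(xs, ys, k=3):
    """KSG (algorithm 1) estimate of I(X;Y) in nats for scalar samples."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Max-norm distances from point i to all other points in the joint space
        dists = sorted(max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j]))
                       for j in range(n) if j != i)
        eps = dists[k - 1]  # distance to the k-th nearest neighbor
        # Count marginal neighbors strictly closer than eps
        n_x = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        n_y = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        total += digamma_int(n_x + 1) + digamma_int(n_y + 1)
    return digamma_int(k) + digamma_int(n) - total / n

def sample_pair(n, rho, rng):
    """Draw n samples from a standard bivariate Gaussian with correlation rho."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

def true_mi(rho):
    """Closed-form MI (nats) of a bivariate Gaussian with correlation rho."""
    return -0.5 * math.log(1.0 - rho ** 2)

rng = random.Random(0)
results = {rho: (ksg_mi(*sample_pair(300, rho, rng)), true_mi(rho))
           for rho in (0.0, 0.9, 0.999999)}
for rho, (estimate, truth) in results.items():
    print(f"rho={rho}: KSG={estimate:.3f}, true={truth:.3f}")
```

Barring ties, each marginal count is at least k - 1, so the estimate cannot exceed psi(N) - psi(k). With 300 samples that is below 5 nats, while the true MI at rho = 0.999999 is about 6.6 nats, so the estimator has to undershoot.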
🐤 MINE: Mutual Information Neural Estimation [link] [code]
Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R Devon Hjelm
ICML, 2018
Evaluating Capability of Deep Neural Networks for Image Classification via Information Plane [link] [code]
Hao Cheng, Dongze Lian, Shenghua Gao, Yanlin Geng
ECCV, 2018
🐤 InfoMax: Learning Deep representations by Mutual Information Estimation and Maximization [link] [code]
R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, Yoshua Bengio
ICLR, 2019 (Oral)
🐤 On Variational Bounds of Mutual Information [link] [PyTorch]
Ben Poole, Sherjil Ozair, Aaron van den Oord, Alexander A. Alemi, George Tucker
ICML, 2019
🐤 Estimating Information Flow in Deep Neural Networks [link] [PyTorch]
Ziv Goldfeld, Ewout van den Berg, Kristjan Greenewald, Igor Melnyk, Nam Nguyen, Brian Kingsbury, Yury Polyanskiy
ICML, 2019
Neural Estimators for Conditional Mutual Information Using Nearest Neighbors Sampling [link] [code]
Sina Molavipour, Germán Bassi, Mikael Skoglund
Preprint, 2020
CCMI: Classifier based Conditional Mutual Information Estimation [link] [code]
Sudipto Mukherjee, Himanshu Asnani, Sreeram Kannan
UAI, 2020
MIGE: Mutual Information Gradient Estimation for Representation Learning [link] [code]
Liangjian Wen, Yiji Zhou, Lirong He, Mingyuan Zhou, Zenglin Xu
ICLR, 2020
🐤 Information Bottleneck: Exact Analysis of (Quantized) Neural Networks [link]
Stephan Sloth Lorenzen, Christian Igel, Mads Nielsen
Preprint, 2021
- This paper shows that different ways of binning when computing the mutual information lead to qualitatively different results.
- It then confirms the original IB paper's results on the fitting & compression phases, using quantized nets with exact computation of mutual information.
🐤 Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization [link] [code]
Qing Guo, Junya Chen, Dong Wang, Yuewei Yang, Xinwei Deng, Lawrence Carin, Fan Li, Chenyang Tao
Preprint, 2021
Entropy and mutual information in models of deep neural networks [link]
Marylou Gabrié, Andre Manoel, Clément Luneau, Jean Barbier, Nicolas Macris, Florent Krzakala, Lenka Zdeborová
NeurIPS, 2018
🐤 Understanding the Limitations of Variational Mutual Information Estimators [link] [PyTorch]
Jiaming Song, Stefano Ermon
ICLR, 2020
- This implementation includes InfoNCE, NWJ, NWJ-JS, MINE, and their own method SMILE.
- Basically, they show that the variance of traditional MI estimators can grow exponentially with the true MI. In other words, just as with KNN estimators, the more dependent the variables (the higher the MI), the less accurate the estimate.
- Also, those estimators do not satisfy some important self-consistency properties, such as the data processing inequality.
- They propose SMILE, which aims to reduce the variance issue.
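InfoNCE, one of the estimators listed above, sits at the other end of the trade-off: it has low variance, but its value can never exceed the log of the batch size. The sketch below is not the authors' implementation. It assumes a bivariate Gaussian and plugs in the exact log density ratio as the critic (no learned network), with an arbitrary batch size of 64.

```python
import math
import random

def log_density_ratio(x, y, rho):
    """log p(x, y) / (p(x) p(y)) for a standard bivariate Gaussian with correlation rho."""
    return (-0.5 * math.log(1.0 - rho ** 2)
            - (rho ** 2 * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * (1.0 - rho ** 2)))

def infonce(xs, ys, rho):
    """InfoNCE lower bound on I(X;Y) in nats, using the exact ratio as critic."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Score x_i against its own y_i (positive) and every other y_j (negatives)
        scores = [log_density_ratio(xs[i], ys[j], rho) for j in range(n)]
        m = max(scores)  # subtract the max for a numerically stable log-mean-exp
        log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scores) / n)
        total += scores[i] - log_mean_exp
    return total / n

rng = random.Random(0)
rho = 0.999999
batch = 64
xs = [rng.gauss(0.0, 1.0) for _ in range(batch)]
ys = [rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for x in xs]

true_mi = -0.5 * math.log(1.0 - rho ** 2)
estimate = infonce(xs, ys, rho)
print(f"true MI = {true_mi:.3f}, InfoNCE = {estimate:.3f}, log(batch) = {math.log(batch):.3f}")
```

Each term is at most log(batch) because the positive pair is included in the average inside the log, so even a perfect critic saturates once the true MI exceeds that cap.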
🐤🐤 Sliced Mutual Information: A Scalable Measure of Statistical Dependence [link]
Ziv Goldfeld, Kristjan Greenewald
NeurIPS, 2021 (spotlight)
🐤 Improving Mutual Information Estimation with Annealed and Energy-Based Bounds [link]
Qing Guo, Junya Chen, Dong Wang, Yuewei Yang, Xinwei Deng, Lawrence Carin, Fan Li, Chenyang Tao
ICLR, 2022
Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy [link] [code]
Danqi Liao*, Chen Liu*, Benjamin W Christensen, Alexander Tong, Guillaume Huguet, Guy Wolf, Maximilian Nickel, Ian Adelstein, Smita Krishnaswamy
ICML Workshop, 2023
This paper leverages diffusion geometry to estimate entropy and MI in high-dimensional representations of modern neural networks.
f-GANs in an Information Geometric Nutshell [link]
Richard Nock, Zac Cranko, Aditya K. Menon, Lizhen Qu, Robert C. Williamson
NeurIPS, 2017
Fully Decentralized Policies for Multi-Agent Systems: An Information Theoretic Approach [link]
Roel Dobbe, David Fridovich-Keil, Claire Tomlin
NeurIPS, 2017
Information Theoretic Properties of Markov Random Fields, and their Algorithmic Applications [link]
Linus Hamilton, Frederic Koehler, Ankur Moitra
NeurIPS, 2017
Information-theoretic analysis of generalization capability of learning algorithms [link]
Aolin Xu, Maxim Raginsky
NeurIPS, 2017
Learning Discrete Representations via Information Maximizing Self-Augmented Training [link]
Weihua Hu, Takeru Miyato, Seiya Tokui, Eiichi Matsumoto, Masashi Sugiyama
ICML, 2017
🐣 Nonparanormal Information Estimation [link]
Shashank Singh, Barnabás Póczos
ICML, 2017
This paper shows how to robustly estimate mutual information using i.i.d. samples from an unknown distribution.
Entropy and mutual information in models of deep neural networks [link]
Marylou Gabrié, Andre Manoel, Clément Luneau, Jean Barbier, Nicolas Macris, Florent Krzakala, Lenka Zdeborová
NeurIPS, 2018
Chaining Mutual Information and Tightening Generalization Bounds [link]
Amir Asadi, Emmanuel Abbe, Sergio Verdu
NeurIPS, 2018
Information Constraints on Auto-Encoding Variational Bayes [link]
Romain Lopez, Jeffrey Regier, Michael I. Jordan, Nir Yosef
NeurIPS, 2018
Adaptive Learning with Unknown Information Flows [link]
Yonatan Gur, Ahmadreza Momeni
NeurIPS, 2018
Information-based Adaptive Stimulus Selection to Optimize Communication Efficiency in Brain-Computer Interfaces [link]
Boyla Mainsah, Dmitry Kalika, Leslie Collins, Siyuan Liu, Chandra Throckmorton
NeurIPS, 2018
Information Theoretic Guarantees for Empirical Risk Minimization with Applications to Model Selection and Large-Scale Optimization [link]
Ibrahim Alabdulmohsin
ICML, 2018
Mutual Information Neural Estimation [link]
Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R Devon Hjelm
ICML, 2018
Learning to Explain: An Information-Theoretic Perspective on Model Interpretation [link]
Jianbo Chen, Le Song, Martin Wainwright, Michael Jordan
ICML, 2018
Fast Information-theoretic Bayesian Optimisation [link]
Binxin Ru, Michael A. Osborne, Mark Mcleod, Diego Granziol
ICML, 2018
Locality-Sensitive Hashing for f-Divergences: Mutual Information Loss and Beyond [link]
Lin Chen, Hossein Esfandiari, Gang Fu, Vahab Mirrokni
NeurIPS, 2019
Information-Theoretic Confidence Bounds for Reinforcement Learning [link]
Xiuyuan Lu, Benjamin Van Roy
NeurIPS, 2019
L-DMI: A Novel Information-theoretic Loss Function for Training Deep Nets Robust to Label Noise [link]
Yilun Xu, Peng Cao, Yuqing Kong, Yizhou Wang
NeurIPS, 2019
Connections Between Mirror Descent, Thompson Sampling and the Information Ratio [link]
Julian Zimmert, Tor Lattimore
NeurIPS, 2019
Region Mutual Information Loss for Semantic Segmentation [link]
Shuai Zhao, Yang Wang, Zheng Yang, Deng Cai
NeurIPS, 2019
Learning Representations by Maximizing Mutual Information Across Views [link]
Philip Bachman, R Devon Hjelm, William Buchwalter
NeurIPS, 2019
Icebreaker: Element-wise Efficient Information Acquisition with a Bayesian Deep Latent Gaussian Model [link]
Wenbo Gong, Sebastian Tschiatschek, Sebastian Nowozin, Richard E. Turner, José Miguel Hernández-Lobato, Cheng Zhang
NeurIPS, 2019
Thompson Sampling with Information Relaxation Penalties [link]
Seungki Min, Costis Maglaras, Ciamac C. Moallemi
NeurIPS, 2019
InfoMax: Learning deep representations by mutual information estimation and maximization [link] [code]
R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, Yoshua Bengio
ICLR, 2019
Max-MIG: an Information Theoretic Approach for Joint Learning from Crowds [link]
Peng Cao, Yilun Xu, Yuqing Kong, Yizhou Wang
ICLR, 2019
Information-Directed Exploration for Deep Reinforcement Learning [link]
Nikolay Nikolov, Johannes Kirschner, Felix Berkenkamp, Andreas Krause
ICLR, 2019
Soft Q-Learning with Mutual-Information Regularization [link]
Jordi Grau-Moya, Felix Leibfried, Peter Vrancx
ICLR, 2019
Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization [link]
Takayuki Osa, Voot Tangkaratt, Masashi Sugiyama
ICLR, 2019
Information Asymmetry in KL-regularized RL [link]
Alexandre Galashov, Siddhant M. Jayakumar, Leonard Hasenclever, Dhruva Tirumala, Jonathan Schwarz, Guillaume Desjardins, Wojciech M. Czarnecki, Yee Whye Teh, Razvan Pascanu, Nicolas Heess
ICLR, 2019
Adaptive Estimators Show Information Compression in Deep Neural Networks [link]
Ivan Chelombiev, Conor Houghton, Cian O'Donnell
ICLR, 2019
Information Theoretic lower bounds on negative log likelihood [link]
Luis A. Lastras-Montaño
ICLR, 2019
New results on information theoretic clustering [link] [code]
Ferdinando Cicalese, Eduardo Laber, Lucas Murtinho
ICML, 2019
Estimating Information Flow in Deep Neural Networks [link]
Ziv Goldfeld, Ewout Van Den Berg, Kristjan Greenewald, Igor Melnyk, Nam Nguyen, Brian Kingsbury, Yury Polyanskiy
ICML, 2019
🐣 The information-theoretic value of unlabeled data in semi-supervised learning [link]
Alexander Golovnev, David Pal, Balazs Szorenyi
ICML, 2019
EMI: Exploration with Mutual Information [link] [code]
Hyoungseok Kim, Jaekyeom Kim, Yeonwoo Jeong, Sergey Levine, Hyun Oh Song
ICML, 2019
🐣 On Variational Bounds of Mutual Information [link]
Ben Poole, Sherjil Ozair, Aaron Van Den Oord, Alex Alemi, George Tucker
ICML, 2019
Where is the Information in a Deep Neural Network? [link]
Alessandro Achille, Giovanni Paolini, Stefano Soatto
Preprint, 2020
Information Maximization for Few-Shot Learning [link]
Malik Boudiaf, Imtiaz Ziko, Jérôme Rony, Jose Dolz, Pablo Piantanida, Ismail Ben Ayed
NeurIPS, 2020
Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information [link]
Genevieve Flaspohler, Nicholas A. Roy, John W. Fisher III
NeurIPS, 2020
Predictive Information Accelerates Learning in RL [link]
Kuang-Huei Lee, Ian Fischer, Anthony Liu, Yijie Guo, Honglak Lee, John Canny, Sergio Guadarrama
NeurIPS, 2020
"The predictive information is the mutual information between the past and the future, $I(X_{\text{past}}; X_{\text{future}})$."
Information Theoretic Regret Bounds for Online Nonlinear Control [link]
Sham Kakade, Akshay Krishnamurthy, Kendall Lowrey, Motoya Ohnishi, Wen Sun
NeurIPS, 2020
Conditioning and Processing: Techniques to Improve Information-Theoretic Generalization Bounds [link]
Hassan Hafez-Kolahi, Zeinab Golgooni, Shohreh Kasaei, Mahdieh Soleymani
NeurIPS, 2020
Variational Interaction Information Maximization for Cross-domain Disentanglement [link]
HyeongJoo Hwang, Geon-Hyeong Kim, Seunghoon Hong, Kee-Eung Kim
NeurIPS, 2020
Information theoretic limits of learning a sparse rule [link]
Clément Luneau, Jean Barbier, Nicolas Macris
NeurIPS, 2020
Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks [link]
Ryo Karakida, Kazuki Osawa
NeurIPS, 2020
🐣 On Mutual Information Maximization for Representation Learning [link]
Michael Tschannen, Josip Djolonga, Paul K. Rubenstein, Sylvain Gelly, Mario Lucic
ICLR, 2020
🐣 Understanding the Limitations of Variational Mutual Information Estimators [link]
Jiaming Song, Stefano Ermon
ICLR, 2020
Expected Information Maximization: Using the I-Projection for Mixture Density Estimation [link]
Philipp Becker, Oleg Arenz, Gerhard Neumann
ICLR, 2020
Mutual Information Gradient Estimation for Representation Learning [link]
Liangjian Wen, Yiji Zhou, Lirong He, Mingyuan Zhou, Zenglin Xu
ICLR, 2020
InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization [link]
Fan-Yun Sun, Jordan Hoffman, Vikas Verma, Jian Tang
ICLR, 2020
A Mutual Information Maximization Perspective of Language Representation Learning [link]
Lingpeng Kong, Cyprien de Masson d'Autume, Lei Yu, Wang Ling, Zihang Dai, Dani Yogatama
ICLR, 2020
CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information [link] [code]
Pengyu Cheng, Weituo Hao, Shuyang Dai, Jiachang Liu, Zhe Gan, Lawrence Carin
ICML, 2020
Information Particle Filter Tree: An Online Algorithm for POMDPs with Belief-Based Rewards on Continuous Domains [link] [code]
Johannes Fischer, Ömer Sahin Tas
ICML, 2020
Bayesian Experimental Design for Implicit Models by Mutual Information Neural Estimation [link] [code]
Steven Kleinegesse, Michael U. Gutmann
ICML, 2020
FR-Train: A Mutual Information-Based Approach to Fair and Robust Training [link] [code]
Yuji Roh, Kangwook Lee, Steven Whang, Changho Suh
ICML, 2020
Learning Discrete Structured Representations by Adversarially Maximizing Mutual Information [link] [code]
Karl Stratos, Sam Wiseman
ICML, 2020
Learning Structured Latent Factors from Dependent Data: A Generative Model Framework from Information-Theoretic Perspective [link]
Ruixiang Zhang, Masanori Koyama, Katsuhiko Ishiguro
ICML, 2020
Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization [link] [code]
Sicheng Zhu, Xiao Zhang, David Evans
ICML, 2020
Usable Information and Evolution of Optimal Representations During Training [link]
Michael Kleinman, Alessandro Achille, Daksh Idnani, Jonathan Kao
ICLR, 2021
Domain-Robust Visual Imitation Learning with Mutual Information Constraints [link]
Edoardo Cetin, Oya Celiktutan
ICLR, 2021
Multi-Class Uncertainty Calibration via Mutual Information Maximization-based Binning [link]
Kanil Patel, William H. Beluch, Bin Yang, Michael Pfeiffer, Dan Zhang
ICLR, 2021
Graph Information Bottleneck for Subgraph Recognition [link]
Junchi Yu, Tingyang Xu, Yu Rong, Yatao Bian, Junzhou Huang, Ran He
ICLR, 2021
InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [link]
Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu
ICLR, 2021
Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information [link] [slides]
Willie Neiswanger, Ke Alexander Wang, Stefano Ermon
ICML, 2021
Decomposed Mutual Information Estimation for Contrastive Representation Learning [link]
Alessandro Sordoni, Nouha Dziri, Hannes Schulz, Geoff Gordon, Philip Bachman, Remi Tachet Des Combes
ICML, 2021
ReduNet: A White-box Deep Network from the Principle of Maximizing Rate Reduction [link] [code]
Kwan Ho Ryan Chan, Yaodong Yu, Chong You, Haozhi Qi, John Wright, Yi Ma
Preprint, 2021
Intelligence, physics and information – the tradeoff between accuracy and simplicity in machine learning [link]
Tailin Wu
PhD Thesis, 2021
The Information Geometry of Unsupervised Reinforcement Learning [link]
Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine
Preprint, 2021
If you would like to cite this repository 🐣:
@misc{git2022ib,
title = {Awesome Information Bottleneck},
author = {Ziyu Ye},
howpublished = {\url{https://github.com/ZIYU-DEEP/Awesome-Information-Bottleneck}},
year = 2022}