This is an implementation of the paper "Bilevel Coreset Selection in Continual Learning: A New Formulation and Algorithm" by Jie Hao, Kaiyi Ji, and Mingrui Liu, published at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
A coreset is a small set that provides a data summary for a large dataset, such that training solely on the small set achieves competitive performance compared with training on the full dataset. In rehearsal-based continual learning, the coreset is typically used in the memory replay buffer to stand in for representative samples from previous tasks, and the coreset selection procedure is typically formulated as a bilevel problem. However, the typical bilevel formulation for coreset selection explicitly performs optimization over discrete decision variables with greedy search, which is computationally expensive. Several works consider other formulations to address this issue, but they ignore the nested nature of bilevel optimization problems and may not solve the bilevel coreset selection problem accurately. To address these issues, we propose a new bilevel formulation, where the inner problem tries to find a model which minimizes the expected training error sampled from a given probability distribution, and the outer problem aims to learn a probability distribution with approximately $K$ (the coreset size) nonzero entries, such that the model obtained from the inner problem achieves low training error over the full dataset.
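As a rough sketch (in our own notation, not necessarily the paper's), the formulation described above can be written as a bilevel problem of the following form, where $w$ denotes model parameters, $p$ a probability distribution over the $n$ training points, $\ell_i$ the loss on the $i$-th point, $K$ the coreset size, and $R_K$ a placeholder regularizer (an assumption here) that encourages $p$ to have approximately $K$ nonzero entries:

```latex
% Illustrative sketch only, not the exact objective from the paper.
\begin{aligned}
\min_{p \in \Delta_n}\;\; & \sum_{i=1}^{n} \ell_i\!\left(w^{*}(p)\right) \;+\; \lambda\, R_K(p)
  && \text{(outer: learn a distribution with } \approx K \text{ nonzero entries)} \\
\text{s.t.}\;\; & w^{*}(p) \;\in\; \arg\min_{w}\; \mathbb{E}_{i \sim p}\!\left[\ell_i(w)\right]
  && \text{(inner: minimize the expected training error under } p\text{)}
\end{aligned}
```

The inner problem depends on $p$ only through the sampling distribution, which is what makes the problem genuinely nested rather than a single joint optimization over discrete selection variables.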
Install the required packages:

$ pip install -r requirements.txt
Before running the code, please download the corresponding datasets and place them in the data directory. Then run the main script with the proposed coreset selection method (--select_type bcsr):

$ python main.py --select_type bcsr
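For intuition only, the sketch below shows what one bilevel update of this kind could look like on a toy problem. It is not the code run by main.py; the tensor shapes, hyperparameters, and the simple top-K penalty are hypothetical stand-ins for the paper's actual components.

```python
# Minimal, self-contained sketch of the bilevel idea behind coreset selection.
# NOT the repository's implementation; all sizes, hyperparameters, and the
# simple top-K penalty below are hypothetical illustrations.
import torch

torch.manual_seed(0)
n, d, coreset_size = 64, 10, 8                     # hypothetical problem sizes
X, y = torch.randn(n, d), torch.randn(n, 1)        # toy regression data

theta = torch.zeros(d, 1, requires_grad=True)      # inner variable: model weights
s = torch.zeros(n, requires_grad=True)             # outer variable: logits of the sampling distribution
inner_lr, outer_lr, lam = 0.1, 0.5, 1.0

def per_example_loss(w):
    """Squared error of the linear model on each of the n data points."""
    return ((X @ w - y) ** 2).squeeze(1)

for step in range(300):
    p = torch.softmax(s, dim=0)                    # probability distribution over examples

    # Inner problem: minimize the expected (p-weighted) training error.
    # A single unrolled gradient step lets the outer update differentiate
    # through it (a common hypergradient approximation).
    inner_loss = (p * per_example_loss(theta)).sum()
    g_theta = torch.autograd.grad(inner_loss, theta, create_graph=True)[0]
    theta_unrolled = theta - inner_lr * g_theta

    # Outer problem: the model fit on the p-weighted data should do well on all
    # data, while p should concentrate its mass on ~coreset_size entries.
    outer_loss = per_example_loss(theta_unrolled).mean()
    topk_mass = torch.topk(p, coreset_size).values.sum()
    outer_loss = outer_loss + lam * (1.0 - topk_mass)   # hypothetical sparsity surrogate

    g_s = torch.autograd.grad(outer_loss, s)[0]

    with torch.no_grad():                          # apply the two updates
        s -= outer_lr * g_s
        theta -= inner_lr * g_theta

# The coreset is the coreset_size most probable points under the learned distribution.
coreset_idx = torch.topk(torch.softmax(s, dim=0), coreset_size).indices
print("Selected coreset indices:", sorted(coreset_idx.tolist()))
```

The sketch only conveys the nested structure, an outer update on the sampling distribution wrapped around approximate inner updates on the model; in this repository the full method is invoked through main.py as shown above.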
If you find this repository helpful, please cite our paper:
@inproceedings{
hao2023bilevel,
title={Bilevel Coreset Selection in Continual Learning: A New Formulation and Algorithm},
author={Jie Hao and Kaiyi Ji and Mingrui Liu},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023},
url={https://openreview.net/forum?id=2dtU9ZbgSN}
}