This repository contains all code needed to reproduce the experimental results for the paper:
Elastic Feature Consolidation For Cold Start Exemplar-Free Incremental Learning
Simone Magistri, Tomaso Trinci, Albin Soutif, Joost van de Weijer, Andrew D. Bagdanov
(ICLR2024)
Exemplar-Free Class Incremental Learning (EFCIL) aims to learn from a sequence of tasks without having access to previous task data. In this paper, we consider the challenging Cold Start scenario in which insufficient data is available in the first task to learn a high-quality backbone. This is especially challenging for EFCIL since it requires high plasticity, which results in feature drift which is difficult to compensate for in the exemplar-free setting. To address this problem, we propose a simple and effective approach that consolidates feature representations by regularizing drift in directions highly relevant to previous tasks and employs prototypes to reduce task-recency bias. Our method, called Elastic Feature Consolidation (EFC), exploits a tractable second-order approximation of feature drift based on an Empirical Feature Matrix (EFM). The EFM induces a pseudo-metric in feature space which we use to regularize feature drift in important directions and to update Gaussian prototypes used in a novel asymmetric cross entropy loss which effectively balances prototype rehearsal with data from new tasks. Experimental results on CIFAR-100, Tiny-ImageNet, ImageNet-Subset and ImageNet-1K demonstrate that Elastic Feature Consolidation is better able to learn new tasks by maintaining model plasticity and significantly outperforms the state-of-the-art.
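As a rough illustration of the pseudo-metric idea in the abstract, the sketch below evaluates a quadratic drift penalty between the old and new features of one sample. The form lamb * E + damping * I, the function name efm_drift_penalty, and the toy 2x2 matrix are assumptions made for illustration only; this is not the repository's implementation, which lives in continual_learning/ElasticFeatureConsolidation.py.

```python
from typing import List, Sequence


def efm_drift_penalty(
    f_new: Sequence[float],
    f_old: Sequence[float],
    efm: Sequence[Sequence[float]],
    lamb: float = 10.0,     # assumed to play the role of --efc_lamb
    damping: float = 0.1,   # assumed to play the role of --efc_damping
) -> float:
    """Quadratic form d^T (lamb * E + damping * I) d, with d = f_new - f_old.

    Assumed illustrative form of an EFM-weighted drift regularizer.
    """
    drift: List[float] = [a - b for a, b in zip(f_new, f_old)]
    n = len(drift)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # The damping term only contributes on the diagonal.
            weight = lamb * efm[i][j] + (damping if i == j else 0.0)
            total += drift[i] * weight * drift[j]
    return total


if __name__ == "__main__":
    # Toy matrix: only the first feature direction matters for previous tasks.
    toy_efm = [[1.0, 0.0], [0.0, 0.0]]
    old = [0.0, 0.0]
    print("drift along important direction:", efm_drift_penalty([2.0, 0.0], old, toy_efm))
    print("drift along unimportant direction:", efm_drift_penalty([0.0, 2.0], old, toy_efm))
```

In this toy setting, drift along the direction with a zero entry in the matrix is penalized only through the damping term, which is why the matrix induces a pseudo-metric rather than a full metric.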
If this code is useful in your research, please cite it as follows:
@inproceedings{
magistri2024elastic,
title={Elastic Feature Consolidation For Cold Start Exemplar-Free Incremental Learning},
author={Simone Magistri and Tomaso Trinci and Albin Soutif and Joost van de Weijer and Andrew D. Bagdanov},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=7D9X2cFnt1}
}
To run the EFC code, create an Anaconda environment from the environment.yml file and activate it:
conda env create -n EFC -f environment.yml
conda activate EFC
This codebase is inspired by FACIL and is structured as follows:
- main.py: This script is used to run the experiments.
- continual_learning/IncrementalApproach.py: This is the base class for ElasticFeatureConsolidation. It allows you to set optimization settings, such as the scheduler, learning rate, and optimizer, using the OptimizerManager class.
- continual_learning/ElasticFeatureConsolidation.py: This module implements Elastic Feature Consolidation. The class inherits from the IncrementalApproach class.
- continual_learning/utils/OptimizerManager.py: This class sets the optimizer for running the experiment.
- continual_learning/utils/empirical_feature_matrix.py: This code is responsible for computing the Empirical Feature Matrix.
- continual_learning/utils/proto.py: This module contains the prototype generator class.
- continual_learning/models/BaseModel.py: This module defines the incremental model.
- continual_learning/utilities: This directory contains various scripts to compute metrics. The SummaryLogger generates the summary.csv file, while the Logger class generates accuracy matrices for each task.
The results are stored in the path specified by the -op flag. A file named summary.csv will be generated, which contains the following performance metrics:
- Per_step_taw_acc: Per-step task-aware accuracy after each task.
- Last_per_step_taw_acc: Per-step task-aware accuracy after the last task.
- Per_step_tag_acc: Per-step task-agnostic accuracy after each task.
- Last_per_step_tag_acc: Per-step task-agnostic accuracy after the last task (as described in the left formula in Equation 16 in the main paper).
- Average_inc_acc: Average incremental accuracy (as described in the right formula in Equation 16 in the main paper).
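To make the relationship between the per-step and average incremental metrics concrete, here is a minimal sketch. It assumes that per-step accuracy is the class-count-weighted mean of the per-task accuracies after a given step, and that average incremental accuracy is the mean of the per-step accuracies over all steps. The function names and the toy numbers are hypothetical and are not taken from the repository's SummaryLogger.

```python
from typing import List, Sequence


def per_step_accuracy(task_accs: Sequence[float], task_sizes: Sequence[int]) -> float:
    """Class-count-weighted mean of the per-task accuracies after one step."""
    total_classes = sum(task_sizes)
    return sum(acc * n for acc, n in zip(task_accs, task_sizes)) / total_classes


def average_incremental_accuracy(
    acc_matrix: Sequence[Sequence[float]], task_sizes: Sequence[int]
) -> float:
    """Mean of the per-step accuracies over all steps.

    acc_matrix[k][i] is the accuracy on task i after training on task k (i <= k).
    """
    steps: List[float] = [
        per_step_accuracy(row[: k + 1], task_sizes[: k + 1])
        for k, row in enumerate(acc_matrix)
    ]
    return sum(steps) / len(steps)


if __name__ == "__main__":
    # Toy example: 50 classes in the first task, then two tasks of 5 classes.
    sizes = [50, 5, 5]
    accs = [
        [0.80],
        [0.70, 0.60],
        [0.60, 0.50, 0.40],
    ]
    print("last per-step accuracy:", per_step_accuracy(accs[-1], sizes))
    print("average incremental accuracy:", average_incremental_accuracy(accs, sizes))
```

Weighting by class count matters in Warm Start scenarios, where the first task holds far more classes than the later ones.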
Use the following command-line arguments to configure the behavior of the code:
- -op: The folder path where results are stored. The name of the experiment is randomly generated.
- --nw: Number of workers for data loaders.
- --epochs_first_task: Number of epochs for the first task (default=100).
- --epochs_next_task: Number of epochs for tasks after the first one (default=100).
- --seed: Random seed (default=0).
- --device: GPU device to use (default=0).
- --n_task: Number of tasks, including the first task.
- --n_class_first_task: Number of classes in the first task.
- --efc_lamb: Lambda value associated with the empirical feature matrix (default=10).
- --efc_damping: Eta value as described in the main paper (default=0.1).
- --efc_protoupdate: Whether to update the prototype using the empirical feature matrix with a specified sigma for the Gaussian kernel. If set to -1, no prototype update is performed (default=0.2).
- --dataset: Dataset name (default=cifar100).
- --data_path: The data folder where imagenet subset and tiny-imagenet datasets are stored.
- --firsttask_modelpath: Start the training from a checkpoint for the first task if available.
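The flags above could be declared with argparse roughly as follows. This is only a sketch of how such a command line might be parsed, not the parser used in main.py: the argument types, the absence of defaults for flags whose default is not documented above, and the --approach default are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented flags (types are assumed)."""
    p = argparse.ArgumentParser(description="EFC command-line sketch")
    p.add_argument("-op", type=str, required=True, help="folder where results are stored")
    p.add_argument("--nw", type=int, help="number of data-loader workers")
    p.add_argument("--epochs_first_task", type=int, default=100)
    p.add_argument("--epochs_next_task", type=int, default=100)
    p.add_argument("--seed", type=int, default=0)
    p.add_argument("--device", type=int, default=0, help="GPU device index")
    p.add_argument("--n_task", type=int, help="number of tasks, including the first")
    p.add_argument("--n_class_first_task", type=int)
    p.add_argument("--efc_lamb", type=float, default=10.0)
    p.add_argument("--efc_damping", type=float, default=0.1)
    p.add_argument("--efc_protoupdate", type=float, default=0.2,
                   help="sigma of the Gaussian kernel; -1 disables the prototype update")
    p.add_argument("--dataset", type=str, default="cifar100")
    p.add_argument("--data_path", type=str)
    p.add_argument("--firsttask_modelpath", type=str)
    # --approach appears in the example commands; its default here is an assumption.
    p.add_argument("--approach", type=str, default="efc")
    return p


if __name__ == "__main__":
    # Parse an explicit argument list instead of sys.argv for demonstration.
    demo = ["-op", "./ws_cifar100_10step", "--n_task", "11",
            "--n_class_first_task", "50", "--nw", "12"]
    print(vars(build_parser().parse_args(demo)))
```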
The following commands run the Warm Start (WS) scenarios on CIFAR-100. The default hyperparameters are those used to compute Table 1 in the main paper.
- 10 Step
python -u main.py -op ./ws_cifar100_10step --dataset cifar100 --n_task 11 --n_class_first_task 50 --approach efc --nw 12 --seed 0 --epochs_first_task 100 --epochs_next_task 100
- 20 Step
python -u main.py -op ./ws_cifar100_20step --dataset cifar100 --n_task 21 --n_class_first_task 40 --approach efc --nw 12 --seed 0 --epochs_first_task 100 --epochs_next_task 100
The following commands run the Cold Start (CS) scenarios on CIFAR-100. The default hyperparameters are those used to compute Table 1 in the main paper.
- 10 Step
python -u main.py -op ./cs_cifar100_10step --dataset cifar100 --n_task 10 --n_class_first_task 10 --data_path ./cl_data --approach efc --nw 12 --seed 0 --epochs_first_task 100 --epochs_next_task 100
- 20 Step
python -u main.py -op ./cs_cifar100_20step --dataset cifar100 --n_task 20 --n_class_first_task 5 --data_path ./cl_data --approach efc --nw 12 --seed 0 --epochs_first_task 100 --epochs_next_task 100
The commands for Tiny-ImageNet and ImageNet-Subset are similar. The only difference is that the data folder "cl_data", where both datasets are downloaded, must be specified with --data_path.
The following commands run the 10-step and 20-step Warm Start (WS) scenarios for Tiny-ImageNet and ImageNet-Subset.
python -u main.py -op ./ws_tinyimagenet_10step --dataset tiny-imagenet --n_task 11 --n_class_first_task 100 --data_path ./cl_data --approach efc --nw 12 --seed 0 --epochs_first_task 100 --epochs_next_task 100
python -u main.py -op ./ws_imagenetsubset_10task --dataset imagenet-subset --n_task 11 --n_class_first_task 50 --data_path ./cl_data --approach efc --nw 12 --seed 0 --epochs_first_task 160 --epochs_next_task 100
python -u main.py -op ./ws_tinyimagenet_20step --dataset tiny-imagenet --n_task 21 --n_class_first_task 100 --data_path ./cl_data --approach efc --nw 12 --seed 0 --epochs_first_task 100 --epochs_next_task 100
python -u main.py -op ./ws_imagenetsubset_20task --dataset imagenet-subset --n_task 21 --n_class_first_task 40 --data_path ./cl_data --approach efc --nw 12 --seed 0 --epochs_first_task 160 --epochs_next_task 100
The following commands run the 10-step and 20-step Cold Start (CS) scenarios for Tiny-ImageNet and ImageNet-Subset.
python -u main.py -op ./cs_tinyimagenet_10step --dataset tiny-imagenet --n_task 10 --n_class_first_task 20 --data_path ./cl_data --approach efc --nw 12 --seed 0 --epochs_first_task 100 --epochs_next_task 100
python -u main.py -op ./cs_imagenetsubset_10step --dataset imagenet-subset --n_task 10 --n_class_first_task 10 --data_path ./cl_data --approach efc --nw 12 --seed 0 --epochs_first_task 160 --epochs_next_task 100
python -u main.py -op ./cs_tinyimagenet_20step --dataset tiny-imagenet --n_task 20 --n_class_first_task 10 --data_path ./cl_data --approach efc --nw 12 --seed 0 --epochs_first_task 100 --epochs_next_task 100
python -u main.py -op ./cs_imagenetsubset_20step --dataset imagenet-subset --n_task 20 --n_class_first_task 5 --data_path ./cl_data --approach efc --nw 12 --seed 0 --epochs_first_task 160 --epochs_next_task 100
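The --n_task and --n_class_first_task values in the commands above determine how many classes each later task receives. The sketch below works through that arithmetic, assuming the classes left after the first task are split evenly across the remaining tasks. The helper name classes_per_task is hypothetical, and the even split is an assumption about how the code partitions classes. The class totals (100 for CIFAR-100 and ImageNet-Subset, 200 for Tiny-ImageNet) are properties of the datasets.

```python
from typing import List


def classes_per_task(total_classes: int, n_task: int, n_class_first_task: int) -> List[int]:
    """Return the number of classes in each task, assuming an even split
    of the classes that remain after the first task."""
    later_tasks = n_task - 1
    remaining = total_classes - n_class_first_task
    if later_tasks < 1 or remaining < 0 or remaining % later_tasks != 0:
        raise ValueError("classes cannot be split evenly across the later tasks")
    return [n_class_first_task] + [remaining // later_tasks] * later_tasks


if __name__ == "__main__":
    # (label, total classes, --n_task, --n_class_first_task) from the commands above.
    scenarios = [
        ("CIFAR-100 WS 10-step", 100, 11, 50),
        ("CIFAR-100 CS 20-step", 100, 20, 5),
        ("Tiny-ImageNet WS 20-step", 200, 21, 100),
        ("Tiny-ImageNet CS 10-step", 200, 10, 20),
        ("ImageNet-Subset WS 20-step", 100, 21, 40),
    ]
    for label, total, n_task, first in scenarios:
        print(label, classes_per_task(total, n_task, first))
```

Under this assumption, Warm Start uses one more task than the number of incremental steps because the large first task is counted separately, while Cold Start spreads the classes uniformly over all tasks.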
The bash script experiments.sh runs all the experiments for all the scenarios.
Please check the MIT license that is listed in this repository.