PyTorch code for the following paper at IbPRIA 2023.
Title: CCLM: Class-conditional Label Noise Modelling
Authors: Albert Tatjer, Bhalaji Nagarajan, Ricardo Marques, Petia Radeva
Institute: Universitat de Barcelona
Abstract: The performance of deep neural networks highly depends on the quality and volume of the training data. However, cost-effective labelling processes such as crowdsourcing and web crawling often lead to data with noisy (i.e., wrong) labels. Making models robust to this label noise is thus of prime importance. A common approach is using loss distributions to model the label noise. However, the robustness of these methods highly depends on the accuracy of the division of the training set into clean and noisy samples. In this work, we dive into this research direction, highlighting the existing problem of treating this distribution globally, and propose a class-conditional approach to split the clean and noisy samples. We apply our approach to the popular DivideMix algorithm and show how the local treatment fares better with respect to the global treatment of the loss distribution. We validate our hypothesis on two popular benchmark datasets and show substantial improvements over the baseline experiments. We further analyze the effectiveness of the proposal using two different metrics: Noise Division Accuracy and Classiness.
Illustration
Code
Please note that CCLM proposes a very simple change to any LNL (Learning with Noisy Labels) codebase. In this repository we demonstrate how to apply it to the base code of DivideMix and C2D. All of the new (relevant) code is in class_conditional_utils.py; the only integration step is to call ccgmm_codivide() from class_conditional_utils.py inside the eval_train() function of the chosen baseline.
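The essence of this change is to fit a two-component Gaussian Mixture Model per class (instead of a single global one) on the per-sample losses and take the posterior of the low-loss component as the clean probability. Below is a minimal, illustrative sketch of such a class-conditional split; the function name ccgmm_codivide_sketch, the min-max normalisation, and the GMM settings are assumptions for illustration, not the exact code in class_conditional_utils.py.

    # Illustrative sketch of a class-conditional clean/noisy split (not the
    # exact repository code). `losses` holds one loss per sample and
    # `noisy_labels` holds each sample's (possibly noisy) label.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def ccgmm_codivide_sketch(losses, noisy_labels, num_classes):
        """Return the probability of each sample being clean, estimated per class."""
        losses = np.asarray(losses, dtype=np.float64).reshape(-1, 1)
        noisy_labels = np.asarray(noisy_labels)
        prob = np.zeros(len(losses))
        for c in range(num_classes):
            idx = np.where(noisy_labels == c)[0]
            if len(idx) < 2:  # too few samples to fit a two-component GMM
                prob[idx] = 1.0
                continue
            cls_losses = losses[idx]
            # min-max normalise within the class so the clean/noisy modes are comparable
            cls_losses = (cls_losses - cls_losses.min()) / (cls_losses.max() - cls_losses.min() + 1e-8)
            # two-component GMM per class: low-loss (clean) vs high-loss (noisy) mode
            gmm = GaussianMixture(n_components=2, max_iter=50, reg_covar=5e-4)
            gmm.fit(cls_losses)
            clean_component = gmm.means_.argmin()  # component with the smaller mean loss
            prob[idx] = gmm.predict_proba(cls_losses)[:, clean_component]
        return prob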
Experiments
To run the experiments, choose a baseline from {DivideMix, C2D}.
DivideMix
$ cd DivideMix
$ mkdir checkpoint
$ python Train_{dataset_name}.py --data_path path-to-your-data --cc
For more information, please refer to the DivideMix repository.
C2D
You first need to download the appropriate pretrained model here. For more information, please refer to the C2D repository.
$ python3 main_cifar.py --r 0.8 --lambda_u 500 --dataset cifar100 --p_threshold 0.03 --data_path path-to-your-data --experiment-name name-of-your-choice --method selfsup --net resnet50 --cc
Note that the --cc argument selects the class-conditional option; without it, you run the vanilla baseline.
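For context, the following hypothetical fragment sketches what the flag could toggle inside eval_train(). All names are placeholders: the global branch mirrors the usual DivideMix-style single GMM over all losses, and ccgmm_codivide_sketch refers to the sketch shown in the Code section above.

    # Hypothetical dispatch on the --cc flag; the actual wiring in eval_train()
    # may differ from this sketch.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def clean_probabilities(losses, noisy_labels, num_classes, class_conditional):
        if class_conditional:
            # --cc: class-conditional split (CCLM), one GMM per class
            return ccgmm_codivide_sketch(losses, noisy_labels, num_classes)
        # vanilla baseline: a single two-component GMM over all losses
        losses = np.asarray(losses, dtype=np.float64).reshape(-1, 1)
        gmm = GaussianMixture(n_components=2, max_iter=10, tol=1e-2, reg_covar=5e-4)
        gmm.fit(losses)
        return gmm.predict_proba(losses)[:, gmm.means_.argmin()]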
License
This project is licensed under the terms of the MIT license.