This repository contains code demonstrating the UniSSDA method from our CVPR 2024 paper *Universal Semi-Supervised Domain Adaptation by Mitigating Common-Class Bias*. See the arXiv version for the appendix.
Create a conda environment:

```shell
conda env create -f environment.yml
```
We prepared two public datasets:

- Office-Home
- DomainNet

Download them by running:

```shell
python download_data.py
```
In the `data` directory, the `txt` folder contains the text files defining the splits for each dataset, organized under the name of the dataset. The `txt` folder is for covariate shift only, the `txt_labelshift` folder is for covariate + label shift with the same sample size as in `txt`, and the `txt_fullsize` folder is for covariate + label shift with the full dataset size. Generate splits by navigating to the selected folder and running:

```shell
python generate_txt.py
```
To add a new dataset (e.g., NewData), place it in a folder named `NewData` in the datasets directory (the path is provided in the arguments to `main.py`; `./data` by default).
The file structure for the dataset should be:

```
NewData
│
└───domain1
│   │   image1
│   │   image2
│   │   ...
│
└───domain2
│   │   image1
│   │   image2
│   │   ...
│
...
```
The splits for each domain are defined as 50% train, 20% validation, 30% test. Few-shot training and validation sets are sampled from the corresponding splits.
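The split and few-shot sampling described above can be sketched as follows. This is illustrative only: the repository's actual split generation lives in `generate_txt.py`, and the function names here are hypothetical.

```python
import random

def split_domain(image_paths, seed=0):
    """Split one domain's images into 50% train / 20% validation / 30% test."""
    paths = sorted(image_paths)
    rng = random.Random(seed)
    rng.shuffle(paths)
    n = len(paths)
    n_train, n_val = int(0.5 * n), int(0.2 * n)
    return paths[:n_train], paths[n_train:n_train + n_val], paths[n_train + n_val:]

def sample_kshot(labeled, k, seed=0):
    """Sample up to k examples per class from a list of (path, class_id) pairs."""
    rng = random.Random(seed)
    by_class = {}
    for path, cls in labeled:
        by_class.setdefault(cls, []).append(path)
    return {cls: rng.sample(v, min(k, len(v))) for cls, v in by_class.items()}
```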
Each row in the text file is in the format `relative_path_of_image_to_dataset_folder class_id` (e.g., `Clipart/Alarm_Clock/00053.jpg 0`).
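For illustration, rows in this format could be parsed with a small helper like the one below. This is a hypothetical sketch; the repository's dataloader handles this internally.

```python
def parse_split_file(lines):
    """Parse rows of the form 'relative/path/to/image.jpg class_id'."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The class id is the last space-separated token; the rest is the path.
        rel_path, class_id = line.rsplit(" ", 1)
        samples.append((rel_path, int(class_id)))
    return samples
```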
To generate the text files for NewData, after ensuring it has the file structure stated above, create a new folder named `NewData` in the `txt` folder and run the provided `generate_txt.py`.
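The core of such generation can be sketched as below, assuming each domain folder contains one subfolder per class, as the example path `Clipart/Alarm_Clock/00053.jpg` suggests. This is a simplified, hypothetical stand-in for `generate_txt.py`.

```python
import os

def write_split_file(dataset_root, domain, out_path, class_to_id):
    """Write 'relative_path class_id' rows for one domain of a dataset."""
    domain_dir = os.path.join(dataset_root, domain)
    with open(out_path, "w") as f:
        for cls in sorted(os.listdir(domain_dir)):
            for img in sorted(os.listdir(os.path.join(domain_dir, cls))):
                # Paths are written relative to the dataset folder.
                f.write(f"{domain}/{cls}/{img} {class_to_id[cls]}\n")
```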
Next, add configs for the dataset in `configs/hparams.py`, `configs/data_model_configs.py`, and `dataloader/dataloader.py` to define training hyperparameters and cross-domain adaptation scenarios.
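As a rough sketch, a dataset config entry might look like the following. All attribute and value names here are illustrative assumptions; match the structure actually used in `configs/data_model_configs.py` and `configs/hparams.py`.

```python
# Hypothetical dataset config: a class holding the cross-domain
# adaptation scenarios and dataset metadata.
class NewData:
    def __init__(self):
        # (source, target) domain pairs to evaluate.
        self.scenarios = [("domain1", "domain2"), ("domain2", "domain1")]
        self.num_classes = 10
        self.input_size = 224

# Hypothetical training hyperparameters for the dataset.
new_data_hparams = {
    "learning_rate": 1e-3,
    "batch_size": 32,
    "num_epochs": 50,
}
```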
The following algorithms are implemented:

- Supervised baseline
- CDAC
- PAC
- AdaMatch
- DST
- Proposed method
To add a new algorithm, place it in `algorithms/algorithms.py`.
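A new algorithm would follow the shape sketched below. The base-class name and method signatures are assumptions for illustration; match the actual interface defined in `algorithms/algorithms.py` when integrating.

```python
# Hypothetical stand-in for the repository's algorithm base class.
class Algorithm:
    def __init__(self, configs, hparams):
        self.configs, self.hparams = configs, hparams

class NewMethod(Algorithm):
    """Skeleton for a new semi-supervised DA method."""
    def update(self, labeled_batch, unlabeled_batch):
        # Compute a supervised loss on the labeled batch and an
        # adaptation loss on the unlabeled batch, then step the optimizer.
        sup_loss = 0.0    # placeholder
        adapt_loss = 0.0  # placeholder
        return {"total_loss": sup_loss + adapt_loss}
```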
The experiments are organised hierarchically:

- Several experiments are collected under one directory assigned by `--experiment_description`.
- Each experiment can have different trials, each specified by `--run_description`.
To train a model:

```shell
python main.py --experiment_description expt_run-txt-Resnet34-office_home-openpartial \
    --run_description expt-Proposed-kshot-3 \
    --da_setting openpartial \
    --da_method Proposed \
    --dataset office_home \
    --backbone Resnet34 \
    --num_seeds 3 \
    --sampling kshot \
    --num_shots 3 \
    --data_path "./data/txt" \
    --data_root "./data"
```
Sample scripts are in the `scripts` folder.
We use WandB for visualizations of model training. Sign up for a WandB account using a GitHub or Google account, then add `--wandb_entity TEAM_NAME` as an argument to `main.py`, where `TEAM_NAME` is an existing WandB team you are in (e.g., `--wandb_entity ssda`). Additional WandB arguments can be specified through `wandb_dir`, `wandb_project`, and `wandb_tag` for organizing WandB runs, logs, and artifacts.
Results for each run are saved in `experiments_logs`. Obtain consolidated results by running:

```shell
python consolidation/consolidate_run.py
```
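Conceptually, consolidation aggregates a metric over the seeds of each run. The sketch below shows only this aggregation step with a hypothetical input shape; the real script reads its inputs from `experiments_logs`.

```python
import statistics

def consolidate(runs):
    """Return mean and (sample) standard deviation of per-seed accuracies.

    `runs` maps a run name to a list of accuracies, one per seed.
    """
    return {
        name: (statistics.mean(accs),
               statistics.stdev(accs) if len(accs) > 1 else 0.0)
        for name, accs in runs.items()
    }
```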
```bibtex
@INPROCEEDINGS{zhang2024unissda,
  author={Zhang, Wenyu and Liu, Qingmu and Wei Cong, Felix Ong and Ragab, Mohamed and Foo, Chuan-Sheng},
  booktitle={2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  title={Universal Semi-Supervised Domain Adaptation by Mitigating Common-Class Bias},
  year={2024},
  volume={},
  number={},
  pages={23912-23921},
  doi={10.1109/CVPR52733.2024.02257}}
```
This repository is adapted from AdaTime: *A Benchmarking Suite for Domain Adaptation on Time Series Data*.