This is the official repository of the paper: AUDIO PROMPT TUNING FOR UNIVERSAL SOUND SEPARATION. Audio prompt tuning (APT) is a simple yet effective approach for enhancing existing universal sound separation systems. It improves separation performance on specific sources by training a small number of prompt parameters with limited data, while preserving the generalization of the universal sound separation model by keeping its parameters frozen. The tuned parameters amount to less than 0.1% of the parameters of the backbone model.
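To make the idea concrete, here is a minimal toy sketch of prompt tuning with a frozen backbone. It is not the implementation used in the paper: the "backbone" is a hypothetical linear model, and all names (`W_IN`, `W_PROMPT`, `forward`, `tune_prompt`, `TRUE_PROMPT`) and sizes are assumptions chosen for illustration. The toy sizes do not reflect the real ratio of tuned to frozen parameters. Only the prompt vector is updated by gradient descent; the backbone weights are never modified.

```python
# Toy illustration (assumption: a linear stand-in for a pretrained separator).
# Frozen "backbone" weights: never updated during prompt tuning.
W_IN = [0.5, -0.3, 0.8, 0.1]   # weights applied to the input features
W_PROMPT = [0.6, -0.4]         # weights applied to the prompt embedding


def forward(x, prompt):
    """Backbone output for input features x, conditioned on a prompt vector."""
    return (sum(w * xi for w, xi in zip(W_IN, x))
            + sum(v * p for v, p in zip(W_PROMPT, prompt)))


def mse(data, prompt):
    """Mean squared error of the backbone over (input, target) pairs."""
    return sum((forward(x, prompt) - t) ** 2 for x, t in data) / len(data)


def tune_prompt(data, prompt, lr=0.1, steps=200):
    """Gradient descent on the prompt only; W_IN and W_PROMPT stay frozen."""
    prompt = list(prompt)
    for _ in range(steps):
        grad = [0.0] * len(prompt)
        for x, t in data:
            err = forward(x, prompt) - t
            for j, v in enumerate(W_PROMPT):
                # d/dp_j of (forward - t)^2, averaged over the data
                grad[j] += 2.0 * err * v / len(data)
        prompt = [p - lr * g for p, g in zip(prompt, grad)]
    return prompt


# Hypothetical few-shot data: targets come from an unknown "ideal" prompt.
TRUE_PROMPT = [1.0, -2.0]
inputs = [[1.0, 0.0, 0.5, -1.0], [0.2, 0.7, -0.3, 0.4], [-0.5, 0.1, 0.9, 0.0]]
data = [(x, forward(x, TRUE_PROMPT)) for x in inputs]

frozen_before = (list(W_IN), list(W_PROMPT))
init_prompt = [0.0, 0.0]
loss_before = mse(data, init_prompt)
tuned = tune_prompt(data, init_prompt)
loss_after = mse(data, tuned)

print(f"loss before tuning: {loss_before:.4f}")
print(f"loss after tuning:  {loss_after:.2e}")
```

In a real system the backbone would be a pretrained separation network and the prompt a learned embedding fed to it, but the division of labor is the same: gradients flow only into the prompt.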
We evaluate our method on the MUSDB18 and ESC-50 datasets. The following table lists the average SDR scores of APT and of the average prompt embedding without tuning (Baseline).
Model | MUSDB18_fulldata | ESC-50_fulldata |
---|---|---|
APT | 4.98 | 8.50 |
Baseline | 4.31 | 6.44 |
Few-shot experiments are carried out on the ESC-50 dataset.
Model | ESC-50_1-shot | ESC-50_5-shot | ESC-50_10-shot |
---|---|---|---|
APT | 4.57 | 6.68 | 7.59 |
Baseline | 4.09 | 5.59 | 6.10 |
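
For readers unfamiliar with the metric, the sketch below shows the basic definition of the signal-to-distortion ratio (SDR) in decibels: the energy of the reference signal divided by the energy of the estimation error. This is an assumption about the general form only; the exact SDR variant and evaluation toolkit behind the tables are not stated here, so this code will not necessarily reproduce those numbers. The function name `sdr_db` and the `eps` guard are illustrative choices.

```python
import math


def sdr_db(reference, estimate, eps=1e-12):
    """Basic SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    eps is a small constant (an illustrative choice) that avoids division
    by zero for a perfect estimate and log of zero for a silent reference.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_power = sum(r * r for r in reference)
    error_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_power + eps) / (error_power + eps))


# An estimate that is the reference scaled by 0.9 has an error energy of
# 1% of the signal energy, which corresponds to about 20 dB.
reference = [1.0, 0.0, -1.0, 0.0]
estimate = [0.9, 0.0, -0.9, 0.0]
print(f"SDR: {sdr_db(reference, estimate):.2f} dB")
```

Higher values mean the separated signal is closer to the reference.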
Detailed results for all 50 categories of the ESC-50 dataset are available here.
To be done after publishing