---
title: "Finetuning on low-confidence data"
about:
  template: marquee
  links:
    - icon: github
      text: Github
      href: https://github.com/FrenchKrab/IS2024-powerset-calibration
    - icon: book
      text: Google Scholar
      href: https://scholar.google.com/citations?user=7gJ465gAAAAJ
---
# ECE and DER with x seconds of annotated training data
In the paper, we show only a few data points (30, 300, and 1200 seconds) to keep the figures readable. Here we present the figures with all runs; each point is the average of 3 seeds.
The figure is interactive, so you can zoom in and inspect each data point in detail.
<!-- 22.2_draw_ece_for_finetune_gridlike_plotly.ipynb -->
```{=html}
<iframe src="site_media/finetune/finetune_plot.html" onload='javascript:(function(o){o.style.height=o.contentWindow.document.body.scrollHeight+"px";}(this));' style="height:200px;width:100%;border:none;overflow:hidden;"></iframe>
```
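As a rough illustration of how each plotted point is obtained, the sketch below averages DER and ECE over seeds for every (strategy, duration) pair. The record layout (`strategy`, `seconds`, `seed`, `der`, `ece`) and the numbers are hypothetical; the raw metric files may be organized differently.

```python
from collections import defaultdict
from statistics import mean


def average_over_seeds(runs):
    """Group runs by (strategy, seconds) and average each metric over seeds."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[(run["strategy"], run["seconds"])].append(run)
    return {
        key: {
            "der": mean(r["der"] for r in group),
            "ece": mean(r["ece"] for r in group),
            "n_seeds": len(group),
        }
        for key, group in grouped.items()
    }


# Made-up values for illustration only, not results from the paper.
runs = [
    {"strategy": "random", "seconds": 30, "seed": 1, "der": 0.25, "ece": 0.10},
    {"strategy": "random", "seconds": 30, "seed": 2, "der": 0.30, "ece": 0.12},
    {"strategy": "random", "seconds": 30, "seed": 3, "der": 0.35, "ece": 0.14},
]
averaged = average_over_seeds(runs)
```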
# Reproducibility
The model is trained on subsets of the DIHARD domains. Each subset is composed of multiple regions drawn from all files in the training set. These regions are selected with several strategies, which depend on random sampling and/or the predictions of the available model. We make the selected training regions available as UEM files.
- [UEM regions selected for training](https://github.com/FrenchKrab/IS2024-powerset-calibration/tree/master/data/finetuning/uems)
- Output of the model used to determine the regions: [[.parquet]](https://huggingface.co/aplaquet/IS2024-powerset-calibration/blob/main/model_inference/pretrained%40dh_train.inf.parquet) [[associated metadata]](https://huggingface.co/aplaquet/IS2024-powerset-calibration/blob/main/model_inference/pretrained%40dh_train.meta.yaml)
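To check how many seconds of audio a given UEM file selects, a minimal sketch follows. It assumes the common four-column UEM layout (`uri channel start end`, times in seconds); the file names and values in the example are invented, and the helper names are hypothetical.

```python
def read_uem(lines):
    """Parse UEM lines into (uri, start, end) tuples, skipping blank lines."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumed layout: <uri> <channel> <start> <end>
        uri, _channel, start, end = line.split()
        regions.append((uri, float(start), float(end)))
    return regions


def total_duration(regions):
    """Sum the duration (in seconds) of all regions."""
    return sum(end - start for _, start, end in regions)


# Invented example content, not taken from the released UEM files.
example = """\
file_a 1 0.000 12.500
file_a 1 40.000 47.500
file_b 1 3.000 13.000
"""
regions = read_uem(example.splitlines())
seconds = total_duration(regions)
```

For a real file, `read_uem(open(path))` would work the same way, since iterating over a file yields its lines.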
After training, we obtained checkpoints on which we computed DER and ECE. This data is also available:
- [Link to the raw metric data](https://github.com/FrenchKrab/IS2024-powerset-calibration/tree/master/data/finetuning/ece_der)
- [Link to the finetuned model checkpoints (9 GB)](https://huggingface.co/aplaquet/IS2024-powerset-calibration/blob/main/checkpoints-conf_rand-seeds123.zip)
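For readers unfamiliar with ECE, here is a sketch of the standard binned expected calibration error: predictions are grouped into equal-width confidence bins, and the gap between accuracy and mean confidence in each bin is averaged, weighted by bin size. This is the textbook formulation, not necessarily the exact variant used in the paper (the bin count and the way confidences are extracted from the model output are assumptions here).

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for items in bins:
        if not items:
            continue
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / n) * abs(accuracy - mean_conf)
    return ece


# Toy example: the model is 90% confident but right only 3 times out of 4.
toy_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

In the toy example, all four predictions fall in one bin with mean confidence 0.9 and accuracy 0.75, so the ECE is about 0.15.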