I'm going to run several experiments on decoding the distillation data. It will be easier to write up the results here, since they all concern the same part of the pipeline: `translate-mono-src` and `translate-corpus`.

I plan to test on `da-en`, since it produced a good model and should make any quality drops visible.
The data for this experiment is available in this spreadsheet. I measured both GPU utilization and the rate at which output was written to the target file, in bytes/sec. Each run operated on the same subset of the data. I measured 15 minutes of translation across 5 batches and summarized the results to estimate how fast the translations were happening.
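A minimal sketch of this kind of throughput measurement, assuming the target file's size is polled periodically while a batch translates and the per-batch write rates are then averaged. The `Sample`, `sample_file`, `bytes_per_sec`, and `summarize` names and the polling interval are hypothetical; this is not the script used for the numbers below.

```python
from __future__ import annotations

import os
import time
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    seconds: float   # seconds since sampling started
    size_bytes: int  # size of the target file at that moment


def sample_file(path: str, duration_s: float, interval_s: float = 10.0) -> list[Sample]:
    """Poll the size of `path` every `interval_s` seconds for about `duration_s` seconds."""
    start = time.monotonic()
    samples: list[Sample] = []
    while True:
        elapsed = time.monotonic() - start
        samples.append(Sample(elapsed, os.path.getsize(path)))
        if elapsed >= duration_s:
            return samples
        time.sleep(interval_s)


def bytes_per_sec(samples: list[Sample]) -> float:
    """Average write rate between the first and last sample of one batch."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    first, last = samples[0], samples[-1]
    return (last.size_bytes - first.size_bytes) / (last.seconds - first.seconds)


def summarize(batches: list[list[Sample]]) -> float:
    """Mean write rate (bytes/sec) across all sampled batches."""
    return mean(bytes_per_sec(batch) for batch in batches)


# Example with made-up samples: two batches writing 150,000 and 160,000 bytes/sec.
batches = [
    [Sample(0.0, 0), Sample(60.0, 9_000_000)],
    [Sample(0.0, 1_000_000), Sample(100.0, 17_000_000)],
]
print(summarize(batches))  # 155000.0
```

Using the first and last sample of each batch measures the average rate over the whole window, which smooths over the bursty writes a decoder produces as it flushes finished batches.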
| decoder | precision | teachers | maxi-batch-words | maxi-batch | GPU utilization (%) | bytes/sec | vs 500 | vs Marian best |
|---|---|---|---|---|---|---|---|---|
| marian | float32 | 2 | 500 | 1,000 | 64.7 | 156,118 | | |
| marian | float16 | 2 | 500 | 1,000 | 57.5 | 183,627 | 118% | |
| marian | float16 | 2 | 4,000 | 1,000 | 61.5 | 329,703 | 211% | |
| marian | float16 | 2 | 5,000 | 1,000 | 57.4 | 338,798 | 217% | |
| marian | float16 | 2 | 5,000 | 10,000 | 57.2 | 348,805 | 223% | |
| marian | float16 | 1 | 5,000 | 10,000 | 55.9 | 601,635 | 385% | |
| marian | float16 | 2 | 8,000 | 1,000 | - | Out of Memory | - | - |
| ctranslate2 | float16 | 1 | 5,000 | - | 97.2 | 1,187,192 | 760% | 197% |
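The "vs 500" column appears to be each run's bytes/sec divided by the float32, 500-word baseline in the first row, and "vs Marian best" appears to divide by the fastest Marian run (float16, 1 teacher). A small sketch that reproduces those ratios from the bytes/sec values in the table above; the run labels are only descriptive, and the `relative` helper is hypothetical.

```python
# bytes/sec for each configuration that completed, copied from the table above
RUNS = [
    ("marian float32, 2 teachers, 500 words, maxi-batch 1,000", 156_118),
    ("marian float16, 2 teachers, 500 words, maxi-batch 1,000", 183_627),
    ("marian float16, 2 teachers, 4,000 words, maxi-batch 1,000", 329_703),
    ("marian float16, 2 teachers, 5,000 words, maxi-batch 1,000", 338_798),
    ("marian float16, 2 teachers, 5,000 words, maxi-batch 10,000", 348_805),
    ("marian float16, 1 teacher, 5,000 words, maxi-batch 10,000", 601_635),
    ("ctranslate2 float16, 1 teacher, 5,000 words", 1_187_192),
]

BASELINE = RUNS[0][1]     # the float32, 500-word Marian run
MARIAN_BEST = RUNS[5][1]  # the fastest Marian run


def relative(value: float, reference: float) -> str:
    """Format value / reference as a whole-number percentage, e.g. 1.176 -> '118%'."""
    return f"{value / reference:.0%}"


for label, bps in RUNS[1:]:
    print(f"{label}: {relative(bps, BASELINE)} vs 500")
print(f"ctranslate2 vs Marian best: {relative(RUNS[-1][1], MARIAN_BEST)}")
```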
I then ran the `translate-*` tasks end to end with each decoder/ensemble configuration.
| inference | teacher ensemble | student COMET | vs baseline | GPU hours | vs baseline | wall time (hours) | vs baseline |
|---|---|---|---|---|---|---|---|
| marian (bad batching) | 2 | - | - | 597 | 100% | 78.0 | 100% |
| marian | 2 | 88.67 | - | 288 | 48% | 42.3 | 54% |
| marian | 1 | 88.42 | -0.25 | 147 | 27% | 6.4 | 8% |
| ctranslate2 | 1 | 88.35 | -0.32 | 69 | 12% | 2.7 | 3% |
Wall time here is the time the parallelized `translate-*` tasks took collectively, from the start of the first one to the finish of the last.
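A minimal sketch contrasting the wall-time definition above with GPU hours. The `TaskRun` record and its `gpus` field are hypothetical, and GPU hours are assumed here to be each task's duration multiplied by the GPUs allocated to it, summed over tasks; the issue does not spell out how GPU hours were computed.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TaskRun:
    """One parallelized translate-* task (hypothetical record)."""
    start: datetime
    end: datetime
    gpus: int  # GPUs allocated to the task (assumed field)


def wall_time_hours(tasks: list[TaskRun]) -> float:
    """Hours from the start of the first task to the finish of the last one."""
    first_start = min(t.start for t in tasks)
    last_end = max(t.end for t in tasks)
    return (last_end - first_start).total_seconds() / 3600


def gpu_hours(tasks: list[TaskRun]) -> float:
    """Sum of each task's duration times its GPU count (assumed definition)."""
    return sum((t.end - t.start).total_seconds() / 3600 * t.gpus for t in tasks)


# Example with made-up times: two overlapping 4-GPU tasks.
t0 = datetime(2024, 1, 1)
tasks = [
    TaskRun(t0, t0 + timedelta(hours=2), gpus=4),
    TaskRun(t0 + timedelta(hours=1), t0 + timedelta(hours=4), gpus=4),
]
print(wall_time_hours(tasks))  # 4.0
print(gpu_hours(tasks))        # 20.0
```

Overlapping tasks are counted once in wall time but individually in GPU hours, so the two measures need not shrink by the same factor from one configuration to the next.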