
Add embedding scale to nn.Embedding. #17

Open
wants to merge 6 commits into master
Conversation

csukuangfj
Collaborator

Differences between conformer_ctc and conformer_ctc_embedding_scale

conformer_ctc_embedding_scale replaces nn.Embedding with a modified
embedding. The modified embedding contains two changes (a sketch is shown
after the list):

  • (1) The weight matrix is initialized to the range (-std, std) where
    std = 1 / sqrt(embedding_dim)

  • (2) The output of the embedding is scaled by sqrt(embedding_dim)

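A minimal sketch of the modified embedding described above (illustrative only; the class name ScaledEmbedding is made up here, and the actual implementation in conformer_ctc_embedding_scale may differ in details):

import math

import torch
import torch.nn as nn


class ScaledEmbedding(nn.Module):
    # Embedding whose weights are initialized to a small range and whose
    # output is scaled up by sqrt(embedding_dim).
    def __init__(self, num_embeddings: int, embedding_dim: int):
        super().__init__()
        self.embedding_dim = embedding_dim
        self.embed = nn.Embedding(num_embeddings, embedding_dim)
        # (1) Initialize the weight matrix to (-std, std),
        #     where std = 1 / sqrt(embedding_dim)
        std = 1.0 / math.sqrt(embedding_dim)
        nn.init.uniform_(self.embed.weight, -std, std)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (2) Scale the output by sqrt(embedding_dim)
        return self.embed(x) * math.sqrt(self.embedding_dim)
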
Also, conformer_ctc_embedding_scale modifies the PositionalEncoding
in transformer.py. It replaces

self.xscale = math.sqrt(self.d_model)
x = x * self.xscale + self.pe[:, : x.size(1), :]

with

self.pos_scale = 1. / math.sqrt(self.d_model)
x = x + self.pe[:, : x.size(1), :] * self.pos_scale

You can use

diff conformer_ctc/transformer.py conformer_ctc_embedding_scale/transformer.py

to find the exact differences.
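
In context, the changed forward pass looks roughly like the following sketch of a standard sinusoidal positional encoding. This is not the real class: the PositionalEncoding in transformer.py has more machinery (e.g. dropout handling and extending self.pe on the fly) and may differ in details.

import math

import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    # Sketch: add a sinusoidal positional encoding, scaled down by
    # 1/sqrt(d_model), to the input instead of scaling the input up.
    def __init__(self, d_model: int, dropout: float = 0.1, max_len: int = 5000):
        super().__init__()
        self.d_model = d_model
        self.pos_scale = 1.0 / math.sqrt(d_model)
        self.dropout = nn.Dropout(dropout)

        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float32).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float32)
            * (-math.log(10000.0) / d_model)
        )
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model). The input is no longer multiplied by
        # sqrt(d_model); the positional encoding is scaled down instead.
        x = x + self.pe[:, : x.size(1), :] * self.pos_scale
        return self.dropout(x)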

@@ -0,0 +1,33 @@
#!/usr/bin/env python3
Collaborator


I'm not sure that we want to copy the test code across experiments?

Collaborator Author


The intention is to make the model dir itself as self-contained as possible so that it can be modified
independently, with the drawback that there is some duplication.

We can use symlinks here if others agree. @danpovey, what do you think?

@csukuangfj
Collaborator Author

csukuangfj commented Aug 26, 2021

Add Madam optimizer.

Use default parameters with Foam.

Tensorboard log:
https://tensorboard.dev/experiment/xcJ2nf6YRjGMtJdjjLH0LQ/

@csukuangfj
Collaborator Author

csukuangfj commented Sep 8, 2021

Here are the results for this pull request
(only the best WERs of each method are listed):

HLG 1best decoding (no LM rescoring, no attention-decoder rescoring)

(model averaging from epoch-40.pt to epoch-49.pt)

2021-09-08 04:35:32,223 INFO [utils.py:317] [test-clean-no_rescore] %WER 2.99% [1573 / 52576, 134 ins, 244 del, 1195 sub ]
2021-09-08 04:37:53,973 INFO [utils.py:317] [test-other-no_rescore] %WER 6.98% [3652 / 52343, 203 ins, 842 del, 2607 sub ] 

(model averaging from epoch-22.pt to epoch-49.pt)

2021-09-07 21:50:25,598 INFO [utils.py:317] [test-clean-no_rescore] %WER 3.05% [1602 / 52576, 140 ins, 287 del, 1175 sub ]
2021-09-07 21:52:46,439 INFO [utils.py:317] [test-other-no_rescore] %WER 6.87% [3597 / 52343, 189 ins, 880 del, 2528 sub ]
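
(For reference, each log line reports WER as (insertions + deletions + substitutions) / reference words; e.g. for the first test-clean line above, (134 + 244 + 1195) / 52576 = 1573 / 52576 ≈ 2.99%.)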

HLG decoding + 4-gram LM rescoring (whole lattice rescoring, without attention-decoder)

(model averaging from epoch-19.pt to epoch-48.pt)

2021-09-08 08:07:39,492 INFO [utils.py:317] [test-clean-lm_scale_0.6] %WER 2.66% [1399 / 52576, 159 ins, 126 del, 1114 sub ]
2021-09-08 08:14:59,870 INFO [utils.py:317] [test-other-lm_scale_0.7] %WER 6.00% [3140 / 52343, 234 ins, 484 del, 2422 sub ]

HLG + 4-gram LM rescoring (whole lattice rescoring) + attention-decoder rescoring

(model averaging from epoch-24.pt to epoch-49.pt)

2021-09-07 22:05:24,962 INFO [decode.py:437]
For test-clean, WER of different settings are:
ngram_lm_scale_1.0_attention_scale_0.6  2.64  best for test-clean
ngram_lm_scale_0.9_attention_scale_0.6  2.65
ngram_lm_scale_1.0_attention_scale_0.7  2.65
ngram_lm_scale_1.3_attention_scale_1.3  2.65
ngram_lm_scale_0.9_attention_scale_0.5  2.66
ngram_lm_scale_0.9_attention_scale_0.7  2.66
ngram_lm_scale_1.0_attention_scale_0.9  2.66
ngram_lm_scale_1.0_attention_scale_1.0  2.66

2021-09-07 22:17:03,741 INFO [decode.py:437]
For test-other, WER of different settings are:
ngram_lm_scale_0.9_attention_scale_0.7  5.91  best for test-other
ngram_lm_scale_1.0_attention_scale_0.9  5.91
ngram_lm_scale_0.7_attention_scale_0.6  5.92
ngram_lm_scale_0.7_attention_scale_0.7  5.92
ngram_lm_scale_0.9_attention_scale_0.5  5.92
ngram_lm_scale_0.9_attention_scale_1.1  5.92
ngram_lm_scale_0.9_attention_scale_1.2  5.92

(model averaging from epoch-33.pt to epoch-48.pt)

For test-clean, WER of different settings are:
ngram_lm_scale_0.9_attention_scale_0.5  2.73  best for test-clean
ngram_lm_scale_0.9_attention_scale_0.6  2.73
ngram_lm_scale_0.9_attention_scale_0.7  2.73
ngram_lm_scale_1.5_attention_scale_1.5  2.73
ngram_lm_scale_1.5_attention_scale_1.7  2.73
ngram_lm_scale_1.0_attention_scale_0.5  2.74
ngram_lm_scale_1.0_attention_scale_0.6  2.74

For test-other, WER of different settings are:
ngram_lm_scale_1.2_attention_scale_0.9  5.85  best for test-other
ngram_lm_scale_1.7_attention_scale_1.9  5.85
ngram_lm_scale_1.2_attention_scale_1.0  5.86
ngram_lm_scale_1.5_attention_scale_1.9  5.86
ngram_lm_scale_1.7_attention_scale_2.0  5.86
ngram_lm_scale_0.9_attention_scale_0.6  5.87
ngram_lm_scale_1.2_attention_scale_1.1  5.87
ngram_lm_scale_1.2_attention_scale_1.2  5.87
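
("Model averaging from epoch-M.pt to epoch-N.pt" above means decoding with the element-wise average of the parameters saved at those epochs, rather than with a single checkpoint. A minimal sketch of that kind of averaging follows; it is illustrative only: the recipe presumably has its own helper for this, and the assumption that each checkpoint stores its state dict under a "model" key is just that, an assumption.)

from typing import Dict, List

import torch


def average_checkpoints(filenames: List[str]) -> Dict[str, torch.Tensor]:
    # Element-wise average of the model parameters stored in the given
    # checkpoint files. Assumes each checkpoint is a dict whose "model"
    # entry holds the state dict (an assumed format, not the actual one).
    n = len(filenames)
    avg = torch.load(filenames[0], map_location="cpu")["model"]
    for f in filenames[1:]:
        state = torch.load(f, map_location="cpu")["model"]
        for k in avg:
            avg[k] = avg[k] + state[k]
    return {k: v / n for k, v in avg.items()}


# e.g. average epoch-40.pt through epoch-49.pt before decoding:
# averaged = average_checkpoints([f"exp/epoch-{i}.pt" for i in range(40, 50)])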

@csukuangfj
Collaborator Author

The results are comparable with those from the latest master
(https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md).

WERs of test-clean and test-other:

test-clean:

  • this pull-request: 2.64
  • master: 2.57

test-other:

  • this pull-request: 5.85
  • master: 5.94

@danpovey
Collaborator

danpovey commented Sep 8, 2021

I think the reason this doesn't make much difference is that, since this embedding is only used as an input, leaving it as random vectors works OK: the rest of the network can just figure out what to do with it. But I think it's probably good practice to train it regardless. There might be setups with larger vocabs where this matters.
