Add embedding scale to nn.Embedding. #17
base: master
Conversation
@@ -0,0 +1,33 @@
#!/usr/bin/env python3
I’m not sure that we want to copy the test code across experiments?
The intention is to make the model dir itself as self-contained as possible so that it can be modified
independently, with the drawback that there is some duplication.
We can use symlinks here if everyone agrees. @danpovey, what do you think?
Add Madam optimizer. Use default parameters. Tensorboard log:
Here are the results for this pull-request:

HLG 1best decoding (no LM rescoring, no attention-decoder rescoring) (model averaging from

HLG decoding + 4-gram LM rescoring (whole lattice rescoring, without attention-decoder) (model averaging from

HLG + 4-gram LM rescoring (whole lattice rescoring) + attention-decoder rescoring (model averaging from

The results are comparable with the ones from the latest master. WERs of test-clean and test-other:
test-clean:
test-other:
I think the reason this doesn't make much difference is that, since this embedding is only used as an input, leaving it as random vectors works OK: the rest of the network can just figure out what to do with it. But I think it's probably good practice to train it regardless. There might be setups with larger vocabs where this matters.
Differences between `conformer_ctc` and `conformer_ctc_embedding_scale`

`conformer_ctc_embedding_scale` replaces `nn.Embedding` with a modified `Embedding`. The modified embedding contains two changes:

(1) The weight matrix is initialized to the range `(-std, std)`, where `std = 1 / sqrt(embedding_dim)`

(2) The output of the embedding is scaled by `sqrt(embedding_dim)`
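The two changes above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the class name `ScaledEmbedding` and the plain-`Parameter` implementation are assumptions; the real diff may subclass `nn.Embedding` instead.

```python
import math

import torch
import torch.nn as nn


class ScaledEmbedding(nn.Module):
    """Embedding with the two changes described above (illustrative sketch).

    (1) weights initialized uniformly in (-std, std), std = 1/sqrt(embedding_dim)
    (2) output scaled by sqrt(embedding_dim)
    """

    def __init__(self, num_embeddings: int, embedding_dim: int):
        super().__init__()
        self.embedding_dim = embedding_dim
        self.weight = nn.Parameter(torch.empty(num_embeddings, embedding_dim))
        # (1) initialize the weight matrix in the range (-std, std)
        std = 1.0 / math.sqrt(embedding_dim)
        nn.init.uniform_(self.weight, -std, std)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Look up the rows for `tokens`, then
        # (2) scale the output by sqrt(embedding_dim)
        out = nn.functional.embedding(tokens, self.weight)
        return out * math.sqrt(self.embedding_dim)
```

The scaling keeps the embedding output at roughly unit magnitude despite the small initialization, matching the convention used for transformer input embeddings.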
Also, `conformer_ctc_embedding_scale` modifies the `PositionalEncoding` in `transformer.py`. It replaces

with

You can use

to find the exact differences.