Transformer Softmax Bottleneck

An analysis of the effects of Mixture of Softmaxes on the Transformer architecture.
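For readers unfamiliar with the technique, the following is a minimal sketch of a Mixture of Softmaxes output layer in plain Python. It is not taken from this repository, which may implement it differently. It assumes the usual formulation: the final hidden state `g` is projected into `K` component contexts through a `tanh` layer, each context produces its own softmax over the vocabulary, and the `K` distributions are averaged using mixture weights that are themselves a softmax over `K`. All names (`mixture_of_softmaxes`, `prior_w`, `proj_ws`, `out_emb`) and the toy sizes are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix, stored as a list of rows, by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def mixture_of_softmaxes(g, prior_w, proj_ws, out_emb):
    """Return a distribution over the vocabulary as a mixture of K softmaxes.

    g        : hidden state, length d (e.g. the Transformer's final-layer output)
    prior_w  : K x d matrix producing the mixture logits
    proj_ws  : list of K (d x d) matrices, one projection per component
    out_emb  : V x d output embedding matrix shared by all components
    """
    # Mixture weights: a softmax over the K components.
    pi = softmax(matvec(prior_w, g))

    probs = [0.0] * len(out_emb)
    for weight, proj in zip(pi, proj_ws):
        # Component-specific context vector.
        h_k = [math.tanh(x) for x in matvec(proj, g)]
        # Component softmax over the vocabulary.
        p_k = softmax(matvec(out_emb, h_k))
        # Mix in probability space, not logit space; mixing logits would
        # collapse back to a single softmax.
        for i, p in enumerate(p_k):
            probs[i] += weight * p
    return probs


def random_matrix(rows, cols, rng):
    """Toy initialiser for the illustration only."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (assumptions for the example): d = 4, K = 3 components, V = 6 tokens.
rng = random.Random(0)
d, K, V = 4, 3, 6
g = [rng.uniform(-1.0, 1.0) for _ in range(d)]
prior_w = random_matrix(K, d, rng)
proj_ws = [random_matrix(d, d, rng) for _ in range(K)]
out_emb = random_matrix(V, d, rng)

probs = mixture_of_softmaxes(g, prior_w, proj_ws, out_emb)
print(len(probs), round(sum(probs), 6))
```

With `K = 1` the mixture weight is exactly 1 and the function reduces to a single softmax over `out_emb` applied to the `tanh`-projected state, which is the baseline that a Mixture of Softmaxes is normally compared against.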

Results

[Figure: two result plots; images not available]