Layers are the fundamental building blocks for NLP models. They can be used to assemble new layers, networks, or models.
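Illustrative usage sketches for each of these layers follow the list.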
- DenseEinsum implements a feedforward network using tf.einsum. This layer contains the einsum op, the associated weight, and the logic required to generate the einsum expression for the given initialization parameters.
- MultiHeadAttention implements an optionally masked attention between two tensors, from_tensor and to_tensor, as described in "Attention Is All You Need". If from_tensor and to_tensor are the same, this is self-attention.
- CachedAttention implements an attention layer with a cache, used for autoregressive decoding.
- TalkingHeadsAttention implements talking-heads attention, as described in "Talking-Heads Attention".
- Transformer implements an optionally masked transformer as described in "Attention Is All You Need".
- ReZeroTransformer implements a Transformer with the ReZero residual connection, as described in "ReZero is All You Need: Fast Convergence at Large Depth".
- OnDeviceEmbedding implements efficient embedding lookups designed for TPU-based models.
- PositionalEmbedding creates a positional embedding as described in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".
- SelfAttentionMask creates a 3D attention mask from a 2D tensor mask.
- MaskedSoftmax implements a softmax with an optional masking input. If no mask is provided to this layer, it performs a standard softmax; however, if a mask tensor is applied (which should be 1 in positions where the data should be allowed through, and 0 where the data should be masked), the output will have masked positions set to approximately zero.
- ClassificationHead implements a pooling head over a sequence of embeddings, commonly used by classification tasks.
- GatedFeedforward implements the gated linear layer feedforward as described in "GLU Variants Improve Transformer".
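
The sketches below illustrate these layers in list order. Each is a small, self-contained example of the underlying idea; none of them is the library implementation, and all shapes, argument values, and helper names are illustrative assumptions. For example, the einsum-based projection behind DenseEinsum can be written as:

```python
import tensorflow as tf

# Standalone sketch of a dense projection written with tf.einsum
# (illustrative only; all shapes below are arbitrary assumptions).
batch, seq_len, input_width, output_width = 2, 8, 16, 32
inputs = tf.random.normal([batch, seq_len, input_width])
kernel = tf.Variable(tf.random.normal([input_width, output_width]))
bias = tf.Variable(tf.zeros([output_width]))

# "abc,cd->abd" contracts the shared input-width axis and leaves the
# batch and sequence axes untouched.
outputs = tf.einsum('abc,cd->abd', inputs, kernel) + bias
print(outputs.shape)  # (2, 8, 32)
```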
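
The distinction between self- and cross-attention can be shown with the closely related tf.keras.layers.MultiHeadAttention, whose query/value arguments play the roles of from_tensor/to_tensor (the sizes below are arbitrary):

```python
import tensorflow as tf

# Self- vs. cross-attention with tf.keras.layers.MultiHeadAttention.
attention = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)

encoder_output = tf.random.normal([2, 10, 64])  # [batch, to_seq_len, width]
decoder_input = tf.random.normal([2, 6, 64])    # [batch, from_seq_len, width]

self_attention = attention(query=decoder_input, value=decoder_input)
cross_attention = attention(query=decoder_input, value=encoder_output)
print(self_attention.shape, cross_attention.shape)  # (2, 6, 64) (2, 6, 64)
```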
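
CachedAttention's cache can be pictured as key/value tensors that grow one position per decode step; the loop below sketches the assumed mechanics for keys only and is not the layer's API:

```python
import tensorflow as tf

# Sketch of key/value caching for autoregressive decoding (assumed
# mechanics): each step appends the new token's projected key so earlier
# positions are not re-projected.
batch, width = 2, 64
cached_keys = tf.zeros([batch, 0, width])            # nothing decoded yet
for step in range(3):
    new_key = tf.random.normal([batch, 1, width])    # projection of the new token
    cached_keys = tf.concat([cached_keys, new_key], axis=1)
print(cached_keys.shape)  # (2, 3, 64)
```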
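
The talking-heads trick from the cited paper mixes attention scores across the heads axis; a conceptual sketch, not this layer's code:

```python
import tensorflow as tf

# Attention logits are linearly mixed across heads before the softmax,
# and the resulting weights are mixed again after it.
batch, heads, from_seq, to_seq = 2, 4, 6, 6
logits = tf.random.normal([batch, heads, from_seq, to_seq])
pre_softmax_proj = tf.Variable(tf.random.normal([heads, heads]))
post_softmax_proj = tf.Variable(tf.random.normal([heads, heads]))

mixed_logits = tf.einsum('bnft,nm->bmft', logits, pre_softmax_proj)
weights = tf.nn.softmax(mixed_logits, axis=-1)
mixed_weights = tf.einsum('bnft,nm->bmft', weights, post_softmax_proj)
print(mixed_weights.shape)  # (2, 4, 6, 6)
```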
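
A block equivalent in spirit to the Transformer layer can be assembled from Keras primitives; this illustrates what the layer computes rather than how it is implemented, and all sizes are assumptions:

```python
import tensorflow as tf

# Post-layer-norm transformer block: self-attention sublayer followed by a
# feedforward sublayer, each with a residual connection.
width, num_heads, intermediate_size = 64, 4, 256
attention = tf.keras.layers.MultiHeadAttention(num_heads=num_heads,
                                               key_dim=width // num_heads)
attention_norm = tf.keras.layers.LayerNormalization()
feedforward = tf.keras.Sequential([
    tf.keras.layers.Dense(intermediate_size, activation='relu'),
    tf.keras.layers.Dense(width),
])
feedforward_norm = tf.keras.layers.LayerNormalization()

x = tf.random.normal([2, 10, width])
x = attention_norm(x + attention(query=x, value=x))  # self-attention sublayer
x = feedforward_norm(x + feedforward(x))             # feedforward sublayer
print(x.shape)  # (2, 10, 64)
```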
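
The core ReZero idea behind ReZeroTransformer is a trainable scalar residual gate initialized to zero, as described in the cited paper; a minimal sketch:

```python
import tensorflow as tf

# Each sublayer output is scaled by a trainable scalar that starts at
# zero, so the block is initially the identity function.
alpha = tf.Variable(0.0)                         # trained like any other weight
sublayer = tf.keras.layers.Dense(64, activation='relu')

x = tf.random.normal([2, 10, 64])
y = x + alpha * sublayer(x)                      # at initialization, y == x
print(tf.reduce_max(tf.abs(y - x)).numpy())      # 0.0
```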
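
One way to make embedding lookups TPU-friendly is to express them as a one-hot matmul instead of a gather; the sketch below shows the two equivalent strategies and is illustrative code, not this layer's implementation:

```python
import tensorflow as tf

# tf.gather is the usual CPU/GPU path; a one-hot contraction often maps
# better onto TPU matrix units.
vocab_size, embedding_width = 100, 8
table = tf.random.normal([vocab_size, embedding_width])
ids = tf.constant([[3, 7, 42]])                  # [batch, seq_len]

gathered = tf.gather(table, ids)                                 # [1, 3, 8]
one_hot = tf.one_hot(ids, depth=vocab_size, dtype=table.dtype)   # [1, 3, 100]
one_hot_lookup = tf.einsum('bsv,vd->bsd', one_hot, table)        # [1, 3, 8]
print(tf.reduce_max(tf.abs(gathered - one_hot_lookup)).numpy())  # ~0.0
```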
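
BERT-style position embeddings are a learned table indexed by position; a conceptual sketch of the assumed mechanics, not this layer's code:

```python
import tensorflow as tf

# A trainable [max_length, width] table is sliced to the current sequence
# length and added to the word embeddings.
max_length, width = 512, 64
position_table = tf.Variable(tf.random.normal([max_length, width], stddev=0.02))

word_embeddings = tf.random.normal([2, 10, width])   # [batch, seq_len, width]
seq_len = word_embeddings.shape[1]
position_embeddings = position_table[:seq_len]       # [seq_len, width]
combined = word_embeddings + position_embeddings[tf.newaxis, :, :]
print(combined.shape)  # (2, 10, 64)
```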
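
Expanding a 2D padding mask into a 3D attention mask is a broadcasting operation; a conceptual sketch with illustrative shapes:

```python
import tensorflow as tf

# Broadcasting [batch, from_seq_len, 1] against [batch, 1, to_seq_len]
# yields the [batch, from_seq_len, to_seq_len] mask attention layers consume.
batch, from_seq_len = 2, 3
to_mask = tf.constant([[1., 1., 1., 0.],
                       [1., 1., 0., 0.]])        # [batch, to_seq_len]

attention_mask = tf.ones([batch, from_seq_len, 1]) * tf.expand_dims(to_mask, axis=1)
print(attention_mask.shape)  # (2, 3, 4)
```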
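
Masked softmax is commonly realized by pushing masked logits toward negative infinity; the sketch below assumes that mechanism rather than reproducing the layer's code:

```python
import tensorflow as tf

# Masked logits receive a large negative offset, so they end up with
# approximately zero probability after the softmax.
logits = tf.constant([[2.0, 1.0, 0.5, 3.0]])
mask = tf.constant([[1.0, 1.0, 1.0, 0.0]])   # 1 = allow through, 0 = mask out

adder = (1.0 - mask) * -1e9                  # large negative where mask == 0
probabilities = tf.nn.softmax(logits + adder, axis=-1)
print(probabilities.numpy())                 # last position is approximately zero
```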
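
A typical pooling head takes the leading token's embedding, applies an inner projection with dropout, and then projects to the label space; the sketch below assumes that structure, and the layer's exact arguments may differ:

```python
import tensorflow as tf

# First-token pooling followed by an inner projection and an output
# projection to num_classes logits.
inner_dim, num_classes = 64, 3
inner_dense = tf.keras.layers.Dense(inner_dim, activation='tanh')
dropout = tf.keras.layers.Dropout(0.1)
output_dense = tf.keras.layers.Dense(num_classes)

sequence_output = tf.random.normal([2, 10, 64])     # [batch, seq_len, width]
pooled = sequence_output[:, 0, :]                   # leading ([CLS]) token
logits = output_dense(dropout(inner_dense(pooled), training=False))
print(logits.shape)  # (2, 3)
```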
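
The gated feedforward of the cited paper multiplies an activated branch with a parallel linear branch before the output projection; a conceptual sketch, not this layer's code:

```python
import tensorflow as tf

# GLU-style feedforward: the activated branch gates a parallel linear
# branch element-wise, then the result is projected back to hidden_size.
hidden_size, intermediate_size = 64, 256
gate_dense = tf.keras.layers.Dense(intermediate_size, activation='relu')
value_dense = tf.keras.layers.Dense(intermediate_size)
output_dense = tf.keras.layers.Dense(hidden_size)

x = tf.random.normal([2, 10, hidden_size])
gated = gate_dense(x) * value_dense(x)       # element-wise gating
y = output_dense(gated)
print(y.shape)  # (2, 10, 64)
```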