Layers are the fundamental building blocks for NLP models. They can be used to assemble new layers, networks, or models.
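Illustrative usage sketches for each of these layers follow the list.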
- DenseEinsum implements a feedforward network using tf.einsum. This layer contains the einsum op, the associated weight, and the logic required to generate the einsum expression for the given initialization parameters.
- MultiHeadAttention implements an optionally masked attention between two tensors, from_tensor and to_tensor, as described in "Attention Is All You Need". If from_tensor and to_tensor are the same, this is self-attention.
- CachedAttention implements an attention layer with a cache, used for autoregressive decoding.
- TalkingHeadsAttention implements talking-heads attention, as described in "Talking-Heads Attention".
- Transformer implements an optionally masked transformer as described in "Attention Is All You Need".
- ReZeroTransformer implements a Transformer with the ReZero residual connection, as described in "ReZero is All You Need: Fast Convergence at Large Depth".
- OnDeviceEmbedding implements efficient embedding lookups designed for TPU-based models.
- PositionalEmbedding creates a positional embedding as described in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".
- SelfAttentionMask creates a 3D attention mask from a 2D tensor mask.
- MaskedSoftmax implements a softmax with an optional masking input. If no mask is provided to this layer, it performs a standard softmax; however, if a mask tensor is applied (which should be 1 in positions where the data should be allowed through, and 0 where the data should be masked), the output will have masked positions set to approximately zero.
- ClassificationHead implements a pooling head over a sequence of embeddings, commonly used by classification tasks.
- GatedFeedforward implements the gated linear layer feedforward as described in "GLU Variants Improve Transformer".
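
The sketches below illustrate these layers in list order. Each is a small, self-contained example of the underlying idea; none of them is the library implementation, and all shapes, argument values, and helper names are illustrative assumptions. For example, the einsum-based projection behind DenseEinsum can be written as:

```python
import tensorflow as tf

# Standalone sketch of a dense projection written with tf.einsum
# (illustrative only; all shapes below are arbitrary assumptions).
batch, seq_len, input_width, output_width = 2, 8, 16, 32
inputs = tf.random.normal([batch, seq_len, input_width])
kernel = tf.Variable(tf.random.normal([input_width, output_width]))
bias = tf.Variable(tf.zeros([output_width]))

# "abc,cd->abd" contracts the shared input-width axis and leaves the
# batch and sequence axes untouched.
outputs = tf.einsum('abc,cd->abd', inputs, kernel) + bias
print(outputs.shape)  # (2, 8, 32)
```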
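
The distinction between self- and cross-attention can be shown with the closely related tf.keras.layers.MultiHeadAttention, whose query/value arguments play the roles of from_tensor/to_tensor (the sizes below are arbitrary):

```python
import tensorflow as tf

# Self- vs. cross-attention with tf.keras.layers.MultiHeadAttention.
attention = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)

encoder_output = tf.random.normal([2, 10, 64])  # [batch, to_seq_len, width]
decoder_input = tf.random.normal([2, 6, 64])    # [batch, from_seq_len, width]

self_attention = attention(query=decoder_input, value=decoder_input)
cross_attention = attention(query=decoder_input, value=encoder_output)
print(self_attention.shape, cross_attention.shape)  # (2, 6, 64) (2, 6, 64)
```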
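
CachedAttention's cache can be pictured as key/value tensors that grow one position per decode step; the loop below sketches the assumed mechanics for keys only and is not the layer's API:

```python
import tensorflow as tf

# Sketch of key/value caching for autoregressive decoding (assumed
# mechanics): each step appends the new token's projected key so earlier
# positions are not re-projected.
batch, width = 2, 64
cached_keys = tf.zeros([batch, 0, width])            # nothing decoded yet
for step in range(3):
    new_key = tf.random.normal([batch, 1, width])    # projection of the new token
    cached_keys = tf.concat([cached_keys, new_key], axis=1)
print(cached_keys.shape)  # (2, 3, 64)
```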
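
The talking-heads trick from the cited paper mixes attention scores across the heads axis; a conceptual sketch, not this layer's code:

```python
import tensorflow as tf

# Attention logits are linearly mixed across heads before the softmax,
# and the resulting weights are mixed again after it.
batch, heads, from_seq, to_seq = 2, 4, 6, 6
logits = tf.random.normal([batch, heads, from_seq, to_seq])
pre_softmax_proj = tf.Variable(tf.random.normal([heads, heads]))
post_softmax_proj = tf.Variable(tf.random.normal([heads, heads]))

mixed_logits = tf.einsum('bnft,nm->bmft', logits, pre_softmax_proj)
weights = tf.nn.softmax(mixed_logits, axis=-1)
mixed_weights = tf.einsum('bnft,nm->bmft', weights, post_softmax_proj)
print(mixed_weights.shape)  # (2, 4, 6, 6)
```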
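
A block equivalent in spirit to the Transformer layer can be assembled from Keras primitives; this illustrates what the layer computes rather than how it is implemented, and all sizes are assumptions:

```python
import tensorflow as tf

# Post-layer-norm transformer block: self-attention sublayer followed by a
# feedforward sublayer, each with a residual connection.
width, num_heads, intermediate_size = 64, 4, 256
attention = tf.keras.layers.MultiHeadAttention(num_heads=num_heads,
                                               key_dim=width // num_heads)
attention_norm = tf.keras.layers.LayerNormalization()
feedforward = tf.keras.Sequential([
    tf.keras.layers.Dense(intermediate_size, activation='relu'),
    tf.keras.layers.Dense(width),
])
feedforward_norm = tf.keras.layers.LayerNormalization()

x = tf.random.normal([2, 10, width])
x = attention_norm(x + attention(query=x, value=x))  # self-attention sublayer
x = feedforward_norm(x + feedforward(x))             # feedforward sublayer
print(x.shape)  # (2, 10, 64)
```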
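
The core ReZero idea behind ReZeroTransformer is a trainable scalar residual gate initialized to zero, as described in the cited paper; a minimal sketch:

```python
import tensorflow as tf

# Each sublayer output is scaled by a trainable scalar that starts at
# zero, so the block is initially the identity function.
alpha = tf.Variable(0.0)                         # trained like any other weight
sublayer = tf.keras.layers.Dense(64, activation='relu')

x = tf.random.normal([2, 10, 64])
y = x + alpha * sublayer(x)                      # at initialization, y == x
print(tf.reduce_max(tf.abs(y - x)).numpy())      # 0.0
```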
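
One way to make embedding lookups TPU-friendly is to express them as a one-hot matmul instead of a gather; the sketch below shows the two equivalent strategies and is illustrative code, not this layer's implementation:

```python
import tensorflow as tf

# tf.gather is the usual CPU/GPU path; a one-hot contraction often maps
# better onto TPU matrix units.
vocab_size, embedding_width = 100, 8
table = tf.random.normal([vocab_size, embedding_width])
ids = tf.constant([[3, 7, 42]])                  # [batch, seq_len]

gathered = tf.gather(table, ids)                                 # [1, 3, 8]
one_hot = tf.one_hot(ids, depth=vocab_size, dtype=table.dtype)   # [1, 3, 100]
one_hot_lookup = tf.einsum('bsv,vd->bsd', one_hot, table)        # [1, 3, 8]
print(tf.reduce_max(tf.abs(gathered - one_hot_lookup)).numpy())  # ~0.0
```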
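
BERT-style position embeddings are a learned table indexed by position; a conceptual sketch of the assumed mechanics, not this layer's code:

```python
import tensorflow as tf

# A trainable [max_length, width] table is sliced to the current sequence
# length and added to the word embeddings.
max_length, width = 512, 64
position_table = tf.Variable(tf.random.normal([max_length, width], stddev=0.02))

word_embeddings = tf.random.normal([2, 10, width])   # [batch, seq_len, width]
seq_len = word_embeddings.shape[1]
position_embeddings = position_table[:seq_len]       # [seq_len, width]
combined = word_embeddings + position_embeddings[tf.newaxis, :, :]
print(combined.shape)  # (2, 10, 64)
```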
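
Expanding a 2D padding mask into a 3D attention mask is a broadcasting operation; a conceptual sketch with illustrative shapes:

```python
import tensorflow as tf

# Broadcasting [batch, from_seq_len, 1] against [batch, 1, to_seq_len]
# yields the [batch, from_seq_len, to_seq_len] mask attention layers consume.
batch, from_seq_len = 2, 3
to_mask = tf.constant([[1., 1., 1., 0.],
                       [1., 1., 0., 0.]])        # [batch, to_seq_len]

attention_mask = tf.ones([batch, from_seq_len, 1]) * tf.expand_dims(to_mask, axis=1)
print(attention_mask.shape)  # (2, 3, 4)
```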
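
Masked softmax is commonly realized by pushing masked logits toward negative infinity; the sketch below assumes that mechanism rather than reproducing the layer's code:

```python
import tensorflow as tf

# Masked logits receive a large negative offset, so they end up with
# approximately zero probability after the softmax.
logits = tf.constant([[2.0, 1.0, 0.5, 3.0]])
mask = tf.constant([[1.0, 1.0, 1.0, 0.0]])   # 1 = allow through, 0 = mask out

adder = (1.0 - mask) * -1e9                  # large negative where mask == 0
probabilities = tf.nn.softmax(logits + adder, axis=-1)
print(probabilities.numpy())                 # last position is approximately zero
```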
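
A typical pooling head takes the leading token's embedding, applies an inner projection with dropout, and then projects to the label space; the sketch below assumes that structure, and the layer's exact arguments may differ:

```python
import tensorflow as tf

# First-token pooling followed by an inner projection and an output
# projection to num_classes logits.
inner_dim, num_classes = 64, 3
inner_dense = tf.keras.layers.Dense(inner_dim, activation='tanh')
dropout = tf.keras.layers.Dropout(0.1)
output_dense = tf.keras.layers.Dense(num_classes)

sequence_output = tf.random.normal([2, 10, 64])     # [batch, seq_len, width]
pooled = sequence_output[:, 0, :]                   # leading ([CLS]) token
logits = output_dense(dropout(inner_dense(pooled), training=False))
print(logits.shape)  # (2, 3)
```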
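
The gated feedforward of the cited paper multiplies an activated branch with a parallel linear branch before the output projection; a conceptual sketch, not this layer's code:

```python
import tensorflow as tf

# GLU-style feedforward: the activated branch gates a parallel linear
# branch element-wise, then the result is projected back to hidden_size.
hidden_size, intermediate_size = 64, 256
gate_dense = tf.keras.layers.Dense(intermediate_size, activation='relu')
value_dense = tf.keras.layers.Dense(intermediate_size)
output_dense = tf.keras.layers.Dense(hidden_size)

x = tf.random.normal([2, 10, hidden_size])
gated = gate_dense(x) * value_dense(x)       # element-wise gating
y = output_dense(gated)
print(y.shape)  # (2, 10, 64)
```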