Modules that forward entire sequences through an RNN :
- `AbstractSequencer` : an abstract class inherited by Sequencer, Repeater, RecurrentAttention, etc.;
- `Sequencer` : applies an encapsulated module to all elements in an input sequence (Tensor or Table);
- `SeqLSTM` : a faster version of `nn.Sequencer(nn.RecLSTM)` where the `input` and `output` are tensors;
- `SeqGRU` : a faster version of `nn.Sequencer(nn.RecGRU)` where the `input` and `output` are tensors;
- `BiSequencer` : used for implementing bidirectional RNNs;
- `Repeater` : repeatedly applies the same input to an `AbstractRecurrent` instance;
- `RecurrentAttention` : a generalized attention model for REINFORCE modules.
The `AbstractSequencer` abstract class implements a light interface shared by subclasses like `Sequencer`, `Repeater`, `RecurrentAttention`, `BiSequencer` and so on.
The `remember([mode])` method controls when the Sequencer calls `forget` on the decorated module.
When `mode='neither'` (the default behavior of the class), the Sequencer will additionally call `forget` before each call to `forward`.
When `mode='both'` (the default when calling this function), the Sequencer will never call `forget`; in that case, it is up to the user to call `forget` between independent sequences.
This behavior is only applicable to decorated `AbstractRecurrent` modules.
Accepted values for argument `mode` are as follows :
- 'eval' only affects evaluation (recommended for RNNs)
- 'train' only affects training
- 'neither' affects neither training nor evaluation (default behavior of the class)
- 'both' affects both training and evaluation (recommended for LSTMs)
The `hasMemory()` method returns true if the instance has memory. See `remember()` for details.
The `setZeroMask(zeroMask)` method expects a `seqlen x batchsize` `zeroMask`.
The `zeroMask` is then passed to `seqlen` criterions by indexing `zeroMask[step]`.
When `zeroMask=false`, the zero-masking is disabled.
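For illustration, here is a minimal, hedged sketch of how such a `zeroMask` might be built and passed in (it assumes zero-masking has been enabled on the decorated module, e.g. via `maskZero()`, and that a value of 1 marks a masked time-step) :
```lua
require 'rnn'

-- a hedged sketch : zero-mask the last two time-steps of the second sequence in a batch
local seqlen, batchsize, size = 5, 2, 10
local reclstm = nn.RecLSTM(size, size)
reclstm:maskZero()                        -- enable zero-masking on the decorated rnn
local seq = nn.Sequencer(reclstm)

-- 1 marks a time-step to be masked (e.g. padding), 0 marks a real time-step
local zeroMask = torch.ByteTensor(seqlen, batchsize):zero()
zeroMask[{{4, 5}, 2}] = 1                 -- steps 4 and 5 of sample 2 are padding
seq:setZeroMask(zeroMask)
seq:forward(torch.randn(seqlen, batchsize, size))

seq:setZeroMask(false)                    -- disables zero-masking
```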
The `nn.Sequencer(module)` constructor takes a single argument, `module`, which is the module to be applied from left to right, on each element of the input sequence.
```lua
seq = nn.Sequencer(module)
```
The `Sequencer` is a kind of decorator used to abstract away the intricacies of `AbstractRecurrent` modules. While an `AbstractRecurrent` instance requires a sequence to be presented one input at a time, each with its own call to `forward` (and `backward`), the `Sequencer` forwards an `input` sequence (a table) into an `output` sequence (a table of the same length). It also takes care of calling `forget` on `AbstractRecurrent` instances.
The `Sequencer` inherits `AbstractSequencer`.
The `Sequencer` requires inputs and outputs to be of shape `seqlen x batchsize x featsize` :
- `seqlen` is the number of time-steps that will be fed into the `Sequencer`.
- `batchsize` is the number of examples in the batch. Each example is its own independent sequence.
- `featsize` is the size of the remaining non-batch dimensions. So this could be `1` for language models, or `c x h x w` for convolutional models, etc.
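For concreteness, the example discussed next can be sketched as the following Lua table (an illustrative reconstruction; encoding the characters as ASCII ids is an assumption made here) :
```lua
require 'torch'

-- seqlen=5 time-steps, batchsize=2 sequences ("HELLO" and "FUZZY"), featsize=1
input = {                          -- a Lua table of 5 time-steps
   torch.Tensor{{72}, {70}},       -- { H, F }
   torch.Tensor{{69}, {85}},       -- { E, U }
   torch.Tensor{{76}, {90}},       -- { L, Z }
   torch.Tensor{{76}, {90}},       -- { L, Z }
   torch.Tensor{{79}, {89}},       -- { O, Y }
}                                  -- each time-step is a batchsize x featsize (2 x 1) Tensor
```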
Above is an example input sequence for a character-level language model.
It has a `seqlen` of 5, which means that it contains sequences of 5 time-steps.
The opening `{` and closing `}` illustrate that the time-steps are elements of a Lua table, although the `Sequencer` also accepts full Tensors of shape `seqlen x batchsize x featsize`.
The `batchsize` is 2 as there are two independent sequences : `{ H, E, L, L, O }` and `{ F, U, Z, Z, Y }`.
The `featsize` is 1 as there is only one feature dimension per character and each such character is of size 1.
So the input in this case is a table of `seqlen` time-steps where each time-step is represented by a `batchsize x featsize` Tensor.
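The next example, again as an illustrative sketch (random values stand in for actual features) :
```lua
require 'torch'

-- seqlen=4 time-steps, batchsize=2 sequences, featsize=3 variables per time-step
input = {}
for step = 1, 4 do
   input[step] = torch.randn(2, 3)   -- batchsize x featsize
end
```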
Above is another example of a sequence (input or output).
It has a `seqlen` of 4 time-steps.
The `batchsize` is again 2, which means there are two sequences.
The `featsize` is 3 as each time-step of each sequence has 3 variables.
So each time-step (element of the table) is again represented as a tensor of size `batchsize x featsize`.
Note that while in both examples the `featsize` encodes one dimension, it could encode more.
For example, `rnn`, an instance of `nn.AbstractRecurrent`, can forward an `input` sequence one element at a time, each with its own call to `forward` :
```lua
input = {torch.randn(3,4), torch.randn(3,4), torch.randn(3,4)}
rnn:forward(input[1])
rnn:forward(input[2])
rnn:forward(input[3])
```
Equivalently, we can use a `Sequencer` to forward the entire `input` sequence at once :
```lua
seq = nn.Sequencer(rnn)
seq:forward(input)
```
We can also forward Tensors instead of tables :
```lua
-- seqlen x batchsize x featsize
input = torch.randn(3,3,4)
seq:forward(input)
```
The `Sequencer` can also take non-recurrent Modules (i.e. non-AbstractRecurrent instances) and apply them to each input to produce an output table of the same length. This is especially useful for processing variable-length sequences (tables).
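For example, a minimal sketch (the module and sizes are chosen here for illustration) :
```lua
require 'rnn'

-- decorate a non-recurrent module; it is applied independently to each time-step
seq = nn.Sequencer(nn.Linear(4, 5))
input = {torch.randn(3, 4), torch.randn(3, 4)}   -- a table of 2 time-steps (batchsize 3)
output = seq:forward(input)                      -- a table of 2 tensors of size 3 x 5
```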
Internally, the `Sequencer` expects the decorated `module` to be an `AbstractRecurrent` instance. When this is not the case, the `module` is automatically decorated with a `Recursor` module, which makes it conform to the `AbstractRecurrent` interface.
Note : this is due to an update (27 Oct 2015); before this, `AbstractRecurrent` and non-`AbstractRecurrent` instances needed to be decorated by their own `Sequencer`. The update, which introduced the `Recursor` decorator, allows a single `Sequencer` to wrap any type of module : `AbstractRecurrent`, non-`AbstractRecurrent`, or a composite structure of both types. Nevertheless, existing code shouldn't be affected by the change.
For a concise example of its use, please consult the simple-sequencer-network.lua training script.
The `Sequencer` also provides a `remember([mode])` method.
When `mode='neither'` (the default behavior of the class), the Sequencer will additionally call `forget` before each call to `forward`.
When `mode='both'` (the default when calling this function), the Sequencer will never call `forget`; in that case, it is up to the user to call `forget` between independent sequences (see the sketch following the list below).
This behavior is only applicable to decorated `AbstractRecurrent` modules.
Accepted values for argument `mode` are as follows :
- 'eval' only affects evaluation (recommended for RNNs)
- 'train' only affects training
- 'neither' affects neither training nor evaluation (default behavior of the class)
- 'both' affects both training and evaluation (recommended for LSTMs)
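For example, a minimal sketch of streaming a long sequence through a `Sequencer` in chunks (module and sizes are illustrative) :
```lua
require 'rnn'

seq = nn.Sequencer(nn.RecLSTM(10, 10))
seq:remember('both')                    -- never call forget automatically

seq:forward(torch.randn(5, 2, 10))      -- first chunk of a long sequence
seq:forward(torch.randn(5, 2, 10))      -- continues from the previous hidden state

seq:forget()                            -- call manually before an independent sequence
seq:forward(torch.randn(5, 2, 10))
```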
Calling `forget` on the `Sequencer` calls the decorated `AbstractRecurrent` module's `forget` method.
The `SeqLSTM` module is a faster version of `nn.Sequencer(nn.RecLSTM(inputsize, outputsize))` :
```lua
seqlstm = nn.SeqLSTM(inputsize, outputsize)
```
Each time-step is computed as follows (same as `RecLSTM`) :
```
i[t] = σ(W[x->i]x[t] + W[h->i]h[t−1] + b[1->i])      (1)
f[t] = σ(W[x->f]x[t] + W[h->f]h[t−1] + b[1->f])      (2)
z[t] = tanh(W[x->c]x[t] + W[h->c]h[t−1] + b[1->c])   (3)
c[t] = f[t]c[t−1] + i[t]z[t]                         (4)
o[t] = σ(W[x->o]x[t] + W[h->o]h[t−1] + b[1->o])      (5)
h[t] = o[t]tanh(c[t])                                (6)
```
A notable difference is that this module expects the `input` and `gradOutput` to be tensors instead of tables. The default shape is `seqlen x batchsize x inputsize` for the `input` and `seqlen x batchsize x outputsize` for the `output` :
```lua
input = torch.randn(seqlen, batchsize, inputsize)
gradOutput = torch.randn(seqlen, batchsize, outputsize)
output = seqlstm:forward(input)
gradInput = seqlstm:backward(input, gradOutput)
```
Note that if you prefer to transpose the first two dimensions (that is, `batchsize x seqlen` instead of the default `seqlen x batchsize`), you can set `seqlstm.batchfirst = true` following initialization.
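A hedged sketch of the `batchfirst` usage (sizes assumed) :
```lua
require 'rnn'

seqlstm = nn.SeqLSTM(10, 20)
seqlstm.batchfirst = true
input = torch.randn(8, 5, 10)           -- batchsize x seqlen x inputsize
output = seqlstm:forward(input)         -- 8 x 5 x 20 (batchsize x seqlen x outputsize)
```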
For variable-length sequences, set `seqlstm.maskzero = true`. This is equivalent to calling `RecLSTM:maskZero()` where the `RecLSTM` is wrapped by a `Sequencer` :
```lua
reclstm = nn.RecLSTM(inputsize, outputsize)
reclstm:maskZero(1)
seqlstm = nn.Sequencer(reclstm)
```
For `maskzero = true`, input sequences are expected to be separated by a time-step of zeros.
Like the `RecLSTM`, the `SeqLSTM` does not use peephole connections between cell and gates (see `RecLSTM` for details).
Like the `Sequencer`, the `SeqLSTM` provides a `remember` method.
Note that a `SeqLSTM` cannot replace a `RecLSTM` in code that decorates it with an `AbstractSequencer` or `Recursor`, as this would be equivalent to `nn.Sequencer(nn.Sequencer(nn.RecLSTM))`. You have been warned.
References:
- A. LSTM RNN Architectures for Large Scale Acoustic Modeling
- B. Exploring the Limits of Language Modeling
```lua
lstmp = nn.SeqLSTM(inputsize, hiddensize, outputsize)
```
The `SeqLSTM` can implement an LSTM with a projection layer (LSTMP) when `hiddensize` and `outputsize` are provided.
An LSTMP differs from an LSTM in that after computing the hidden state `h[t]` (eq. 6), it is projected onto `r[t]` using a simple linear transform (eq. 7).
The computation of the gates also uses the previous such projection `r[t-1]` (eq. 1, 2, 3, 5).
This differs from an LSTM, which uses `h[t-1]` instead of `r[t-1]`.
The computation of a time-step outlined above for the LSTM is replaced with the following for an LSTMP :
```
i[t] = σ(W[x->i]x[t] + W[r->i]r[t−1] + b[1->i])      (1)
f[t] = σ(W[x->f]x[t] + W[r->f]r[t−1] + b[1->f])      (2)
z[t] = tanh(W[x->c]x[t] + W[r->c]r[t−1] + b[1->c])   (3)
c[t] = f[t]c[t−1] + i[t]z[t]                         (4)
o[t] = σ(W[x->o]x[t] + W[r->o]r[t−1] + b[1->o])      (5)
h[t] = o[t]tanh(c[t])                                (6)
r[t] = W[h->r]h[t]                                   (7)
```
The algorithm is outlined in ref. A and benchmarked with state-of-the-art results on the Google billion words dataset in ref. B.
An LSTMP can be used with a `hiddensize >> outputsize` such that the effective size of the memory cells `c[t]` and gates `i[t]`, `f[t]` and `o[t]` can be much larger than the actual input `x[t]` and output `r[t]`. For fixed `inputsize` and `outputsize`, the LSTMP will be able to remember much more information than an LSTM.
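As a hedged sketch, an LSTMP with a large memory and a small projection might look like this (sizes are illustrative) :
```lua
require 'rnn'

inputsize, hiddensize, outputsize = 128, 1024, 64
lstmp = nn.SeqLSTM(inputsize, hiddensize, outputsize)

input = torch.randn(10, 8, inputsize)   -- seqlen x batchsize x inputsize
output = lstmp:forward(input)           -- 10 x 8 x outputsize
```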
The `SeqGRU` module is a faster version of `nn.Sequencer(nn.GRU(inputsize, outputsize))` :
```lua
seqGRU = nn.SeqGRU(inputsize, outputsize)
```
Usage of SeqGRU differs from GRU in the same manner as SeqLSTM differs from LSTM. Therefore see SeqLSTM for more details.
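A minimal usage sketch, mirroring the `SeqLSTM` example above (sizes assumed) :
```lua
require 'rnn'

seqGRU = nn.SeqGRU(10, 20)
input = torch.randn(5, 3, 10)           -- seqlen x batchsize x inputsize
output = seqGRU:forward(input)          -- 5 x 3 x 20 (seqlen x batchsize x outputsize)
```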
The `BiSequencer` module applies encapsulated `fwd` and `bwd` RNNs to an input sequence in forward and reverse order. It is used for implementing bidirectional RNNs like `SeqBLSTM` and `SeqBGRU`.
```lua
brnn = nn.BiSequencer(fwd, [bwd, merge])
```
The `input` to the module is a sequence tensor of size `seqlen x batchsize [x ...]`.
The `output` is a sequence of size `seqlen x batchsize [x ...]`.
`BiSequencer` applies a `fwd` RNN to each element in the sequence in forward order and applies the `bwd` RNN in reverse order (from last element to first element).
The `fwd` and optional `bwd` RNN can be `AbstractRecurrent` or `AbstractSequencer` instances.
The `bwd` RNN defaults to :
```lua
bwd = fwd:clone()
bwd:reset()
```
For each step (in the original sequence), the outputs of both RNNs are merged together using the `merge` module (which defaults to `nn.CAddTable`). This way, the outputs of both RNNs (in forward order) are summed.
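For example, a hedged sketch using a `RecGRU` as the `fwd` RNN (the `bwd` RNN and `merge` are left to their defaults; sizes are illustrative) :
```lua
require 'rnn'

fwd = nn.RecGRU(10, 10)
brnn = nn.BiSequencer(fwd)     -- bwd defaults to a reset clone of fwd; merge to nn.CAddTable

input = torch.randn(5, 3, 10)  -- seqlen x batchsize x inputsize
output = brnn:forward(input)   -- 5 x 3 x 10 : sum of the forward and backward outputs
```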
Internally, the `BiSequencer` is implemented by decorating a structure of modules that makes use of `Sequencer`s for the `fwd` and `bwd` modules.
As with a `Sequencer`, the sequences in a batch must have the same length, although the sequence length can vary from one batch to the next.
Note that when calling `BiSequencer:remember()`, only the `fwd` module can `remember()`.
The `bwd` module never remembers, because it views the `input` in reverse order.
Also note that `BiSequencer:setZeroMask(zeroMask)` correctly reverses the order of the `zeroMask` for the `bwd` RNN.
```lua
blstm = nn.SeqBLSTM(inputsize, hiddensize, [outputsize])
```
A bidirectional RNN that uses `SeqLSTM`. Internally contains a `fwd` and a `bwd` `SeqLSTM`.
Expects an input shape of `seqlen x batchsize x inputsize`.
For merging the outputs of the `fwd` and `bwd` LSTMs, this BLSTM uses `nn.CAddTable()`, summing the outputs from each output layer.
Example :
```lua
input = torch.rand(1, 2, 5)
blstm = nn.SeqBLSTM(5, 3)
print(blstm:forward(input))
```
This prints an output of a `1 x 2 x 3` tensor.
```lua
bgru = nn.SeqBGRU(inputsize, outputsize)
```
A bidirectional RNN that uses `SeqGRU`. Internally contains a `fwd` and a `bwd` `SeqGRU`.
Expects an input shape of `seqlen x batchsize x inputsize`.
For merging the outputs of the `fwd` and `bwd` GRUs, this BGRU uses `nn.CAddTable()`, summing the outputs from each output layer.
The `Repeater` module is a decorator similar to `Sequencer`. It differs in that the sequence length is fixed beforehand and the input is repeatedly forwarded through the wrapped `module` to produce an output table of length `nStep` :
```lua
r = nn.Repeater(module, nStep)
```
Argument `module` should be an `AbstractRecurrent` instance.
This is useful for implementing models like RCNNs, which are repeatedly presented with the same input.
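A minimal sketch (the wrapped module and sizes are chosen for illustration) :
```lua
require 'rnn'

r = nn.Repeater(nn.RecLSTM(10, 10), 3)  -- forward the same input for nStep=3 time-steps
input = torch.randn(2, 10)              -- a single input (batchsize x inputsize), not a sequence
output = r:forward(input)               -- a table of 3 outputs, each of size 2 x 10
```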
References :
- A. Recurrent Models of Visual Attention
- B. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning
The `RecurrentAttention` module can be used to implement the Recurrent Attention Model (RAM) presented in Ref. A :
```lua
ram = nn.RecurrentAttention(rnn, action, nStep, hiddenSize)
```
`rnn` is an `AbstractRecurrent` instance. Its input is `{x, z}` where `x` is the input to the `ram` and `z` is an action sampled from the `action` module. The output size of the `rnn` must be equal to `hiddenSize`.
`action` is a `Module` that uses a REINFORCE module (ref. B) like `ReinforceNormal`, `ReinforceCategorical`, or `ReinforceBernoulli` to sample actions given the previous time-step's output of the `rnn`.
During the first time-step, the `action` module is fed with a Tensor of zeros of size `input:size(1) x hiddenSize`.
It is important to understand that the sampled actions do not receive gradients backpropagated from the training criterion.
Instead, a reward is broadcast from a Reward Criterion like the `VRClassReward` Criterion to the `action`'s REINFORCE module, which will backpropagate gradients computed from the `output` samples and the `reward`.
Therefore, the `action` module's outputs are only used internally, within the `RecurrentAttention` module.
`nStep` is the number of actions to sample, i.e. the number of elements in the `output` table.
`hiddenSize` is the output size of the `rnn`. This variable is necessary to generate the zero Tensor to sample an action for the first step (see above).
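As a hedged sketch, an `action` module that samples a 2D glimpse location could look like the following (the layer sizes and the use of `ReinforceNormal` with a fixed standard deviation are assumptions for illustration, not prescribed by the module) :
```lua
require 'rnn'

hiddenSize = 256                        -- assumed size

-- maps the previous rnn output to a 2D location and samples around it (REINFORCE, ref. B)
action = nn.Sequential()
   :add(nn.Linear(hiddenSize, 2))       -- mean of the 2D location
   :add(nn.HardTanh())                  -- keep locations within [-1, 1]
   :add(nn.ReinforceNormal(0.1))        -- sample a location; learns from the broadcast reward
-- action would then be passed to nn.RecurrentAttention(rnn, action, nStep, hiddenSize)
```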
A complete implementation of Ref. A is available here.