
lstm: compares full LSTM gradients across multiple time steps, as computed by the following approaches:

theano gradient: theano.gradient.grad(E, w), the derivative of the error E w.r.t. the weight matrix w
numerical gradient: (E(w + eps) - E(w - eps)) / (2 * eps), perturbing one entry of w at a time
Gradient given by Graves and Schmidhuber (2005): dE/dx * dx/dw = delta * y. This is incorrect: dx/dw ≠ y, because w and y are not linearly related, since y itself depends on w through the recurrence.
New gradient calculation that takes the non-linear relation between w and y into account
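
As a reference point for the numerical approach, here is a minimal NumPy sketch of a central-difference gradient over several time steps. It is not the code in this repository: the names `lstm_error`, `numerical_grad`, and the recurrent weights `U` are assumptions made for illustration, and the toy LSTM omits peephole connections.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_error(W, U, xs, target):
    """Run a toy LSTM (no peepholes) over all time steps; return a scalar error E.

    W[k]: input weights, shape (n_hidden, n_in), for k in 'i', 'f', 'o', 'c'.
    U[k]: recurrent weights, shape (n_hidden, n_hidden) (assumed for this sketch).
    """
    n_hidden = W["i"].shape[0]
    h = np.zeros(n_hidden)  # hidden output y
    c = np.zeros(n_hidden)  # cell state
    for x in xs:
        i = sigmoid(W["i"] @ x + U["i"] @ h)  # input gate
        f = sigmoid(W["f"] @ x + U["f"] @ h)  # forget gate
        o = sigmoid(W["o"] @ x + U["o"] @ h)  # output gate
        g = np.tanh(W["c"] @ x + U["c"] @ h)  # weighted input for the cell
        c = f * c + i * g
        h = o * np.tanh(c)
    return 0.5 * np.sum((h - target) ** 2)

def numerical_grad(error_fn, w, eps=1e-5):
    """Central difference (E(w + eps) - E(w - eps)) / (2 * eps), one entry at a time.

    error_fn takes no arguments and must read w, which is perturbed in place
    and restored after each entry.
    """
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        e_plus = error_fn()
        w[idx] = old - eps
        e_minus = error_fn()
        w[idx] = old  # restore the original weight
        grad[idx] = (e_plus - e_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
n_in, n_hidden, n_steps = 3, 4, 5
W = {k: rng.normal(scale=0.5, size=(n_hidden, n_in)) for k in "ifoc"}
U = {k: rng.normal(scale=0.5, size=(n_hidden, n_hidden)) for k in "ifoc"}
xs = rng.normal(size=(n_steps, n_in))
target = rng.normal(size=n_hidden)

# Gradient of E w.r.t. the input-gate W, through all five time steps.
grad_Wi = numerical_grad(lambda: lstm_error(W, U, xs, target), W["i"])
print(grad_Wi.shape)
```

Because the error is re-evaluated over the whole sequence for every perturbed entry, this estimate includes the dependence of y on w through the recurrence, which is why it can serve as a baseline for the analytic gradients.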

Currently, only gradients on the W matrices are supported (the weights for the input x: three for the gates and one for the weighted input for the cell). The command-line argument takes a value from 0 to 3, selecting the W of the input gate, the forget gate, the output gate, and the weighted input for the cell, respectively.
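
The index-to-matrix mapping could be expressed as follows. This is only a sketch of the ordering described above: `W_BY_INDEX` and `select_weight` are hypothetical names, and the script's actual argument handling may differ.

```python
# Hypothetical labels; only the 0-3 ordering comes from the description above.
W_BY_INDEX = ("input gate", "forget gate", "output gate", "cell input")

def select_weight(arg):
    """Map a command-line argument ('0' to '3') to the W matrix it selects."""
    index = int(arg)
    if not 0 <= index < len(W_BY_INDEX):
        raise ValueError("expected an argument from 0 to 3, got %r" % (arg,))
    return W_BY_INDEX[index]

print(select_weight("1"))  # forget gate
```

In a script, the value passed to `select_weight` would typically be the first entry of `sys.argv` after the script name.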

Dependencies: numpy and theano.



wangtong106/scribbles
