- PyTorch and Chainer strongly resemble each other, from overall architecture to API grammar.
- `x_cuda = x.cuda(device_id=0)` returns a new tensor on GPU 0 while `x` is kept as it was; i.e., this is a copy op. After this, changing `x` has no effect on `x_cuda` and vice versa. Use `.cpu()` to copy the tensor from GPU back to CPU.
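A minimal sketch of the copy semantics (assumes a CUDA device is available; `x.cuda()` with no argument targets the default GPU):

```python
import torch

x = torch.randn(3)       # CPU tensor
x_cuda = x.cuda()        # copies to the default GPU; x itself is untouched
x_cuda.fill_(0)          # modifies only the GPU copy
print(x)                 # unchanged
x_back = x_cuda.cpu()    # copies back from GPU to CPU
```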
- A PyTorch `Variable` has no `.zero_grad()` method as in Chainer; use `.grad.data.zero_()` to do the job (in PyTorch, only the `nn.Module` class has a `.zero_grad()` method).
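A sketch with the `Variable` API; note the gradient must have been populated by a backward pass before it can be zeroed:

```python
import torch
from torch.autograd import Variable

v = Variable(torch.randn(3), requires_grad=True)
v.sum().backward()     # populates v.grad
v.grad.data.zero_()    # zero the gradient in place; there is no v.zero_grad()
```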
- PyTorch uses `Module.train()` and `Module.eval()` calls with no arguments to switch between training mode and evaluation mode.
- `torch.Tensor()` is just an alias of `torch.FloatTensor()`, not the universal constructor I expected that would determine the dtype from the input ndarray; `torch.from_numpy()` does that job instead.
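A quick illustration of the difference:

```python
import numpy as np
import torch

a = np.arange(5, dtype=np.int64)
t1 = torch.Tensor(a)        # always a FloatTensor; the int64 dtype is lost
t2 = torch.from_numpy(a)    # a LongTensor; dtype inferred from the ndarray
```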
- For an RNN with multiple layers and dropout enabled, the PyTorch implementation does not apply dropout to the output of the last layer.
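So with the (illustrative) setup below, dropout acts between layer 1 and layer 2 only; layer 2's output is returned as-is:

```python
import torch.nn as nn

# dropout is applied between stacked layers, not after the top layer
rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, dropout=0.5)
```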
- The difference between `LSTM` and `LSTMCell` lies in the input shape: the input of `LSTM` is 3D, (T, B, D) by default or (B, T, D) with `batch_first=True`, whereas the input of `LSTMCell` is 2D, (B, D); i.e., `LSTMCell` is used for just one time step.
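A minimal sketch of the two calling conventions (sizes are illustrative):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16)      # consumes whole sequences
cell = nn.LSTMCell(input_size=8, hidden_size=16)  # consumes one step at a time

seq = torch.randn(5, 3, 8)                        # (T, B, D)
out, (h_n, c_n) = lstm(seq)                       # out: (5, 3, 16)

h = torch.zeros(3, 16)
c = torch.zeros(3, 16)
for t in range(seq.size(0)):                      # manual unrolling with LSTMCell
    h, c = cell(seq[t], (h, c))                   # seq[t]: (B, D)
```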
- The `LSTM` implementation of PyTorch has two problems: 1) no peepholes; 2) two biases in `i_t, f_t, g_t, o_t`, which is nonsense (according to Facebook's comment, it comes from cuDNN's convention).
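The doubled bias shows up directly in the parameter list:

```python
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16)
print([name for name, p in lstm.named_parameters()])
# ['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0']
# bias_ih and bias_hh are simply summed in the gate pre-activations,
# so mathematically a single bias would suffice
```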
- PyTorch does not support numpy-style broadcasting, so to do element-wise multiplication of, for example, `X` of shape (3, 50) and `y` of shape (50), you need `.unsqueeze` and then `.expand`: `X * y.unsqueeze(0).expand_as(X)`.
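Spelled out (note that `.expand_as` creates a view without copying memory):

```python
import torch

X = torch.randn(3, 50)
y = torch.randn(50)
out = X * y.unsqueeze(0).expand_as(X)   # (50,) -> (1, 50) -> (3, 50)
assert out.size() == (3, 50)
```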
- Even with `batch_first=True`, the hiddens returned by LSTM (GRU, etc.) are still of size `(num_layers * num_directions, batch, hidden_size)`.
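For example:

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=8, hidden_size=16, num_layers=2, batch_first=True)
out, h = gru(torch.randn(4, 5, 8))   # input is (B, T, D) thanks to batch_first
print(out.size())                    # (4, 5, 16): output is batch-first
print(h.size())                      # (2, 4, 16): hidden is NOT batch-first
```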
- The CNN implementation of PyTorch does not do filter flipping by default (i.e., it computes cross-correlation rather than true convolution); its speed is comparable to Theano's CNN and it produces exactly the same result, while `convolve2d()` of scipy is about 2x slower, with a slightly different result (within 10 eps); the source code of PyTorch's convolution resides in `pytorch/torch/csrc/autograd/functions/convolution.cpp`.
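A quick check of the flipping difference (assumes scipy is installed): since `conv2d` correlates, the kernel has to be flipped by hand to match `convolve2d`.

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.signal import convolve2d

img = np.random.randn(8, 8).astype(np.float32)
ker = np.random.randn(3, 3).astype(np.float32)

# PyTorch: cross-correlation, no filter flipping
out_torch = F.conv2d(torch.from_numpy(img)[None, None],
                     torch.from_numpy(ker)[None, None])

# scipy: true convolution, so pre-flip the kernel to get the same answer
out_scipy = convolve2d(img, ker[::-1, ::-1], mode='valid')

print(np.allclose(out_torch.numpy()[0, 0], out_scipy, atol=1e-5))
```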
- PyTorch does not do weight initialization automatically for parameters you define yourself; you have to write a `reset_parameters()` function and call it at model initialization.
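A minimal sketch with a hypothetical `MyLinear` module, mirroring the heuristic the built-in layers use:

```python
import math
import torch
import torch.nn as nn

class MyLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super(MyLinear, self).__init__()
        # torch.Tensor() allocates uninitialized memory, hence the explicit init
        self.weight = nn.Parameter(torch.Tensor(out_features, in_features))
        self.reset_parameters()             # must be called by hand

    def reset_parameters(self):
        stdv = 1.0 / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)

    def forward(self, x):
        return x.matmul(self.weight.t())
```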
- `Tensor.numpy()` returns a numpy array sharing the tensor's memory; in other words, it hands back a view of the underlying buffer rather than a copy. So we can use this mechanism to MODIFY the value of a tensor, though it is not intuitive.
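For example:

```python
import torch

t = torch.zeros(3)
a = t.numpy()    # a shares t's memory, no copy is made
a[0] = 7.0       # ... so writing through the array modifies the tensor
print(t[0])      # now 7
```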
- To switch a model/module between train/predict mode, call `nn.Module.train(True/False)`. This function recursively sets the mode of every child module. NEVER use `self.training = True/False`; that only applies to the current module without affecting its children.
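A sketch of the difference, using an `nn.Sequential` container:

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 10), nn.Dropout(0.5))

model.train(False)         # same as model.eval(); recurses into children
print(model[1].training)   # False: the Dropout child was switched as well

model.training = True      # flips the flag on the container only
print(model[1].training)   # still False: children untouched
```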