add conv sequence to sequence model #686
Conversation
fluid/conv_seq_to_seq/utils.py
Outdated
self.atom = Atom(pd.fc, "fc", size=size, dropout=dropout)

def __call__(self, x):
    # pd.fc takes dims[1:] as the projection's input, need reshape to avoid that
The num_flatten_dims attribute of fc can be used to handle the input shape without an explicit reshape, but it doesn't matter much, just a reminder.
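Roughly like this, just a sketch with made-up sizes:

import paddle.fluid as fluid

# x: [batch, seq_len, channels]; the sizes here are only illustrative
x = fluid.layers.data(name='x', shape=[10, 32], dtype='float32')

# num_flatten_dims=2 flattens the leading two dims into the projection's rows,
# so fc is applied per time step and no explicit reshape is needed
y = fluid.layers.fc(input=x, size=64, num_flatten_dims=2)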
fluid/conv_seq_to_seq/model.py
Outdated
self.fc2 = Linear(embed_dim)

def forward(self, src_tokens, src_positions):
Maybe we can also add __call__ in ConvEncoder to unify the interface with other modules and layers. It is also the way modules are called in PyTorch and fairseq-py. Just a personal preference, feel free to ignore it.
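Something like this, just a sketch (the forward body is a placeholder):

class ConvEncoder(object):
    def forward(self, src_tokens, src_positions):
        # the existing encoder computation stays here
        return src_tokens, src_positions

    def __call__(self, *args, **kwargs):
        # let the module be invoked like a layer: encoder(tokens, positions)
        return self.forward(*args, **kwargs)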
fluid/conv_seq_to_seq/model.py
Outdated
self.fc2 = Linear(out_embed_dim)
self.fc3 = Linear(dict_size + 1, dropout=dropout)

def forward(self, prev_output_tokens, prev_positions, encoder_out):
Maybe we can also add __call__ in ConvDecoder to unify the interface with other modules and layers, the same as the ConvEncoder suggestion above. Just a personal preference, feel free to ignore it.
fluid/conv_seq_to_seq/model.py
Outdated
pos_pad_id,
dropout=0.1):
    self.dropout = dropout
    self.embed_tokens = Embedding(dict_size + 1, embed_dim, pad_id)
I guess dict_size + 1 is used to include the padding index. I would personally suggest handling this special detail outside the module, so that the module stays generic.
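For example, something like this where the model is assembled; the numbers and the build_embedding helper are only for illustration, not the actual code in this PR:

def build_embedding(num_embeddings, embed_dim, pad_id):
    # stand-in for the Embedding wrapper used in this PR
    return dict(num=num_embeddings, dim=embed_dim, pad=pad_id)

dict_size = 30000            # raw vocabulary size (illustrative)
pad_id = dict_size           # reserve one extra index for padding
embed_tokens = build_embedding(dict_size + 1, 256, pad_id)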
fluid/conv_seq_to_seq/model.py
Outdated
in_channels,
out_channels * 2,
kernel_size,
# padding=pad,
In the decoder, to guarantee no future information is available, left_pad, right_pad = [kernel_size - 1, 0] might be proper according to the PyTorch code and the paper. But currently layers.conv can only pad the same size on the left and the right, which PyTorch also suffers from (it seems that layers.sequence_conv supports this padding scheme, but the Python wrapper needs to be enhanced). To achieve left_pad, right_pad = [kernel_size - 1, 0], we may need to use pad = kernel_size - 1 for conv followed by a slice/split to remove the future time steps created by the padding, which is also what fairseq-py does. The following is the corresponding code, though I can't find it in the latest fairseq-py repo.
for i, (out_channels, kernel_size) in enumerate(convolutions):
    pad = kernel_size - 1
    self.projections.append(Linear(in_channels, out_channels)
                            if in_channels != out_channels else None)
    self.convolutions.append(
        LinearizedConv1d(in_channels, out_channels * 2, kernel_size,
                         padding=pad, dropout=dropout))
    self.attention.append(AttentionLayer(out_channels, embed_dim)
                          if attention[i] else None)
    in_channels = out_channels

for proj, conv, attention in zip(self.projections, self.convolutions, self.attention):
    residual = x if proj is None else proj(x)
    x = F.dropout(x, p=self.dropout, training=self.training)
    x = conv(x)
    x = conv.remove_future_timesteps(x)
    x = F.glu(x)

def remove_future_timesteps(self, x):
    """Remove future time steps created by padding."""
    if not self._is_incremental_eval and self.kernel_size[0] > 1 and self.padding[0] > 0:
        x = x[:-self.padding[0], :, :]
    return x
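As a standalone illustration of the trick, here is my own sketch with a plain nn.Conv1d on B x C x T input (not the code from this PR):

import torch
import torch.nn as nn

B, C_in, C_out, T, k = 2, 8, 16, 5, 3
x = torch.randn(B, C_in, T)

# pad both sides by kernel_size - 1, then drop the trailing steps so that
# position t only sees inputs <= t (left_pad = k - 1, right_pad = 0)
conv = nn.Conv1d(C_in, C_out, k, padding=k - 1)
y = conv(x)           # shape: B x C_out x (T + k - 1)
y = y[:, :, :T]       # remove the future time steps created by padding
print(y.shape)        # torch.Size([2, 16, 5])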
fluid/conv_seq_to_seq/utils.py
Outdated
x = self.atom(x)

x = Op.reshape(x, (B, T, -1))
Will it encounter the same problem as Transformer, since both B and T may change? Maybe we can use (B, -1, C) to relax the sequence length, but I am not sure whether it will affect the infershape of conv. Also, the reshape op has been enhanced: PaddlePaddle/Paddle#9008
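A sketch of the relaxed reshape with the enhanced op; B, T and C here are only illustrative, and the data layer just fakes the fc output to keep the snippet self-contained:

import paddle.fluid as fluid

B, C = 32, 64    # batch size and channels from the config
T = 20           # only used to fake the input; the reshape below does not fix it

# stand-in for the fc output of shape [B * T, C]
x = fluid.layers.data(name='x', shape=[B * T, C], dtype='float32',
                      append_batch_size=False)

# keep B and C fixed and let the time dimension be inferred (-1),
# relying on the enhanced reshape op (PaddlePaddle/Paddle#9008)
y = fluid.layers.reshape(x, shape=[B, -1, C])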
fluid/conv_seq_to_seq/model.py
Outdated
self.attention):
    residual = x if proj is None else proj(x)
    x = Op.dropout(x, self.dropout)
    x = conv(x)
It seems that the wrapped Conv1D requests input with shape B x T x C, but the x = self._transpose_if_training(x) before it seems to change the shape to T x B x C. I am a little confused about this.

I can see that Conv1D can work correctly in the encoder, since the input shape there is B x T x C, while the shape of the conv input here is T x B x C, which may not be suitable for the wrapped Conv1D. Since the wrapped Conv1D contains a keep-length padding scheme, which makes the input and output have the same shape, it may run, but I feel the computation might be incorrect.

For reference, in fairseq-py both ConvTBC (the convolution for the encoder) and LinearizedConv1d (the convolution for the decoder) request input with shape T x B x C, which differs from the Conv1D wrapped here.
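If the wrapped Conv1D really expects B x T x C, one option is to transpose around the convolution and keep the rest of the T x B x C pipeline unchanged; a sketch with fluid's transpose (not code from this PR, shapes are illustrative):

import paddle.fluid as fluid

T, B, C = 20, 32, 64
x = fluid.layers.data(name='x', shape=[T, B, C], dtype='float32',
                      append_batch_size=False)

# T x B x C -> B x T x C before the batch-first Conv1D wrapper ...
x_btc = fluid.layers.transpose(x, perm=[1, 0, 2])
# ... and back to T x B x C afterwards
x_tbc = fluid.layers.transpose(x_btc, perm=[1, 0, 2])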
fluid/conv_seq_to_seq/model.py
Outdated
x = fluid.nets.glu(x, dim=2)

if attention is not None:
    x = self._transpose_if_training(x)
I think this is also used when the shape of the input x is T x B x C, to convert it to B x T x C for the attention calculation.

Maybe we can remove _transpose_if_training and the transpose included in _split_encoder_out, and unify the shapes to B x T x C temporarily, if we can confirm that the Conv1D wrapped here is not suitable for T x B x C.

I am not sure why fairseq-py uses the T x B x C shape for convolution; maybe it is faster, according to the annotations.
fluid/conv_seq_to_seq/utils.py
Outdated
if sent[-1] != end_id:
    sent.append(end_id)
if len(sent) < max_len:
    sent += [pad_id for i in xrange(max_len - len(sent))]
Maybe we can pad the batch data according to the max length in the current batch and allow the max sequence length to vary across batches, but this may also affect the model configuration, such as the x = Op.reshape(x, (B, T, -1)) mentioned above.
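A small sketch of per-batch padding in plain Python (the function name is just for illustration):

def pad_batch(batch, pad_id, end_id):
    # append end_id if missing, then pad every sentence only to the
    # longest length in this batch
    batch = [list(sent) + [end_id] if sent[-1] != end_id else list(sent)
             for sent in batch]
    max_len = max(len(sent) for sent in batch)
    return [sent + [pad_id] * (max_len - len(sent)) for sent in batch]

# the second sentence is padded to this batch's max length 4, not a global max_len
print(pad_batch([[3, 4, 5, 2], [6, 7]], pad_id=0, end_id=2))
# [[3, 4, 5, 2], [6, 7, 2, 0]]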
fluid/conv_seq_to_seq/model.py
Outdated
encoder_a, encoder_b = encoder_out
# here just a trick
encoder_a = Op.transpose(encoder_a, [1, 2])
encoder_b = Op.transpose(encoder_b, [1, 2])
Is this because encoder_a, encoder_b = self._split_encoder_out(encoder_out) converts the shape from B x T x C to B x C x T?

Maybe we can remove these two transposes and use x = pd.matmul(x, encoder_a, transpose_y=False) for the dot-product calculation.
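A sketch of the dot-product scores without the extra transposes, assuming x is B x T_dec x C and encoder_a is already B x C x T_enc (shapes are illustrative; fluid's matmul does batched matrix multiplication on 3-D inputs):

import paddle.fluid as fluid

B, T_dec, T_enc, C = 32, 10, 20, 64
x = fluid.layers.data(name='x', shape=[B, T_dec, C], dtype='float32',
                      append_batch_size=False)
encoder_a = fluid.layers.data(name='encoder_a', shape=[B, C, T_enc],
                              dtype='float32', append_batch_size=False)

# B x T_dec x C times B x C x T_enc -> B x T_dec x T_enc attention scores
scores = fluid.layers.matmul(x, encoder_a, transpose_y=False)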
Because the organization of the directory has been changed, can you move the code from fluid into fluid/neural_machine_translation so that we can merge this PR as soon as possible (once it passes the code review)?
fixes: #687