Why can the LSTM be discarded during inference? #43

MXuer · 2023-05-12T09:17:57Z

I am confused by this sentence in your paper "GPT Understands, Too":

Moreover, in the inference, we only need the output embedding h and can discard the LSTM head.

If the LSTM encoder is used during training, and the final embeddings combine the outputs of the LSTM encoder with the original embeddings, then discarding the LSTM during inference would mean the final embeddings are just the outputs of two embedding layers. Doesn't that change the performance?

So why can the LSTM be discarded during inference?

Thanks a lot.

Deerkangkang · 2023-05-19T07:37:13Z

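I am not a native English speaker, so I answered in Chinese (translated here). During inference, the output for the template part does not change: the template fed into the encoder stays the same, so the LSTM's output does not change either. You only need to compute the LSTM's output once, and it can then be reused for the whole inference stage. I hope this answer helps.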
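To illustrate the point about the template being fixed, here is a minimal sketch in plain Python. It is not the repository's code: `toy_recurrent_encoder` is a hypothetical stand-in for the LSTM prompt encoder, and the weights and embeddings are made-up values. The only thing it aims to show is that an encoder whose input is the fixed pseudo-token embeddings (and not the example text) gives the same output for every example, so that output can be computed once and cached.

```python
import math
import random


def toy_recurrent_encoder(pseudo_embeddings, w_in, w_rec):
    """Hypothetical stand-in for the LSTM prompt encoder: a tiny tanh recurrence."""
    hidden = [0.0] * len(pseudo_embeddings[0])
    outputs = []
    for vec in pseudo_embeddings:
        # Each step mixes the current pseudo-token embedding with the previous state.
        hidden = [math.tanh(w_in * x + w_rec * h) for x, h in zip(vec, hidden)]
        outputs.append(hidden)
    return outputs


# Parameters that are frozen once training ends (made-up values for illustration).
rng = random.Random(0)
pseudo_embeddings = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
w_in, w_rec = 0.7, 0.3

# Run the encoder a single time and keep its output h.
cached_prompt = toy_recurrent_encoder(pseudo_embeddings, w_in, w_rec)


def build_input_with_encoder(token_embeddings):
    # Training-style path: re-run the encoder on every call.
    prompt = toy_recurrent_encoder(pseudo_embeddings, w_in, w_rec)
    return prompt + token_embeddings


def build_input_from_cache(token_embeddings):
    # Inference-style path: the encoder is gone, only the cached h is used.
    return cached_prompt + token_embeddings


# Two different "examples": the encoder never sees them, so the prompt part is identical.
example_a = [[0.1] * 4, [0.2] * 4]
example_b = [[-0.5] * 4]
same_a = build_input_with_encoder(example_a) == build_input_from_cache(example_a)
same_b = build_input_with_encoder(example_b) == build_input_from_cache(example_b)
print(same_a, same_b)
```

Under these assumptions, the cached vectors are the encoder's outputs, not the raw pseudo-token embeddings, so nothing is lost when the encoder is dropped after training.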
