Can old_init in ntk_rope_mixed_init be simplified by omitting the inv_freq and _set_cos_sin_cache() steps? #5

samantha0-ops · 2023-08-15T13:18:01Z

Hello,
In the implementation of ntk_rope_mixed_init, old_init(self, dim, max_position_embeddings, base, device) is called first. As I understand it, apart from setting the following values,
super().__init__()
self.dim = dim
self.max_position_embeddings = max_position_embeddings
self.base = base

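the remaining steps in old_init (computing inv_freq, self.register_buffer("inv_freq", inv_freq, persistent=False), and self._set_cos_sin_cache()) are all overwritten by the subsequent computation in ntk_rope_mixed_init. Could old_init therefore be reduced to just the following? That would save a lot of GPU memory.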
super().__init__()
self.dim = dim
self.max_position_embeddings = max_position_embeddings
self.base = base


bojone · 2023-08-15T15:02:50Z

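It looks like that should work. But I don't quite understand "it would save a lot of GPU memory"; is there some detail behind this that I'm not aware of?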

samantha0-ops · 2023-08-16T03:23:29Z

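> It looks like that should work. But I don't quite understand "it would save a lot of GPU memory"; is there some detail behind this that I'm not aware of?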

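My wording may not have been precise; "it would save some GPU memory" is more accurate.
We are fine-tuning llama-13B on A100 cards with the rope-mixed-logn method. When old_init is executed as in your code, we get an out-of-memory error, but with the reduced version above the error no longer occurs. So I would like to confirm whether this change is safe, or whether it could introduce a bug I am not aware of.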

af-74413592 · 2023-08-17T04:17:19Z

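> It looks like that should work. But I don't quite understand "it would save a lot of GPU memory"; is there some detail behind this that I'm not aware of?

> My wording may not have been precise; "it would save some GPU memory" is more accurate. We are fine-tuning llama-13B on A100 cards with the rope-mixed-logn method. When old_init is executed as in your code, we get an out-of-memory error, but with the reduced version above the error no longer occurs. So I would like to confirm whether this change is safe, or whether it could introduce a bug I am not aware of.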

Running out of GPU memory is normal here; try reducing the text length.
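
For a sense of scale, here is a rough back-of-the-envelope estimate of the memory held by the cos/sin caches discussed above. It is only a sketch under stated assumptions, not a measurement: it assumes each layer keeps its own cos and sin table of shape (max_positions, head_dim) in float32, uses the LLaMA-13B figures of 40 layers and head dimension 128, and the helper name rope_cache_bytes and the 8x length factor are hypothetical, chosen purely for illustration.

```python
def rope_cache_bytes(max_positions, head_dim, num_layers, bytes_per_value=4):
    """Bytes held by one cos table plus one sin table per layer.

    Assumption: each table has shape (max_positions, head_dim) and is stored
    in float32 (4 bytes per value). inv_freq is ignored because it only has
    head_dim / 2 entries.
    """
    per_table = max_positions * head_dim * bytes_per_value
    return 2 * per_table * num_layers


HEAD_DIM = 128    # LLaMA-13B: hidden size 5120 / 40 attention heads
NUM_LAYERS = 40   # LLaMA-13B decoder layers

# Cache size at the default max_position_embeddings of 2048.
base_cache = rope_cache_bytes(2048, HEAD_DIM, NUM_LAYERS)

# Cache size if the table were rebuilt for a longer context.
# The factor 8 is hypothetical, not taken from the repository.
extended_cache = rope_cache_bytes(2048 * 8, HEAD_DIM, NUM_LAYERS)

print(base_cache / 2**20, "MiB")      # 80.0 MiB
print(extended_cache / 2**20, "MiB")  # 640.0 MiB
```

If these assumptions hold, the tables built by old_init amount to tens of MiB across the whole model, which is small next to the weights of a 13B model. Skipping them could still matter when a run sits right at the memory limit, but it would be consistent with the reply above that sequence length is the more likely driver of the out-of-memory error.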
