Hello author~

In the implementation of ntk_rope_mixed_init, old_init(self, dim, max_position_embeddings, base, device) is called first. But as I understand it, apart from setting the following values:

```python
super().__init__()
self.dim = dim
self.max_position_embeddings = max_position_embeddings
self.base = base
```

everything else old_init does (computing inv_freq, calling self.register_buffer("inv_freq", inv_freq, persistent=False), and self._set_cos_sin_cache()) is replaced by the computation that follows in ntk_rope_mixed_init. Could old_init therefore be simplified to just the lines below? That would save a lot of GPU memory.

```python
super().__init__()
self.dim = dim
self.max_position_embeddings = max_position_embeddings
self.base = base
```
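For reference, here is a minimal, self-contained sketch of the pattern under discussion. DummyRotaryEmbedding is a hypothetical stand-in for the LlamaRotaryEmbedding class being patched, and the mixed-NTK constants k, b and the frequency formula are illustrative assumptions rather than the repo's exact code; the sketch only shows how a trimmed-down old_init would slot in, given that ntk_rope_mixed_init immediately rebuilds inv_freq and the cos/sin cache.

```python
import math
import torch
from torch import nn


class DummyRotaryEmbedding(nn.Module):
    """Hypothetical stand-in for LlamaRotaryEmbedding, so the sketch runs standalone."""

    def _set_cos_sin_cache(self, seq_len, device, dtype):
        # Build the cos/sin cache from whatever inv_freq is currently registered.
        t = torch.arange(seq_len, device=device, dtype=self.inv_freq.dtype)
        freqs = torch.outer(t, self.inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        self.register_buffer("cos_cached", emb.cos().to(dtype), persistent=False)
        self.register_buffer("sin_cached", emb.sin().to(dtype), persistent=False)


def simplified_old_init(self, dim, max_position_embeddings=2048, base=10000, device=None):
    # The simplification proposed above: keep only the attribute assignments and
    # skip inv_freq / the cos-sin cache, which ntk_rope_mixed_init rebuilds anyway.
    nn.Module.__init__(self)  # plays the role of super().__init__() in the patch
    self.dim = dim
    self.max_position_embeddings = max_position_embeddings
    self.base = base


def ntk_rope_mixed_init(self, dim, max_position_embeddings=2048, base=10000, device=None):
    simplified_old_init(self, dim, max_position_embeddings, base, device)
    # Illustrative mixed-NTK frequency scaling; k and b are assumed constants.
    k, b = 8, 0.75
    a = math.log(k) / (dim / 2) ** b
    inv_freq = base ** (-torch.arange(0, dim, 2, dtype=torch.float32, device=device) / dim)
    inv_freq = inv_freq * (-a * torch.arange(1, dim // 2 + 1, dtype=torch.float32, device=device) ** b).exp()
    # These two steps overwrite anything the full old_init would have built,
    # which is why the question above argues that work in old_init is redundant.
    self.register_buffer("inv_freq", inv_freq, persistent=False)
    self._set_cos_sin_cache(max_position_embeddings, device=device,
                            dtype=torch.get_default_dtype())


# Monkey-patch and construct, mirroring how the repo swaps in its __init__.
DummyRotaryEmbedding.__init__ = ntk_rope_mixed_init
rope = DummyRotaryEmbedding(128, max_position_embeddings=4096)
print(rope.cos_cached.shape)  # torch.Size([4096, 128])
```

If this reading is right, one plausible (though unconfirmed in this thread) account of the OOM difference is that the full old_init first allocates a cos/sin cache that ntk_rope_mixed_init immediately rebuilds, so two caches briefly coexist and raise peak memory; the trimmed version avoids that transient allocation.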
It looks like that should work. But I don't quite follow the claim that it "saves a lot of GPU memory"; is there some detail behind this that I'm not aware of?
My wording may not have been precise; "saves some GPU memory" would be more accurate. We are fine-tuning llama-13B on A100 cards with the rope-mixed-logn method. When old_init runs as written in your code we hit an out-of-memory error, but with the code I wrote above the OOM goes away. So I wanted to confirm whether this change is valid, or whether it could hide some bug I'm not aware of.
Running out of GPU memory is normal; try cutting the text length a bit.