In the SFT stage we also trained all parameters. We tried training only the newly added layers as well, and the results were close; we chose full-parameter training so that our method stays compatible with the standard training pipeline. As for SFT, I don't think it has that much impact on forgetting; see this paper: The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning. We believe SFT mainly elicits knowledge already learned during pre-training, so the effect is actually small. This is also why LoRA works well for SFT but is not well suited for pre-training.
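For reference, a minimal sketch of the "train only the newly added layers" variant mentioned above. The model path and the layer indices (`NEW_LAYER_IDS`) are hypothetical placeholders, not the actual configuration used here; full-parameter training (as in our SFT runs) simply skips the freezing step.

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch: continued pre-training that updates only the newly inserted blocks.
# Assumes a block-expanded LLaMA-style checkpoint; adjust names/indices to your model.
model = AutoModelForCausalLM.from_pretrained("path/to/expanded-model")

NEW_LAYER_IDS = {4, 9, 14, 19, 24, 29, 34, 39}  # indices of the inserted blocks (assumption)

for name, param in model.named_parameters():
    # LLaMA-style parameter names look like "model.layers.<idx>.self_attn.q_proj.weight"
    parts = name.split(".")
    is_new_block = (
        len(parts) > 2
        and parts[1] == "layers"
        and parts[2].isdigit()
        and int(parts[2]) in NEW_LAYER_IDS
    )
    param.requires_grad = is_new_block  # freeze everything except the new blocks

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable} / {total}")
```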
Hi, I have a few questions:
1. Pre-training with LoRA also trains only the newly added LoRA parameters. What is the advantage of your approach compared to LoRA?
2. When pre-training this way to avoid forgetting, do we still need to mix in an appropriate amount of general-domain data alongside the domain data?
3. In the SFT stage you use full-parameter training, right? Then forgetting still cannot be avoided during SFT?