
Can R1's successful recipe be transferred directly to small models, rather than via distillation? #3

Open
ghntd opened this issue Jan 20, 2025 · 2 comments

Comments


ghntd commented Jan 20, 2025

Long awaited, and it's finally here. After a quick read of the report, I find this truly exciting and impressive work; it would not be an overstatement to say R1 is the one that is genuinely "open" AI. But I have one key question: can the technical route of dropping instruction fine-tuning and going straight to reinforcement learning actually work on smaller models? After all, training a 600B model is simply out of reach for most people.


Mizersy commented Jan 21, 2025

It seems that Table 6 in the paper could answer your question.


MaoXinn commented Jan 21, 2025

Instruction fine-tuning wasn't actually dropped, was it? The cold start in Section 2.3.1 is instruction fine-tuning.

My guess is that this indeed won't work on small models: their generalization ability and model capacity are both insufficient, and the same holds at the pretraining stage.

For now, we still have to rely on large models to explore and generalize, and then distill into small models.
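As a rough illustration of the "large model explores, small model distills" recipe mentioned above: note that R1 itself distills by supervised fine-tuning on teacher-generated reasoning traces, but the classic alternative is logit distillation (Hinton et al., 2015), where the student's temperature-softened output distribution is matched to the teacher's. A minimal NumPy sketch of that loss (function names are my own, not from the paper):

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so gradients stay comparable across temperatures.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)
    return (T ** 2) * kl.mean()

# Sanity check: identical logits give zero loss.
logits = np.array([[1.0, 2.0, 3.0]])
assert abs(distill_loss(logits, logits)) < 1e-9
```

In practice this loss is mixed with the ordinary cross-entropy on ground-truth labels; whether either form of distillation transfers R1-style reasoning better than direct RL on a small model is exactly the open question of this thread.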
