Long-awaited and finally here. After a quick read of the report, I find this genuinely exciting and impressive work; it's no exaggeration to say that R1 is the truly open AI. But I have one key question: can the approach of skipping instruction fine-tuning and going straight to reinforcement learning actually work on smaller models? After all, training a 600B model is simply out of reach for most people.
It seems that Table 6 in the paper could answer your question.
Instruction fine-tuning wasn't abandoned, though, was it? The cold start in Section 2.3.1 is instruction fine-tuning.
My guess is that it really won't work on small models: they lack both the generalization ability and the capacity, and the same limitation shows up in the pretraining stage.
For now we still have to rely on large models to explore and generalize, and then distill into small models.
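For anyone wondering what "distill into small models" looks like in practice, here is a minimal sketch of the recipe the R1 report describes for its distilled models: sample reasoning traces from a large teacher, then run ordinary supervised fine-tuning of a small student on those traces. It assumes a Hugging Face-style API; the model names, prompt list, and hyperparameters are hypothetical placeholders, not from the report.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model names; substitute any large teacher / small student pair.
TEACHER = "large-reasoning-model"   # e.g. a 600B-class reasoner
STUDENT = "small-base-model"        # e.g. a 7B-class base model

tok = AutoTokenizer.from_pretrained(STUDENT)
student = AutoModelForCausalLM.from_pretrained(STUDENT)

# Step 1: sample reasoning traces from the teacher (done offline, once).
# `prompts` is a placeholder list of task prompts.
prompts = ["Prove that sqrt(2) is irrational."]
teacher = AutoModelForCausalLM.from_pretrained(TEACHER)
traces = []
for p in prompts:
    ids = tok(p, return_tensors="pt").input_ids
    out = teacher.generate(ids, max_new_tokens=512, do_sample=True, temperature=0.7)
    traces.append(tok.decode(out[0], skip_special_tokens=True))

# Step 2: plain supervised fine-tuning of the student on the teacher's traces.
student.train()
optim = torch.optim.AdamW(student.parameters(), lr=1e-5)
for text in traces:
    batch = tok(text, return_tensors="pt", truncation=True, max_length=1024)
    # Standard causal-LM loss: labels are the input ids, shifted internally.
    loss = student(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optim.step()
    optim.zero_grad()
```

The point is that the expensive RL exploration happens once, in the large model; the small model only has to imitate the resulting traces, which is a much easier supervised problem.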