Was the WizardLM data scraped from GPT-4? #60

Open
AceCHQ opened this issue Aug 29, 2023 · 3 comments

Comments

AceCHQ commented Aug 29, 2023

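Hello, and thank you for your work. How good is the translation quality of the WizardLM evolved instructions, and has the data been filtered? Also, were the responses scraped from GPT-4 or GPT-3.5? Thanks in advance for your reply~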

LC1332 (Owner) commented Aug 29, 2023

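About 10,000 of the WizardLM examples were translated with the earlier, unimproved prompt; the remaining 50,000-plus are good. I plan to use embeddings later to filter out the low-quality ones. The responses were scraped from GPT-3.5; GPT-4 is a bit pricey~~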

AceCHQ (Author) commented Sep 3, 2023

Thanks for the reply. How do you filter with embeddings? Is there a suitable model?

LC1332 (Owner) commented Sep 3, 2023

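Good question. We recently trained https://huggingface.co/silk-road/luotuo-bert-en. I still have one experiment left to run: pair it with luotuo-bert and go through these translated datasets to correct the mistranslations where instruction injection occurred. If you're interested, leave your WeChat on my Zhihu page, https://www.zhihu.com/people/cheng-li-47, and I'll find the relevant teammates to push this forward QAQ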
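
For readers wondering what embedding-based filtering could look like in practice, here is a minimal sketch, not the maintainers' actual pipeline. It assumes that an English encoder and a Chinese encoder map text into a shared vector space, so that a faithful translation scores a high cosine similarity against its source, while a translation that went wrong (for example, one where the translator answered the instruction instead of translating it) scores low. The names `embed_en` and `embed_zh` are hypothetical callables you would have to supply yourself, for instance by wrapping luotuo-bert-en and luotuo-bert, and the `0.8` threshold is an arbitrary placeholder that would need tuning on real data.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Scored = Tuple[str, str, float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_translations(
    pairs: Sequence[Tuple[str, str]],
    embed_en: Callable[[str], Vector],  # hypothetical: English text -> vector
    embed_zh: Callable[[str], Vector],  # hypothetical: Chinese text -> vector
    threshold: float = 0.8,             # assumed value, tune on real data
) -> Tuple[List[Scored], List[Scored]]:
    """Split (english, chinese) pairs into kept and flagged lists by similarity."""
    kept: List[Scored] = []
    flagged: List[Scored] = []
    for en, zh in pairs:
        score = cosine_similarity(embed_en(en), embed_zh(zh))
        target = kept if score >= threshold else flagged
        target.append((en, zh, score))
    return kept, flagged
```

The flagged list is returned rather than discarded, so low-scoring pairs can be inspected or re-translated instead of being silently dropped.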
