MinSiThu/Multilanguage-RolePlay-Datasets

Multilanguage-RolePlay-Datasets

Role-play datasets in multiple languages

Role-play is very important in the AI era. We have role-played as doctors, engineers, and other professionals since we were young. Now, as technology gets more powerful, role-play has become an important part of our lives.

In a large language model (LLM), role-play can convey empathy, which results in more engagement with the user. This is why role-play needs to be brought to the world of language models.

However, most of the world's languages are low-resource and not well supported by open-source LLMs. Fine-tuning LLMs for these languages is a way to bring the technology to local communities.

Role-play datasets for fine-tuning language models are very rare. For this reason, I, Min Si Thu, created datasets in multiple languages for role-play fine-tuning.

The base dataset is the GPTeacher role-play dataset by teknium1, which can be found under this link and is released under the MIT License. The dataset was then translated into each target language with Google Translate, using the Cloud Translation API.
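The translation step can be pictured roughly as follows. This is only a minimal sketch, not the script used to build these datasets: the function name `translate_records`, the field names `instruction`, `input`, and `response`, and the `translate` callable are all assumptions made for illustration. In practice, `translate` would wrap a call to the Cloud Translation API for one target language; here it is passed in as a plain function so the sketch runs without credentials.

```python
from typing import Callable, Dict, Iterable, List

# Assumed text fields of a GPTeacher-style role-play record.
# These names are an assumption, not confirmed by this README.
TEXT_FIELDS = ("instruction", "input", "response")


def translate_records(
    records: Iterable[Dict[str, str]],
    translate: Callable[[str], str],
    fields: Iterable[str] = TEXT_FIELDS,
) -> List[Dict[str, str]]:
    """Return copies of the records with the chosen text fields translated.

    `translate` is a hypothetical callable mapping source text to text in
    the target language (for example, a thin wrapper around a translation
    service). Empty or missing fields are left untouched, and any other
    keys are copied over unchanged.
    """
    fields = tuple(fields)
    translated = []
    for record in records:
        new_record = dict(record)  # copy so the input record is not mutated
        for field in fields:
            text = record.get(field, "")
            if text:
                new_record[field] = translate(text)
        translated.append(new_record)
    return translated


if __name__ == "__main__":
    # Stand-in translator used only to show the data flow.
    def fake_translate(text: str) -> str:
        return "[translated] " + text

    sample = [{"instruction": "You are a doctor.", "input": "", "response": "Hello."}]
    print(translate_records(sample, fake_translate))
```

Passing the translator in as a function keeps the record handling separate from the translation service, so the same loop can be reused for every target language.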

To the best of my knowledge, these datasets could be the very first role-play datasets for most of the low-resource languages listed below.

The following are links to the datasets for the available languages, which can be found in the Hugging Face collections.

For more information, contact Min Si Thu.
