Multilanguage-RolePlay-Datasets

Role Play Datasets in Multilanguages

Roleplaying is very important in the AI era. We have been role-playing as doctors, engineers, and other professionals since we were young. Now, as the technologies get more powerful, roleplaying has become an important part of our lives.

In a Large Language Model, roleplaying can bring empathy, which results in more engagement with the user. Thus, why roleplaying needs to be brought to the world of the Language model.

But for languages around the world, most are low resources and not supported very well by the open-source LLM, it is about fine-tuning the LLM to bring the technology into the local communities.

For roleplaying, datasets are very rare to fine-tune the language model. Thus I, Min Si Thu, create datasets for multiple languages to fine-tune for roleplay.

The base dataset is GPTeacher role play dataset by teknium 1, which can be found under this link, released under MIT License. The dataset is then translated into respective languages. The translation process is powered by Google Translate, using cloud translation API.

To the knowledge of my best, these datasets could be the very first role-play datasets for most of low resource languages listed below.

The following are the available languages dataset hyperlink, which can be found on huggingface collections.

Burmese (my)
Lao (lo)
Khmer (khm)
Malay (ms)
Vietnam (vi)
Thai (th)
Hindi (hi)
Indonesian (id)
Filipino (fil)
Bengali (bn)
Afrikaans (af)
Albanian (sq)
Amharic (am)
Georgian (ka)
Irish (ga)
Zulu (zu)
Serbian (sr)
Kinyarwanda (rw)
Somali (so)
Kurdish (ku)
Huasa (ha)
Icelandic (is)
Nepali (ne)
Panjabi/Punjabi (pa)
Tamil (ta)
Yiddish (yi)
Hebrew (he)
Azarbaijani (az)
Kazakh (kk)
Cebuano (ceb)
Turkish (tr)
Finnish (fin)
Czech (cs)
Norwegian (no)
Mongolian (mn)
Lithuanian (lt)

For more information, contact Min Si Thu.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
logo		logo
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multilanguage-RolePlay-Datasets

About

Releases

Packages

License

MyanmarGPT-Movement/Multilanguage-RolePlay-Datasets

Folders and files

Latest commit

History

Repository files navigation

Multilanguage-RolePlay-Datasets

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Packages