Can starcoder2 be trained with a different language like TCL or lisp? #11

Open
cmosguy opened this issue Mar 5, 2024 · 1 comment

cmosguy commented Mar 5, 2024

Hello @loubnabnl, is it possible to get StarCoder2 to learn TCL?

TCL was not among the 30 languages, so I'm curious whether it's worth pursuing with SFT.

Also, is there a FIM script that you used for this version of StarCoder2?

@loubnabnl
Contributor

Hi, the 15B model was trained on 600+ programming languages, including TCL. Here's the full list of languages: https://huggingface.co/datasets/bigcode/the-stack-v2/blob/main/language_stats.csv

The 7B and 3B models, though, were trained on only the 17 languages listed in the paper.

For FIM, it's similar to StarCoder: you can use this code with the right tokens (they differ from SantaCoder's; we use underscores instead of dashes).
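To illustrate the token point, here is a minimal sketch of a fill-in-the-middle transform in prefix-suffix-middle (PSM) order. It is not the script referenced above. The token strings are assumed to be the underscore-style ones from StarCoder and should be checked against the tokenizer's special tokens; the helper names `build_fim_prompt` and `to_fim_example` are hypothetical.

```python
import random

# Assumed token strings (underscore style, as in StarCoder); verify them
# against the tokenizer's special tokens before relying on them.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Build an inference-time prompt in prefix-suffix-middle (PSM) order."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def to_fim_example(text: str, rng: random.Random) -> str:
    """Training-time transform: cut text at two random character
    positions and move the middle span to the end of the sequence."""
    lo, hi = sorted(rng.randint(0, len(text)) for _ in range(2))
    prefix, middle, suffix = text[:lo], text[lo:hi], text[hi:]
    return build_fim_prompt(prefix, suffix) + middle


# A small TCL snippet: ask the model for the body of the proc.
tcl_snippet = 'proc greet {name} {\n    puts "Hello, $name"\n}\n'
prompt = build_fim_prompt("proc greet {name} {\n", "\n}\n")
example = to_fim_example(tcl_snippet, random.Random(0))
```

At inference time, the model is expected to generate the missing middle after `<fim_middle>`. For SFT on TCL, a transform like `to_fim_example` could be applied to a fraction of the training documents, with the rest left as plain left-to-right text; the right fraction is a tuning choice not stated in this thread.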
