Add ngrams from coding-corpus #61

Open
Glitchy-Tozier opened this issue Jun 10, 2023 · 1 comment

Comments

@Glitchy-Tozier
Contributor

I'm not sure where to find a corpus, but I think it would be pretty nice to have ngrams from a mix of programming languages.

Glitchy-Tozier changed the title from "Add Programming Ngrams" to "Add ngrams from coding-corpus" on Jul 9, 2023
@fohrloop

fohrloop commented Oct 22, 2024

For anyone interested, I created a code corpus called granite-code-ngrams. It contains ngrams for the following languages:

  • python
  • rust
  • javascript
  • typescript
  • css (incl. scss & less)

and a mixture (40% Python, 10% Rust, 20% JavaScript, 20% TypeScript, 10% CSS) called "code". I licensed it under MIT so you could also include it in this repo if needed.
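
To illustrate what a weighted mixture like this could look like, here is a minimal sketch in Python. It is not taken from granite-code-ngrams; it assumes each language's ngram counts are first normalized to relative frequencies and then combined with the percentages quoted above. The names `WEIGHTS`, `normalize`, and `mix_ngrams` are hypothetical.

```python
from collections import Counter

# Mixture weights as quoted in the comment above (they sum to 1.0).
WEIGHTS = {
    "python": 0.40,
    "rust": 0.10,
    "javascript": 0.20,
    "typescript": 0.20,
    "css": 0.10,
}


def normalize(counts):
    """Turn raw ngram counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ngram: n / total for ngram, n in counts.items()}


def mix_ngrams(per_language_counts, weights=WEIGHTS):
    """Combine per-language ngram counts into one weighted frequency table.

    Assumption: every language is normalized on its own first, so a large
    corpus for one language does not outweigh its assigned share.
    """
    mixed = Counter()
    for language, weight in weights.items():
        frequencies = normalize(per_language_counts.get(language, {}))
        for ngram, freq in frequencies.items():
            mixed[ngram] += weight * freq
    return dict(mixed)
```

Normalizing per language before weighting is a design choice: without it, the language with the most raw text would dominate the mixture regardless of the stated percentages.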
