Fix readme to call data/cached_fineweb10B.py 10 in docker #58

Open: wants to merge 1 commit into master

ragulpr commented Dec 20, 2024

When running outside Docker, the README calls the script with 10 files:

python data/cached_fineweb10B.py 10 # downloads only the first 1.0B training tokens to save time

The training logs show the same 10 files:
Training DataLoader: total number of tokens: 1000000000 across 10 files
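As a quick sanity check on the numbers, the sketch below parses the log line and confirms that 10 files add up to the 1.0B training tokens mentioned in the README comment. It is a minimal illustration, not code from the repo: the regex is written only against the single log line quoted here, and it assumes the tokens are split evenly across files, since the log only reports the total.

```python
import re

# The DataLoader log line quoted in the PR description.
LOG_LINE = "Training DataLoader: total number of tokens: 1000000000 across 10 files"


def parse_token_log(line: str) -> tuple[int, int]:
    """Extract (total_tokens, num_files) from a log line of the form quoted above."""
    m = re.search(r"total number of tokens: (\d+) across (\d+) files", line)
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return int(m.group(1)), int(m.group(2))


total_tokens, num_files = parse_token_log(LOG_LINE)
# Assumption: tokens are split evenly across files (the log only gives the total).
tokens_per_file = total_tokens // num_files

print(f"{num_files} files x {tokens_per_file:,} tokens = {total_tokens / 1e9:.1f}B tokens")
```

Running it prints `10 files x 100,000,000 tokens = 1.0B tokens`, which matches the "first 1.0B training tokens" comment on the `cached_fineweb10B.py 10` call.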

I'm having lots of fun with this repo and the CIFAR speedrun! Great work, I love how hackable it is. I'm running some tests of my own and will PR small fixes as I find them.

ragulpr changed the title from "Call data/cached_fineweb10B.py 10 in docker" to "Fix readme to call data/cached_fineweb10B.py 10 in docker" on Dec 20, 2024