Downloadable Training Data #3
Comments
Hi, I have uploaded the link to the data we generated here. Hopefully it will be helpful.
Great, thanks a lot!
Hey, quick heads up. The links in the README table are mismatched: the pile link leads to the packed data, and vice versa. There is also a small typo in the word "this" just before the table. Both should be easy to fix! :)
Thank you for reminding me!
When I download a dataset with a browser, the download is often interrupted by network problems. I tried aria2 instead, but it did not recognize the download link. Are there any other ways to download the dataset?
Hi @Andyyoung0507, I added a direct download link in https://github.com/UT-Austin-RPL/GIGA/blob/main/README.md#pre-generated-data. You can try downloading the data with
Thanks for all your help!
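For the interrupted-download problem raised above, one generic workaround is to resume a partial file with an HTTP Range request. The following is only a minimal sketch, assuming the host serving the direct link honors Range headers (the thread does not confirm this). The names `range_header` and `resume_download` are hypothetical helpers, not part of the GIGA repository, and the URL and file name must be supplied by the reader.

```python
import os
import urllib.error
import urllib.request


def range_header(path):
    """Build a Range header that continues after the bytes already on disk."""
    done = os.path.getsize(path) if os.path.exists(path) else 0
    return {"Range": f"bytes={done}-"} if done else {}


def resume_download(url, path, chunk_size=1 << 20):
    """Download url to path, appending to a partial file when the server allows it.

    Assumption: the server answers a Range request with 206 (partial content).
    If it answers 200 instead, the file is rewritten from the start.
    """
    request = urllib.request.Request(url, headers=range_header(path))
    try:
        with urllib.request.urlopen(request) as response:
            # 206 means the Range request was honored, so append; else restart.
            mode = "ab" if response.status == 206 else "wb"
            with open(path, mode) as out:
                while True:
                    chunk = response.read(chunk_size)
                    if not chunk:
                        break
                    out.write(chunk)
    except urllib.error.HTTPError as err:
        # 416 usually means the requested range starts at or past the end of
        # the file, i.e. the local copy is probably already complete.
        if err.code != 416:
            raise
    return os.path.getsize(path)
```

If a connection drops, calling `resume_download(url, path)` again with the same arguments continues from the current file size rather than starting over.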
It would be very useful to have the training data returned from the generate_data_parallel.py script available to download, for both the pile and packed cases. I appreciate this may be a large amount of data, and therefore difficult to host, so there is no expectation, of course!
But it would avoid people needing to run the costly data generation process locally in order to experiment with the training.