from_generator
I'm building train, dev, and test splits using from_generator; however, in all three cases the logger prints "Generating train split:". It's not possible to change the split name because it appears to be hardcoded: https://github.com/huggingface/datasets/blob/main/src/datasets/packaged_modules/generator/generator.py
In [1]: from datasets import Dataset

In [2]: def gen():
   ...:     yield {"pokemon": "bulbasaur", "type": "grass"}
   ...:

In [3]: ds = Dataset.from_generator(gen)
Generating train split: 1 examples [00:00, 133.89 examples/s]
It should be possible to specify any split name.
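To make the request concrete, here is a toy sketch in plain Python. It does not use or reproduce the datasets library: `build_split` is a hypothetical stand-in for a builder that takes the split name as a parameter (defaulting to "train") instead of hardcoding it, so the log line reflects the split being generated.

```python
from typing import Callable, Dict, Iterable

Example = Dict[str, str]


def build_split(
    generator: Callable[[], Iterable[Example]],
    split: str = "train",
) -> Dict[str, object]:
    """Hypothetical builder: materialize the examples and log under `split`.

    This is an illustration only, not the datasets implementation.
    """
    examples = list(generator())
    # The split name comes from the argument rather than a hardcoded "train".
    log_line = f"Generating {split} split: {len(examples)} examples"
    print(log_line)
    return {"split": split, "examples": examples, "log": log_line}


def gen():
    yield {"pokemon": "bulbasaur", "type": "grass"}


# Build three splits from the same generator, each logged under its own name.
splits = {name: build_split(gen, split=name) for name in ("train", "dev", "test")}
```

Keeping "train" as the default would leave existing calls unchanged while letting callers name dev and test splits explicitly.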
Thanks for reporting, @pminervini.
I agree we should give the option to define the split name.
Indeed, there is a PR that addresses precisely this issue:
I am reviewing it.
Booom! thank you guys :)
Environment info
datasets version: 2.19.2
huggingface_hub version: 0.23.3
fsspec version: 2023.10.0