from_generator does not allow specifying the split name #7033

Closed
pminervini opened this issue Jul 9, 2024 · 2 comments · Fixed by #7015

@pminervini (Contributor)

Describe the bug

I'm building the train, dev, and test splits using from_generator; however, in all three cases the logger prints "Generating train split:".
It's not possible to change the split name because it appears to be hardcoded: https://github.com/huggingface/datasets/blob/main/src/datasets/packaged_modules/generator/generator.py

Steps to reproduce the bug

In [1]: from datasets import Dataset

In [2]: def gen():
   ...:     yield {"pokemon": "bulbasaur", "type": "grass"}
   ...: 

In [3]: ds = Dataset.from_generator(gen)
Generating train split: 1 examples [00:00, 133.89 examples/s]

Expected behavior

It should be possible to specify any split name (for example dev or test) when calling from_generator.
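To make the expected behavior concrete, here is a minimal, stdlib-only sketch of a generator-based builder that takes the split name as a parameter instead of hardcoding "train". The helper build_split, its split and gen_kwargs parameters, and the logger name are hypothetical illustrations, not the datasets API and not the change made in the linked PR; the sketch only shows the caller choosing the name that appears in the "Generating ... split" log line.

```python
import logging
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("generator_sketch")


def build_split(
    generator: Callable[..., Iterator[dict]],
    split: str = "train",  # caller-chosen split name instead of a hardcoded one
    gen_kwargs: Optional[dict] = None,
) -> List[dict]:
    """Hypothetical helper (not part of datasets): materialize a generator
    into a list of examples, logging under the requested split name."""
    logger.info("Generating %s split", split)
    examples = list(generator(**(gen_kwargs or {})))
    logger.info("Generated %d examples for %s split", len(examples), split)
    return examples


def gen(rows):
    # Yield one example dict per row, like the gen() in the report above.
    for row in rows:
        yield row


# Build three named splits from the same generator with different inputs.
splits: Dict[str, List[dict]] = {
    name: build_split(gen, split=name, gen_kwargs={"rows": rows})
    for name, rows in {
        "train": [{"pokemon": "bulbasaur", "type": "grass"}],
        "dev": [{"pokemon": "charmander", "type": "fire"}],
        "test": [{"pokemon": "squirtle", "type": "water"}],
    }.items()
}
```

With this shape, each of the three calls logs "Generating train split", "Generating dev split", and "Generating test split" respectively, which is the behavior the report asks for.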

Environment info

  • datasets version: 2.19.2
  • Platform: macOS-10.16-x86_64-i386-64bit
  • Python version: 3.8.5
  • huggingface_hub version: 0.23.3
  • PyArrow version: 15.0.0
  • Pandas version: 2.0.3
  • fsspec version: 2023.10.0
@albertvillanova (Member)

Thanks for reporting, @pminervini.

I agree we should give the option to define the split name.

Indeed, there is a PR that addresses precisely this issue:

I am reviewing it.

albertvillanova linked a pull request on Jul 9, 2024 that will close this issue.
@pminervini (Contributor, Author)

Booom! Thank you guys :)
