[TypeError: Couldn't cast array of type] Can only load a subset of the dataset #5596
Comments
Apparently some JSON objects have a
EDIT: actually, specifying the feature types doesn’t solve the issue; it raises an error because “labels” is missing in the data.
We've updated the dataset to remove the extra
A similar error occurs in the Pile dataset (EleutherAI/the_pile). Loading the dataset produces the following error:
I think this was fixed in https://huggingface.co/datasets/EleutherAI/the_pile/discussions/11
I have the same problem. How can I solve it?
Describe the bug
I'm trying to load this dataset which consists of jsonl files and I get the following error:
But I can successfully load a subset of the dataset, for example this works:
and ds.features returns:
So I'm not sure if there's an issue with just some of the files. Grateful if you have any suggestions to fix the issue.
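One way to check whether only some of the files are the problem is to scan the JSONL files directly and compare their top-level fields before handing them to `datasets`. The following is a minimal sketch using only the standard library, not part of the original report. It assumes every line is a JSON object, and the helper names (`type_name`, `scan_jsonl_schemas`, `find_conflicts`) are hypothetical. It flags keys whose values have more than one non-null type across files, and keys that never appear in some file, which are the kinds of mismatch that can lead to a cast error.

```python
import json
from collections import defaultdict


def type_name(value):
    """Return a coarse type label for a decoded JSON value."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "list"
    return "dict"


def scan_jsonl_schemas(paths):
    """Map each top-level key to {type label: set of files where it occurs}."""
    seen = defaultdict(lambda: defaultdict(set))
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                record = json.loads(line)  # assumption: each line is a JSON object
                for key, value in record.items():
                    seen[key][type_name(value)].add(str(path))
    return seen


def find_conflicts(paths):
    """Report keys with mixed non-null types or keys absent from some files.

    Presence is checked per file, not per record: a key that appears in at
    least one record of a file counts as present in that file.
    """
    seen = scan_jsonl_schemas(paths)
    all_files = {str(p) for p in paths}
    report = {}
    for key, by_type in seen.items():
        concrete = {t for t in by_type if t != "null"}
        files_with_key = set().union(*by_type.values())
        missing = all_files - files_with_key
        if len(concrete) > 1 or missing:
            report[key] = {
                "types": sorted(by_type),
                "missing_in": sorted(missing),
            }
    return report
```

Calling `find_conflicts` on the list of local `.jsonl` paths returns an empty dict when all files agree at the top level. A non-empty result points at the fields and files worth inspecting first. Nested fields are not compared in this sketch.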
Side note:
I saw this related issue and tried to write a loading script to have events as a Sequence and not a list here (the script was renamed). It worked with a subset locally but doesn't work for the remote dataset: it can't find https://huggingface.co/datasets/bigcode-data/the-stack-gh-issues/resolve/main/data.
Steps to reproduce the bug
Expected behavior
Load the entire dataset successfully.
Environment info
datasets version: 2.10.1