Bug when downloading cloudops datasets #77

Open
liu-jc opened this issue Jun 24, 2024 · 0 comments
liu-jc commented Jun 24, 2024

Describe the bug
With the current version of the datasets library, the cloudops datasets cannot be downloaded.
To Reproduce

from datasets import load_dataset
dataset = load_dataset('Salesforce/cloudops_tsf', 'azure_vm_traces_2017')

Expected behavior
It should successfully download the datasets.

Error message or code output

Traceback (most recent call last):
  File "/export/home/uni2ts/venv-new-hf/lib/python3.11/site-packages/datasets/builder.py", line 1973, in _prepare_split_single
    for _, table in generator:
  File "/root/.cache/huggingface/modules/datasets_modules/datasets/Salesforce--cloudops_tsf/c256e0ff4b38ace660f9c190f7ea36b6f11580926404e453a4b059ab54ae6b24/cloudops_tsf.py", line 251, in _generate_tables
    table = pq.read_table(filepath)
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "/export/home/uni2ts/venv-new-hf/lib/python3.11/site-packages/datasets/streaming.py", line 75, in wrapper
    return function(*args, download_config=download_config, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/export/home/uni2ts/venv-new-hf/lib/python3.11/site-packages/datasets/download/streaming_download_manager.py", line 812, in xpyarrow_parquet_read_table
    return pq.read_table(xopen(filepath_or_buffer, mode="rb", download_config=download_config), **kwargs)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/export/home/uni2ts/venv-new-hf/lib/python3.11/site-packages/datasets/download/streaming_download_manager.py", line 507, in xopen
    return open(main_hop, mode, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IsADirectoryError: [Errno 21] Is a directory: '/root/.cache/huggingface/datasets/downloads/extracted/44a68b01b5facec6049e9d866260fdea631f258f755b50cff7b40c2f31f65ec1'
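
The last frame of the traceback is a plain open() call on the extracted path. As a minimal sketch of that failure mode only (describe_open is a hypothetical helper, not part of datasets, and the behavior shown assumes a POSIX system), opening a directory in binary mode raises the same exception class:

```python
import tempfile


def describe_open(path):
    """Try to open `path` in binary mode and report what happened.

    Hypothetical helper for illustration; it mimics only the final
    open() call in the traceback, not the datasets internals.
    """
    try:
        with open(path, "rb"):
            return "opened"
    except IsADirectoryError as exc:
        # On POSIX systems, open() on a directory raises IsADirectoryError.
        return f"IsADirectoryError (errno {exc.errno})"


if __name__ == "__main__":
    # A temporary directory stands in for the extracted download path.
    with tempfile.TemporaryDirectory() as extracted_dir:
        print(describe_open(extracted_dir))
```

This suggests that the path handed to pq.read_table points at an extracted directory rather than at a single parquet file.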

Current workaround
Pin the following dependency versions:

datasets==2.12.0
fsspec==2023.5.0
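
To confirm that an environment actually matches these pins, something like the following could be used. It is a rough sketch: find_mismatches and PINS are names made up for this example, and it relies only on the standard-library importlib.metadata module.

```python
from importlib import metadata

# Versions taken from the workaround above.
PINS = {"datasets": "2.12.0", "fsspec": "2023.5.0"}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (installed, wanted)} for every pin that is not met.

    `get_version` is injectable so the check can be exercised without
    touching the real environment. A missing package is reported as None.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches


if __name__ == "__main__":
    for name, (installed, wanted) in find_mismatches(PINS).items():
        print(f"{name}: installed {installed}, workaround expects {wanted}")
```

If the script prints nothing, both packages are at the pinned versions.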

Proposed solution
It would probably be better to change the dataset format on the Hugging Face Hub directly, so that we can keep our current dependency requirements unchanged.
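
For illustration only, and not as the change proposed above: the failure comes down to the loader receiving a directory where it expects a file. A loader that tolerated both cases might resolve the path first, roughly as below. resolve_parquet_files is a hypothetical helper; the real layout of the extracted archive and the code in cloudops_tsf.py are not shown in this issue, so the "*.parquet" pattern is an assumption.

```python
from pathlib import Path


def resolve_parquet_files(path):
    """Return the parquet file(s) that `path` refers to.

    Hypothetical helper: if `path` is a directory (for example an
    extracted archive), collect the *.parquet files beneath it;
    otherwise treat `path` itself as the file to read.
    """
    p = Path(path)
    if p.is_dir():
        # Assumption: the extracted directory holds files ending in .parquet.
        return sorted(p.rglob("*.parquet"))
    return [p]
```

Each returned path could then be passed to pq.read_table individually instead of the directory itself.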

cc: @chenghaoliu89 @gorold

liu-jc added the bug (Something isn't working) label on Jun 24, 2024
gorold removed their assignment on Jun 24, 2024