[BUG] cuDF.read_json fails with cudaErrorInvalidValue invalid argument #17068

Closed
ayushdg opened this issue Oct 11, 2024 · 1 comment · Fixed by #17161
Labels
bug Something isn't working

Comments

ayushdg (Member) commented Oct 11, 2024

Describe the bug
cudf.read_json fails on a specific file in my dataset

Steps/Code to reproduce bug

import cudf

cudf.read_json("/path/to/file.json.gz", lines=True)

RuntimeError: CUDA error encountered at: /__w/cudf/cudf/cpp/src/io/json/read_json.cu:318: 1 cudaErrorInvalidValue invalid argument

Expected behavior

import pandas as pd
pd.read_json("/path/to/file.json.gz", lines=True) # works

Environment overview (please complete the following information)

  • Environment location: Docker
  • Method of cuDF install: Conda
    • If method of install is [Docker], provide docker pull & docker run commands used

Environment details
cudf 24.08 and 24.12 (nightly). I haven't checked 24.10, but since both 24.08 and 24.12 fail, I suspect the issue applies there too.

Additional context
Data here: 2022-33_1303_en_all.json.gz
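
For anyone reproducing this, it may help to check on the host how much the file expands when decompressed. The following is a minimal sketch using only the Python standard library; `compression_ratio` is a hypothetical helper name and is not part of cuDF.

```python
import gzip
import os


def compression_ratio(path, chunk_size=1 << 20):
    """Return (compressed_bytes, uncompressed_bytes, ratio) for a gzip file.

    The file is streamed in chunks so the uncompressed data never has to
    fit in host memory all at once.
    """
    compressed = os.path.getsize(path)
    uncompressed = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            uncompressed += len(chunk)
    return compressed, uncompressed, uncompressed / compressed
```

Calling `compression_ratio("/path/to/file.json.gz")` on the failing file shows how far the uncompressed size is from the compressed size.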

@ayushdg ayushdg added the bug Something isn't working label Oct 11, 2024
shrshi (Contributor) commented Oct 17, 2024

On further investigation, this bug is caused by underestimating the size of the device buffer needed to hold the uncompressed data.
Proposed solution: (i) obtain an estimate of the uncompressed buffer size (falling back to a heuristic if computing the estimate is expensive), and (ii) use the realloc-and-retry logic from #16687 if the estimate falls short. The same logic can be extended to multi-source compressed inputs.
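
The two steps of the proposal can be illustrated with a host-side sketch in Python. This is not the libcudf implementation and not the code from #16687; every function name below is hypothetical, and the fallback ratio of 4 is an arbitrary assumption. The size estimate reads the gzip trailer's ISIZE field (the uncompressed length modulo 2^32, per RFC 1952), which is only reliable for single-member files smaller than 4 GiB.

```python
import struct
import zlib


class BufferTooSmall(Exception):
    """Raised when the output does not fit in the allotted capacity."""


def estimate_uncompressed_size(compressed, fallback_ratio=4):
    """Step (i): cheap size estimate, with a heuristic fallback.

    Assumption: a single-member gzip stream under 4 GiB, so the trailing
    ISIZE field equals the true uncompressed size.
    """
    if len(compressed) >= 18 and compressed[:2] == b"\x1f\x8b":
        (isize,) = struct.unpack("<I", compressed[-4:])
        if isize > 0:
            return isize
    # Hypothetical heuristic: assume a fixed expansion ratio.
    return max(1, len(compressed) * fallback_ratio)


def decompress_into(compressed, capacity):
    """Decompress a gzip stream, refusing to produce more than `capacity` bytes."""
    d = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # | 16 selects gzip framing
    # Ask for one byte more than the capacity so an overflow is detectable.
    out = d.decompress(compressed, capacity + 1)
    if len(out) > capacity:
        raise BufferTooSmall
    if not d.eof:
        raise ValueError("truncated or corrupt gzip stream")
    return out


def decompress_with_retry(compressed, initial_capacity, max_retries=16):
    """Step (ii): grow the buffer and retry when the estimate falls short."""
    capacity = max(1, initial_capacity)
    for _ in range(max_retries + 1):
        try:
            return decompress_into(compressed, capacity), capacity
        except BufferTooSmall:
            capacity *= 2  # "realloc": double the capacity and try again
    raise RuntimeError("buffer still too small after retries")
```

Doubling the capacity keeps the number of retries logarithmic in the size of the shortfall, at the cost of over-allocating by up to 2x on the final attempt.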
