[BUG] CSV escaped " in quoted strings produce incorrect value #129

Open
revans2 opened this issue Jun 9, 2020 · 2 comments
Labels
bug Something isn't working cudf_dependency An issue or PR with this label depends on a new feature in cudf P1 Nice to have for release SQL part of the SQL/Dataframe plugin

Comments

revans2 (Collaborator) commented Jun 9, 2020

Describe the bug
Suppose a CSV file contains the following line, which holds just a single quoted string:

"TEST\"MORE"

The CPU and GPU produce different results. The GPU result is wrong: it copies the raw line as-is (surrounding quotes, backslash, and trailing newline included) instead of parsing it.

cpu = 'TEST"MORE'
gpu = '"TEST\\"MORE"\n'
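For reference, Python's standard csv module (not the Spark CPU reader itself) parses this line to the same value as the CPU path when `\` is configured as the escape character, which is Spark's default for its CSV `escape` option:

```python
import csv
import io

# The line from the bug report, including its trailing newline.
line = '"TEST\\"MORE"\n'

# '"' is the quote character and '\' is the escape character, so the
# backslash-quote pair inside the quoted field becomes a literal quote.
rows = list(csv.reader(io.StringIO(line), quotechar='"', escapechar='\\'))
print(rows)  # [['TEST"MORE']]
```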

To reproduce, add the line to integration_tests/src/test/resources/str.csv and rerun the integration tests; csv_test.py::test_basic_read should then fail.

@revans2 revans2 added bug Something isn't working ? - Needs Triage Need team to review and classify SQL part of the SQL/Dataframe plugin labels Jun 9, 2020
@sameerz sameerz added P1 Nice to have for release and removed ? - Needs Triage Need team to review and classify labels Aug 18, 2020
@revans2 revans2 mentioned this issue Apr 1, 2021
rwlee (Contributor) commented May 3, 2023

For clarity, this is triaged so that CSV reads fall back to the CPU whenever the escape character is anything other than \.

@revans2 revans2 added the cudf_dependency An issue or PR with this label depends on a new feature in cudf label Aug 16, 2023
revans2 (Collaborator, Author) commented Aug 16, 2023

cuDF does not support escape characters in CSV yet.

rapidsai/cudf#11984 tracks adding that as an option.
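At minimum, such an option has to treat the escape character followed by the quote character as a literal quote inside a quoted field. The sketch below illustrates only that rule, assuming Spark's documented meaning of `escape` (a character used to escape quotes inside an already quoted value). `parse_quoted_field` is a hypothetical helper, not part of cuDF or spark-rapids, and it ignores other cases such as escaped escape characters, delimiters, and multi-line records.

```python
def parse_quoted_field(text, quote='"', escape='\\'):
    """Hypothetical helper: parse one quoted field that starts at text[0].

    Returns (value, index just past the closing quote). An escape character
    followed by the quote character yields a literal quote; any other
    occurrence of the escape character is kept as-is.
    """
    if not text or text[0] != quote:
        raise ValueError("field must start with the quote character")
    out = []
    i = 1
    while i < len(text):
        c = text[i]
        if c == escape and i + 1 < len(text) and text[i + 1] == quote:
            out.append(quote)  # escaped quote: keep the quote, drop the escape
            i += 2
        elif c == quote:
            return "".join(out), i + 1  # an unescaped quote closes the field
        else:
            out.append(c)
            i += 1
    raise ValueError("unterminated quoted field")


value, end = parse_quoted_field('"TEST\\"MORE"')
print(value)  # TEST"MORE
```

Without the escape branch, the quote right after the backslash would close the field early and the rest of the line would be misread.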

tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023