Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fread parse error #2708

Merged
merged 6 commits into from
May 25, 2021
Merged

fread parse error #2708

merged 6 commits into from
May 25, 2021

Conversation

pradkrish
Copy link
Collaborator

@pradkrish pradkrish commented Oct 25, 2020

Closes #2680

At src/core/read/fread/fread_thread_context.cc:210, If j<ncols and if *tch reaches the end of line with the current quote rule, the while loop exits ending up in IOError.

With the code added in this PR, one way to fix this is to decrement tch(tch--) to the previous sep and try parsing again with the next quote rule. If the number of columns in the current row is only one, decrementing tch will never hit the sep in the current row and end up running into the previous row, hence the if logic inside the while loop to prevent this.

@pradkrish pradkrish changed the title implement the logic that fixes this issue fread parse error Oct 25, 2020
@pradkrish
Copy link
Collaborator Author

pradkrish commented Oct 26, 2020

@st-pasha I am unable to see the log of the failing builds, will you be able give a snippet of one of them? Thanks.

@st-pasha
Copy link
Contributor

Both test suits segfaulted when reading the file https://h2o-public-test-data.s3.amazonaws.com/fread/bad.txt
Unfortunately, there are no core files, so we'll have to reproduce it locally.

@st-pasha
Copy link
Contributor

The test itself is tests/fread/test-fread-large.py::test_fread_all[/tmp/pydatatable_large_data/h2o-3/fread/bad.txt]. Notably, it's the first test in the "test_fread_all" sequence, so the problem may not be with the file per se (in the sense that any other file would have seg.faulted there too), just that "bad.txt" happened to be the first on the list.

@pradkrish
Copy link
Collaborator Author

I checked out this branch and ran it on my wife's macOS.

Versions
Python (venv): 3.7.9
Clang: 12.0.0

All tests passed and reading the file bad.txt gave a 408x3 table.

   | ####################  ##############                       ###############################################
--- + --------------------  -----------------------------------  -----------------------------------------------
  0 | #############         ##############                                                                     0
  1 | #############         ###########                                                                        0
  2 | #############         ##############                                                                     0
  3 | #############         #############                                                                      0
  4 | #############         #########                                                                          0
  5 | #############         ################                                                                   0
  6 | #############         ###############                                                                    0
  7 | #############         #################################                                                  0
  8 | #############         ########################                                                           0
  9 | #############         ####################                                                               0
 10 | #############         #############                                                                      0
 11 | #############         ############                                                                       0
 12 | #############         ###########                                                                        0
 13 | #############         ########                                                                           0
 14 | #############         ###                                                                                0
  … | …                     …                                                                                  …
403 | ###########           ##############                                                                     0
404 | ###########           #################                                                                  0
405 | ###########           ###############                                                                    0
406 | ###########           ###################################                                                0
407 | #########             ###############                                                                    0

any idea how to narrow down the bug? 🤔

@pradkrish
Copy link
Collaborator Author

@oleksiyskononenko You have told me before that you have a mac. My test suite is unfortunately failing on macOS in the cloud and I have no access to the logs either. As described above, I tested it locally on my wife's macbook and it passes. Whenever you have time, will you be able to build this branch and confirm whether the tests pass on your mac? I am kinda clueless how to proceed, any suggestions are useful. Thanks.

@oleksiyskononenko
Copy link
Contributor

@pradkrish I just checked it on my mac and it didn't segfault on bad.txt for python 3.6.8. I've also re-triggered the build for this PR to see if this issue is reproducible.

@pradkrish
Copy link
Collaborator Author

@pradkrish I just checked it on my mac and it didn't segfault on bad.txt for python 3.6.8. I've also re-triggered the build for this PR to see if this issue is reproducible.

Thanks @oleksiyskononenko, it seems to pass now. 😃

@pradkrish pradkrish requested a review from st-pasha November 4, 2020 09:24
@Darel13712
Copy link

Darel13712 commented May 21, 2021

Hey. It's been a year. Any plans to merge this? @st-pasha

@st-pasha
Copy link
Contributor

@Darel13712 Need to fix the tests still

@Darel13712
Copy link

@st-pasha seems all checks pass now?

@st-pasha st-pasha merged commit fe64f42 into main May 25, 2021
@st-pasha st-pasha deleted the ps/issue2680 branch May 25, 2021 19:12
@st-pasha st-pasha added this to the Release 1.0.0 milestone Jun 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[bug] Parsing error with fread File containing a single unescaped " out-of-sample is read incorrectly
4 participants