Columns with missing data are filled with stale data using CSV.Rows #903

Mrbigmatt · 2021-09-15T02:57:56Z

When importing the attached file with CSV.Rows and iterating through the rows, the column with missing data is filled with the previous row's data. CSV.File seems to import this file correctly. This is with CSV.jl 0.9.2.

test.csv


quinnj · 2021-09-15T05:31:01Z

Thanks for reporting! Unfortunate bug from a recent refactoring; fix is up: #904.

Fixes #903. The issue is a consequence of the big internal refactoring that happened for the 0.9 release. As part of that refactoring, `parserow` no longer explicitly sets a column's value to `missing` when a sentinel value is detected while parsing. A column is allocated with all of its values already set to missing, so the value _should_ already be missing. That assumption doesn't hold for `CSV.Rows`, however, where we allocate only a single-element "column" vector for each column and reuse it while iterating rows, so a value parsed for an earlier row is still in the vector when a later row hits a sentinel. The solution, therefore, is to "reset" the column vectors before each call to `parserow`, ensuring the column values are set to missing in case a sentinel value is detected while parsing.
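To make the mechanism concrete, here is a small language-agnostic toy model in Python. It is not CSV.jl's code: `parse_row`, `iterate_rows`, the `reset` flag, and the empty-string sentinel are all hypothetical names and assumptions chosen for illustration. It only mimics the pattern described above, where one single-element buffer per column is reused across rows and the parser skips the write when it sees a sentinel.

```python
from typing import List, Optional, Tuple

SENTINEL = ""  # assumption: an empty field stands for a missing value


def parse_row(fields: List[str], buffers: List[List[Optional[int]]]) -> None:
    """Toy stand-in for a row parser (hypothetical, not CSV.jl's parserow)."""
    for i, field in enumerate(fields):
        if field != SENTINEL:
            buffers[i][0] = int(field)
        # On a sentinel nothing is written: the parser relies on the
        # buffer already holding None (the "missing" placeholder).


def iterate_rows(lines: List[str], reset: bool) -> List[Tuple[Optional[int], ...]]:
    """Iterate rows while reusing one single-element buffer per column."""
    ncols = len(lines[0].split(","))
    buffers: List[List[Optional[int]]] = [[None] for _ in range(ncols)]
    rows = []
    for line in lines:
        if reset:
            # The fix sketched in the description: clear every buffer
            # before parsing the next row.
            for buf in buffers:
                buf[0] = None
        parse_row(line.split(","), buffers)
        rows.append(tuple(buf[0] for buf in buffers))
    return rows


lines = ["1,2", "3,", "5,6"]  # second row has a missing second column
buggy = iterate_rows(lines, reset=False)
fixed = iterate_rows(lines, reset=True)
print(buggy)  # [(1, 2), (3, 2), (5, 6)]  <- stale 2 leaks into row 2
print(fixed)  # [(1, 2), (3, None), (5, 6)]
```

Without the reset, the second row reports the previous row's value for the missing column, which matches the symptom in the report. A parser that fills a freshly allocated full-length column never hits this, because each slot is written at most once.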

quinnj mentioned this issue Sep 15, 2021

Fix CSV.Rows when missing values are parsed #904

Merged

quinnj mentioned this issue Sep 15, 2021

Difference between File and Rows #893

Closed

quinnj closed this as completed in #904 Sep 15, 2021
