rapidsai/cudf#10618 made a change to pandas, and as part of that discussion I realized that we are not testing the following:
What happens if there are duplicate columns? I think Spark is going to return an error when doing schema discovery, but I am not sure what happens when we pass in a schema that includes column names. This is even more interesting if we read a column that is a duplicate, or if we ask for a column name that CUDF would generate from duplicate column names.
What happens if we have multiple files with different column names, or with the columns in a different order (schema evolution)?
I tested this manually, and we are doing the right thing there. It is all based on column position, for good or bad. This means there is no real way to change the order of columns, etc., without playing some games. But it would be good to add some integration tests for this anyway.
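For anyone picking this up, here is a minimal sketch of what position-based matching implies. It is plain Python using only the standard `csv` module, not Spark or the plugin, and `read_by_position` and the sample file contents are hypothetical names made up for illustration. It assumes a user-supplied schema is applied to each file strictly by column position, with the header text ignored.

```python
import csv
import io


def read_by_position(csv_text, schema_names):
    """Assign schema names to CSV columns purely by position.

    Hypothetical helper for illustration only; the header row is
    skipped and its names are deliberately ignored.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # drop the header row
    return [dict(zip(schema_names, row)) for row in reader]


# The schema the user passes in (assumed for this example).
schema = ["id", "name"]

file_a = "id,name\n1,alice\n"  # header matches the schema
file_b = "name,id\nbob,2\n"    # same columns, different order
file_c = "id,id\n3,4\n"        # duplicate column names in the header

rows_a = read_by_position(file_a, schema)
rows_b = read_by_position(file_b, schema)
rows_c = read_by_position(file_c, schema)

for label, rows in (("a", rows_a), ("b", rows_b), ("c", rows_c)):
    print(label, rows)
```

Under that assumption, the reordered file ends up with `bob` in the `id` column, and the duplicate header causes no conflict because the header names are never consulted. An integration test could cover the same three cases and check that the CPU and GPU results agree.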
revans2 added the test label (Only impacts tests) and removed the task label (Work required that improves the product but is not user facing) on May 12, 2022
sameerz changed the title from "[FEA] Test CSV scheam evolution" to "[FEA] Test CSV schema evolution" on May 12, 2022