[FEA] Test CSV schema evolution #5475

Open
revans2 opened this issue May 12, 2022 · 1 comment
Labels
good first issue Good for newcomers test Only impacts tests

Comments

@revans2
Collaborator

revans2 commented May 12, 2022

rapidsai/cudf#10618 made a change to pandas, and as part of that discussion I realized that we are not testing:

  • What happens if there are duplicate columns? I think Spark is going to return an error when doing schema discovery, but I am not sure what happens when we pass in a schema and include column names. This is even more interesting if we read a column that is a duplicate, or if we ask for a column name that CUDF would generate from duplicate column names.
  • What happens if we have multiple files with different column names, or with the columns in a different order (schema evolution)?
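
As a starting point for tests covering the two cases above, here is a minimal sketch of fixture generation using only the Python standard library. The helper names (`write_csv`, `make_fixtures`) and the file names are hypothetical and not part of any existing test suite. The files would still need to be read through Spark in a real integration test.

```python
import csv
import os
import tempfile


def write_csv(path, header, rows):
    """Write one CSV file with the given header and data rows."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def make_fixtures(base_dir=None):
    """Create small CSV files for the duplicate-column and schema-evolution cases.

    Hypothetical helper: the layout and names are illustrative only.
    """
    if base_dir is None:
        base_dir = tempfile.mkdtemp()
    paths = {
        "baseline": os.path.join(base_dir, "baseline.csv"),
        # Same columns as the baseline, in a different order.
        "reordered": os.path.join(base_dir, "reordered.csv"),
        # Different column names from the baseline.
        "renamed": os.path.join(base_dir, "renamed.csv"),
        # The name "a" appears twice in the header.
        "duplicate": os.path.join(base_dir, "duplicate.csv"),
    }
    write_csv(paths["baseline"], ["a", "b", "c"], [[1, 2, 3]])
    write_csv(paths["reordered"], ["c", "a", "b"], [[3, 1, 2]])
    write_csv(paths["renamed"], ["x", "y", "z"], [[1, 2, 3]])
    write_csv(paths["duplicate"], ["a", "b", "a"], [[1, 2, 10]])
    return paths
```

Reading `baseline.csv` together with `reordered.csv` or `renamed.csv` exercises the multi-file case, and `duplicate.csv` exercises the duplicate-column case, both with and without a user-supplied schema.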
@revans2 revans2 added ? - Needs Triage Need team to review and classify task Work required that improves the product but is not user facing labels May 12, 2022
@revans2
Collaborator Author

revans2 commented May 12, 2022

I tested this manually, and we are doing the right thing. It is all based on column position, for good or bad. This means there is no real way to change the order of columns, etc., without playing some games. But it would be good to add some integration tests for this anyway.
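
To make "based on column position" concrete, here is a toy model in plain Python. It is not Spark or plugin code, and `read_by_position` is a hypothetical name. It only illustrates the consequence described above: when values are bound to the supplied schema by position, a file whose columns are reordered is silently read with its values under the wrong names.

```python
import csv
import io


def read_by_position(csv_text, schema_names):
    """Toy model: bind each value to a schema name by position.

    The header row is skipped and its names are never consulted.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    return [dict(zip(schema_names, row)) for row in reader]


file_one = "a,b\n1,2\n"
file_two = "b,a\n2,1\n"  # same data as file_one, columns swapped

schema = ["a", "b"]
rows_one = read_by_position(file_one, schema)
rows_two = read_by_position(file_two, schema)

print(rows_one)  # [{'a': '1', 'b': '2'}]
print(rows_two)  # [{'a': '2', 'b': '1'}]
```

In the second file the header says `a` is `1`, but the positional binding assigns `2` to `a`. An integration test for schema evolution would assert on exactly this kind of result, comparing CPU and GPU output.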

@revans2 revans2 added test Only impacts tests and removed task Work required that improves the product but is not user facing labels May 12, 2022
@sameerz sameerz changed the title [FEA] Test CSV scheam evolution [FEA] Test CSV schema evolution May 12, 2022
@sameerz sameerz added good first issue Good for newcomers and removed ? - Needs Triage Need team to review and classify labels May 17, 2022