Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] cuio: reader/writer tests do not verify file contents, only isomorphism. #6222

Open
cwharris opened this issue Sep 14, 2020 · 5 comments
Assignees
Labels
bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. tests Unit testing for project

Comments

@cwharris
Copy link
Contributor

cwharris commented Sep 14, 2020

Describe the bug
Reader and writer tests have the potential to pass erroneously if the readers/writers change in a way that maintains isomorphism, regardless of whether the written contents contain the correct contents. Unlikely, but possible.

Expected behavior

  • Reader tests should explicitly test reading file contents and verifying the parsed data is as expected.
  • Writer tests should explicitly test writing data and verifying the file contents is as expected.

Examples:

auto filepath = temp_env->get_temp_filepath("SingleColumnWithWriter.csv");
write_csv_helper(filepath, input_table, false);
cudf_io::read_csv_args in_args{cudf_io::source_info{filepath}};
in_args.header = -1;
auto result = cudf_io::read_csv(in_args);
const auto result_table = result.tbl->view();
CUDF_TEST_EXPECT_TABLES_EQUIVALENT(input_table, result_table);

auto filepath = temp_env->get_temp_filepath("OrcSingleColumn.orc");
cudf_io::write_orc_args out_args{cudf_io::sink_info{filepath}, expected->view()};
cudf_io::write_orc(out_args);
cudf_io::read_orc_args in_args{cudf_io::source_info{filepath}};
in_args.use_index = false;
auto result = cudf_io::read_orc(in_args);
CUDF_TEST_EXPECT_TABLES_EQUAL(expected->view(), result.tbl->view());

auto filepath = temp_env->get_temp_filepath("SingleColumn.parquet");
cudf_io::parquet_writer_options out_opts =
cudf_io::parquet_writer_options::builder(cudf_io::sink_info{filepath}, expected->view());
cudf_io::write_parquet(out_opts);
cudf_io::parquet_reader_options in_opts =
cudf_io::parquet_reader_options::builder(cudf_io::source_info{filepath});
auto result = cudf_io::read_parquet(in_opts);
CUDF_TEST_EXPECT_TABLES_EQUAL(expected->view(), result.tbl->view());

@cwharris cwharris added bug Something isn't working Needs Triage Need team to review and classify cuIO cuIO issue tech debt labels Sep 14, 2020
@cwharris cwharris changed the title [BUG] cuio: csv reader/writer tests do not verify file contents, only isomorphism. [BUG] cuio: reader/writer tests do not verify file contents, only isomorphism. Sep 14, 2020
@kkraus14 kkraus14 added tests Unit testing for project libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Sep 15, 2020
@vuule
Copy link
Contributor

vuule commented Sep 16, 2020

Is this issue about the use of CUDF_TEST_EXPECT_TABLES_EQUIVALENT ?

@cwharris
Copy link
Contributor Author

cwharris commented Sep 16, 2020

No, this issue is about:

data == decode(encode(data)) // is tested, guarantees isomorphism
expected_file == encode(data) // is not tested, does not guarantee expected bytes are written to file
expected_data == decode(file) // is not tested, does not guarantee expected data is read from file

@vuule
Copy link
Contributor

vuule commented Sep 16, 2020

I don't think we have a way to test this in C++ tests.
We do have Python tests that leverage readers/writers from other libraries to validate the file content.
That said, how do resolve this issue? Migrate such tests to Python?

@cwharris
Copy link
Contributor Author

cwharris commented Sep 16, 2020

I don't think we have a way to test this in C++ tests.

Pretty sure that's accurate.

It's technically possible to test them in c++, but it's very inconvenient because we'd need sample files and associated expected data. It'll definitely be easier to test on the python layer. It's possible we account for these cases in python tests already, but imho we should check python's tests before closing this issue.

@github-actions
Copy link

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@vuule vuule self-assigned this Jun 29, 2023
@vyasr vyasr removed the tech debt label Feb 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. tests Unit testing for project
Projects
None yet
Development

No branches or pull requests

5 participants