Add empty test files for test reorganization #12288

Merged


shwina
Contributor

@shwina shwina commented Dec 2, 2022

Description

This PR adds empty test modules that match the "Test Organization" guidelines outlined in the developer guide. Follow-up PRs will move existing tests into these test modules.

While I have attempted to match the structure of our API reference as much as possible, there are small differences. For example, the API reference lumps together Reshaping, Sorting, and Transposing, while I opted to include two different modules for reshaping and sorting.

There are only a couple of instances where I needed to deviate from the structure though.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@shwina shwina requested a review from a team as a code owner December 2, 2022 15:11
@shwina shwina requested review from vyasr and isVoid December 2, 2022 15:11
@github-actions github-actions bot added the Python Affects Python cuDF API. label Dec 2, 2022
@shwina shwina added the non-breaking Non-breaking change label Dec 2, 2022
@shwina
Contributor Author

shwina commented Dec 2, 2022

For ease of reference, the file names are:

dataframe/test_constructing.py
dataframe/test_attributes.py
dataframe/test_conversion.py
dataframe/test_indexing.py
dataframe/test_binary_operations.py
dataframe/test_function_application.py
dataframe/test_computation.py
dataframe/test_reindexing.py
dataframe/test_selecting.py
dataframe/test_missing.py
dataframe/test_reshaping.py
dataframe/test_sorting.py
dataframe/test_combining.py
dataframe/test_timeseries.py
dataframe/test_io_serialization.py
general_functions/test_data_manipulation.py
general_functions/test_conversion.py
general_functions/test_datetimelike.py
general_utilities/test_testing.py
groupby/test_indexing.py
groupby/test_function_application.py
groupby/test_computation.py
groupby/test_stats.py
indexes/test_constructing.py
indexes/test_properties.py
indexes/test_modifying.py
indexes/test_computation.py
indexes/test_multiindex_compat.py
indexes/test_missing.py
indexes/test_memory_usage.py
indexes/test_conversion.py
indexes/test_sorting.py
indexes/test_time_specific.py
indexes/test_combining.py
indexes/test_selecting.py
indexes/test_numeric.py
indexes/test_categorical.py
indexes/test_interval.py
indexes/multiindex/test_constructing.py
indexes/multiindex/test_properties.py
indexes/multiindex/test_selecting.py
indexes/datetime/test_constructing.py
indexes/datetime/test_components.py
indexes/datetime/test_time_specific.py
indexes/datetime/test_conversion.py
indexes/timedelta/test_constructing.py
indexes/timedelta/test_components.py
indexes/timedelta/test_conversion.py
io/test_csv.py
io/test_text.py
io/test_json.py
io/test_parquet.py
io/test_orc.py
io/test_hdf5.py
io/test_feather.py
io/test_avro.py
lists/test_list_methods.py
options/test_options.py
series/test_constructing.py
series/test_attributes.py
series/test_conversion.py
series/test_indexing.py
series/test_binary_operations.py
series/test_function_application.py
series/test_computation.py
series/test_selecting.py
series/test_missing.py
series/test_reshaping.py
series/test_sorting.py
series/test_combining.py
series/test_timeseries.py
series/test_accessors.py
series/test_datetimelike.py
series/test_categorial.py
series/test_io_serialization.py
strings/test_string_methods.py
structs/test_struct_methods.py
text/test_subword_tokenizer.py
window/test_rolling.py
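
As a rough illustration of how a layout like the one above could be laid down, here is a minimal sketch that creates empty modules from a list of relative paths. It is not the script used for this PR; the function name `create_empty_modules` and the shortened `MODULES` list are assumptions made for the example.

```python
import tempfile
from pathlib import Path

# A small sample of the module paths listed above (illustrative, not the full list).
MODULES = [
    "dataframe/test_constructing.py",
    "indexes/multiindex/test_selecting.py",
    "io/test_parquet.py",
]


def create_empty_modules(root: Path, modules: list[str]) -> list[Path]:
    """Create each module, and any missing parent directories, as an empty file."""
    created = []
    for rel in modules:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        # touch() does not modify the contents of a file that already exists.
        path.touch(exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    # Run against a throwaway directory so nothing in a real checkout is touched.
    with tempfile.TemporaryDirectory() as tmp:
        for p in create_empty_modules(Path(tmp), MODULES):
            print(p.relative_to(tmp))
```

Because `touch(exist_ok=True)` leaves existing files alone, a script along these lines could be re-run safely after follow-up PRs start moving tests into the modules.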

@shwina shwina closed this Dec 2, 2022
@shwina shwina reopened this Dec 2, 2022
@shwina shwina added the improvement Improvement / enhancement to an existing function label Dec 2, 2022
@shwina
Contributor Author

shwina commented Dec 5, 2022

In addition to organizing tests to align with the layout of our API reference, we currently recommend in our developer guide that:

In cases where tests may be shared by multiple classes sharing a common parent
(e.g. DataFrame and Series both require IndexedFrame tests),
the tests may be placed in a directory corresponding to the parent class.

Thus, for example, rather than placing tests for Series.first in series/test_selecting.py, and tests for DataFrame.first in dataframe/test_selecting.py, we recommend placing both in indexed_frame/test_selecting.py.
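
To make the recommendation concrete, here is a minimal sketch of what a shared test in a parent-class module might look like. The classes below are hypothetical stand-ins, not cuDF's real classes, and the simplified `first` does not reproduce the semantics of the real method; only the shape of the shared test matters.

```python
class IndexedFrame:
    """Hypothetical stand-in for the shared parent class."""

    def __init__(self, rows):
        self.rows = list(rows)

    def first(self, n):
        # Simplified for illustration: return the first n rows,
        # preserving the concrete subclass of the caller.
        return type(self)(self.rows[:n])


class Series(IndexedFrame):
    pass


class DataFrame(IndexedFrame):
    pass


def check_first(cls):
    """One test body that applies to every subclass of IndexedFrame."""
    obj = cls([10, 20, 30])
    result = obj.first(2)
    assert isinstance(result, cls)
    assert result.rows == [10, 20]


def run_shared_tests():
    # Under pytest, this loop would typically be written as
    # @pytest.mark.parametrize("cls", [Series, DataFrame]).
    for cls in (Series, DataFrame):
        check_first(cls)
    return True
```

The test body is written once and exercised for both subclasses, which is the duplication saving the guideline is after.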

This way of organizing tests has both pros and cons, and I believe the cons outweigh the pros. I'm curious what others think.

  • 🟢 Pro: The location of the tests for a method matches the location in the class hierarchy where that method is implemented. (This is also a con; see below.)

  • 🟢 Pro: Because all tests for .first are in the same module, any specific testing utilities needed by the test can be included in the same module. There's no need for duplication or ambiguity as to where those utilities should live.

  • 🔴 Con: Our class hierarchy is an implementation detail. If a new class is added into the hierarchy, tests may need to be relocated. Also, if a method newly needs to be specialized in a subclass, tests for that method need to be relocated to the directory corresponding to that subclass. Not only does this add churn, it also becomes easy for things to go out of sync.

  • 🔴 Con: It's not easy to predict where the tests for a method live without knowledge of where the method is implemented in the source: are the tests for DataFrame.first in frame/test_selecting.py, indexed_frame/test_selecting.py, or dataframe/test_selecting.py?
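
The discoverability con can be illustrated with a small sketch. Under the parent-class layout, finding the test module for a method amounts to walking the method resolution order to see which class defines it. The class hierarchy, the `TEST_DIRS` mapping, and both helper functions below are assumptions made for the example, not cuDF code.

```python
class Frame:
    def __len__(self):
        return 0


class IndexedFrame(Frame):
    def first(self, n):
        return None


class DataFrame(IndexedFrame):
    def merge(self, other):
        return None


# Hypothetical mapping from implementing class to test directory.
TEST_DIRS = {Frame: "frame", IndexedFrame: "indexed_frame", DataFrame: "dataframe"}


def defining_class(cls, method_name):
    """Return the first class in the MRO whose own namespace defines the method."""
    for klass in cls.__mro__:
        if method_name in vars(klass):
            return klass
    raise AttributeError(f"{cls.__name__} has no attribute {method_name!r}")


def expected_test_module(cls, method_name, module="test_selecting.py"):
    """Where the tests would live if they follow the implementing class."""
    return f"{TEST_DIRS[defining_class(cls, method_name)]}/{module}"


print(expected_test_module(DataFrame, "first"))   # indexed_frame/test_selecting.py
print(expected_test_module(DataFrame, "merge", "test_combining.py"))  # dataframe/test_combining.py
```

If `first` were later specialized in `DataFrame`, the same call would start returning `dataframe/test_selecting.py`, which is the relocation churn described in the first con.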

@brandon-b-miller
Contributor

I think major changes to the hierarchy of our internal classes should be uncommon enough that moving tests won't be a lot of overhead, although I do think we should certainly do so if such changes are ever made. And while such things are implementation details, I think if one is interacting with our tests directly, it's more likely that such details might be relevant anyways.

@vyasr
Contributor

vyasr commented Dec 5, 2022

I'm less concerned about the implementation detail question since, as @brandon-b-miller points out, that is reasonably something that test writers should be aware of, and I don't foresee it changing frequently enough for churn to be a concern. I do agree with the discoverability issue, though. I would be open to alternative solutions that do not involve additional code duplication (aside from importing helper functions, which I think is fine). There is also a reasonable question raised by @wence- about how strong the relationship between a Series and a DataFrame really is. Maybe it's more coincidental than by design that any particular test written for a DataFrame can actually be reused for a Series and vice versa, since the two have meaningfully different semantics in lots of cases. That question may get easier to answer as we start consolidating tests, though, so perhaps we just avoid the parent class files altogether for now and, once we've made more progress, see if consolidating further into those files makes sense?

@shwina
Contributor Author

shwina commented Dec 13, 2022

rerun tests

@codecov

codecov bot commented Jan 5, 2023

Codecov Report

❗ No coverage uploaded for pull request base (branch-23.06@f328b64).
Patch has no changes to coverable lines.

❗ Current head e196867 differs from pull request most recent head d77a58a. Consider uploading reports for the commit d77a58a to get more accurate results

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-23.06   #12288   +/-   ##
===============================================
  Coverage                ?   85.72%           
===============================================
  Files                   ?      155           
  Lines                   ?    24911           
  Branches                ?        0           
===============================================
  Hits                    ?    21356           
  Misses                  ?     3555           
  Partials                ?        0           


@vyasr
Contributor

vyasr commented Jan 5, 2023

@shwina what's the status on this PR? Should we take a look at it, or is it not quite ready yet? Also, how do we want to tackle next steps? I think we should wait to get everything reorganized to our liking before closing #4730, right?

@shwina shwina changed the base branch from branch-23.02 to branch-23.04 February 9, 2023 20:23
Contributor

@vyasr vyasr left a comment

Only one question: I see that there are data subpackages for each file format (avro, orc, etc.). Is that necessary? I could see value in subdirectories if you want to store data in those, but why do they need __init__.py files? Other than that, this org looks good to me.
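
For context on the question, here is a minimal sketch showing that a plain data directory can be read by path without being an importable package. The `data/<format>/` layout, the `data_file` helper, and the file name are assumptions for illustration, not the layout in this PR.

```python
import tempfile
from pathlib import Path


def data_file(tests_root: Path, fmt: str, name: str) -> Path:
    """Build the path to a data file stored under <tests_root>/data/<fmt>/."""
    return tests_root / "data" / fmt / name


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        path = data_file(root, "avro", "example.avro")
        path.parent.mkdir(parents=True)
        path.write_bytes(b"placeholder")
        # No __init__.py exists anywhere under the root,
        # yet the file is still reachable by its path.
        assert not any(root.rglob("__init__.py"))
        return data_file(root, "avro", "example.avro").read_bytes()
```

In a real test module the root would usually be derived from `Path(__file__).parent`, so the data directories would only need `__init__.py` files if something imports from them.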

@shwina shwina changed the base branch from branch-23.04 to branch-23.06 April 6, 2023 15:39
@shwina
Contributor Author

shwina commented Apr 6, 2023

/merge

Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.

3 participants