TST: Parametrized index tests #20624
Hello @WillAyd! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on April 23, 2018 at 15:49 Hours UTC |
pandas/tests/indexes/test_base.py
Outdated
|
||
def test_constructor_from_series_period(self): | ||
idx = pd.period_range('2015-01-01', freq='D', periods=3) | ||
if has_tz: |
I suppose this could also leverage hasattr instead but I felt the explicitness of the parameter is more useful
pandas/tests/indexes/test_base.py
Outdated
tm.assert_index_equal(result, expected) | ||
|
||
@pytest.mark.parametrize("klass", [pd.Series, pd.DataFrame]) | ||
def test_constructor_from_series_freq(self, klass): |
This could arguably be split into two separate tests given the size of the conditional. Open to suggestions
yes would do that
pandas/tests/indexes/test_base.py
Outdated
@@ -237,61 +223,63 @@ def test_constructor_int_dtype_float(self, dtype): | |||
result = Index([0., 1., 2., 3.], dtype=dtype) | |||
tm.assert_index_equal(result, expected) | |||
|
|||
def test_constructor_int_dtype_nan(self): | |||
@pytest.mark.parametrize("dtype,klass_or_raises", [ |
Not sure how we feel about mixing types in klass_or_raises; it could also be split into a boolean flag for raises and the klass, though the test would typically just use one or the other
haha just made this comment
pandas/tests/indexes/test_base.py
Outdated
|
||
@pytest.mark.parametrize("swap_objs", [True, False]) |
Kind of a strange parameter but just emulating the existing test - one assertion is done with the datetime at the initial position and another assertion is done with the timedelta at the initial position
# below should coerce | ||
[1., 2., 3.], np.array([1., 2., 3.], dtype=float) | ||
]) | ||
def test_constructor_dtypes_to_int64(self, vals): |
This is similar to a lot of the tests below and could arguably be built with parameters of "vals,dtype,klass" but I felt it was cleaner and less repetition to just break the tests by dtype
pandas/tests/indexes/test_base.py
Outdated
for idx in [Index(np.array([np.timedelta64(1, 'D'), np.timedelta64( | ||
1, 'D')])), Index([timedelta(1), timedelta(1)])]: | ||
assert isinstance(idx, TimedeltaIndex) | ||
@pytest.mark.parametrize("cast_idx", [True, False]) |
I've used this in a few tests and am not a huge fan of it, but I wasn't sure of a better way to emulate the existing behavior where dtype=object is in the Index constructor in half of the tests
|
||
def test_constructor_empty(self): | ||
def test_constructor_empty_gen(self): | ||
skip_index_keys = ["repeats", "periodIndex", "rangeIndex", | ||
"tuples"] | ||
for key, idx in self.generate_index_types(skip_index_keys): |
In general some of these magic class methods could probably be replaced with pytest functionality. I haven't looked in too much detail just yet but was planning on reviewing after giving the module a first pass at parametrization without changing class behavior
yes see my comments below
note there is already a fixture of indices which works, so we have 3 methods of specifying things ATM:
- indices fixture
- generate_index_types(...)
- self.indices
need to clean this up and just make fixtures that we can use generally (in conftest) with docs
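As a rough illustration of the "fixtures in conftest" idea above, the sketch below defines one shared fixture that hands each test a different index object. The names INDEX_EXAMPLES, index_example and check_copy_roundtrip are hypothetical and are not taken from the pandas test suite; the set of example indexes is also only an assumption.

```python
import pandas as pd
import pytest

# Hypothetical catalogue of index objects; keys and contents are illustrative only.
INDEX_EXAMPLES = {
    "int": pd.Index([0, 1, 2]),
    "string": pd.Index(["a", "b", "c"]),
    "datetime": pd.date_range("2020-01-01", periods=3),
}


@pytest.fixture(params=sorted(INDEX_EXAMPLES))
def index_example(request):
    """Return one example index per parametrized run (would live in conftest.py)."""
    return INDEX_EXAMPLES[request.param]


def check_copy_roundtrip(index):
    # A copy of an index should compare equal to the original.
    assert index.copy().equals(index)


def test_copy_roundtrip(index_example):
    # pytest runs this once for every key in INDEX_EXAMPLES.
    check_copy_roundtrip(index_example)
```

With a single fixture like this, tests would no longer need to choose between a class attribute, a generator method and a fixture to obtain their inputs.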
generally looks good.
pandas/tests/indexes/test_base.py
Outdated
df['date'] = dts | ||
result = DatetimeIndex(df['date'], freq='MS') | ||
assert df['date'].dtype == object | ||
expected.name = 'date' |
put some blank lines in between text like this (so it's easily readable)
expected = pd.Index(array) | ||
result = pd.Index(ArrayLike(array)) | ||
tm.assert_index_equal(result, expected) | ||
expected = pd.Index(array) |
can you audit for whether we use pd.Index (and similar) or Index (we are not generally consistent); would like to be consistent within a module
grep " Index(" -r pandas/tests --include="*.py" | wc -l
891
grep " pd.Index(" -r pandas/tests --include="*.py" | wc -l
264
There may be a more robust way of doing it, but assuming this is directionally accurate, the first is more widely used. Particular to this module I see numbers of 175 and 48, respectively.
I personally prefer pd.Index for explicitness but am fine to change to Index for module-consistency. lmk
pandas/tests/indexes/test_base.py
Outdated
@@ -237,61 +223,63 @@ def test_constructor_int_dtype_float(self, dtype): | |||
result = Index([0., 1., 2., 3.], dtype=dtype) | |||
tm.assert_index_equal(result, expected) | |||
|
|||
def test_constructor_int_dtype_nan(self): | |||
@pytest.mark.parametrize("dtype,klass_or_raises", [ |
leave these as 2 separate tests. generally having parametrize where one case raises is not great (you can name them the same, however, with _errors on one of the test names)
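A minimal sketch of the naming pattern suggested above: one parametrized test for the inputs that succeed and a sibling test with an _errors suffix for the input that raises. The helper to_int is a hypothetical stand-in built on the standard library, not the Index constructor from the PR.

```python
import math

import pytest


def to_int(value):
    """Hypothetical helper: convert a float to int, rejecting NaN."""
    if math.isnan(value):
        raise ValueError("cannot convert NaN to integer")
    return int(value)


@pytest.mark.parametrize("value,expected", [(0.0, 0), (1.0, 1), (3.0, 3)])
def test_to_int(value, expected):
    # Only well-behaved inputs are parametrized here.
    assert to_int(value) == expected


def test_to_int_errors():
    # The raising case gets its own test instead of a "klass_or_raises" parameter.
    with pytest.raises(ValueError):
        to_int(float("nan"))
```

Keeping the raising case separate means each test body has a single code path, so no parameter has to carry either a class or an exception type.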
pandas/tests/indexes/test_base.py
Outdated
na_list = [na_val, na_val] | ||
exp = klass(na_list) | ||
assert exp.dtype == dtype | ||
tm.assert_index_equal(Index(na_list), exp) |
try to use result =
pandas/tests/indexes/test_base.py
Outdated
tm.assert_index_equal(Index(np.array(na_list)), exp) | ||
|
||
@pytest.mark.parametrize("data", [ | ||
[pd.NaT, np.nan], [np.nan, pd.NaT], [np.nan, np.datetime64('nat')], |
we should make this into a fixture (e.g. nulls_fixture) (if you want to add to conftest). note we should change globally (but can do that in another PR)
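For illustration only, a fixture along these lines could look like the sketch below. The name nulls_fixture comes from the comment above; the NULL_VALUES list and the example test are assumptions, and which values belong in the fixture is exactly what the thread goes on to discuss.

```python
import numpy as np
import pandas as pd
import pytest

# Assumed set of null-like values; the thread debates whether None belongs here.
NULL_VALUES = [None, np.nan, pd.NaT]


@pytest.fixture(params=NULL_VALUES)
def nulls_fixture(request):
    """Return each null-like value in turn (would live in conftest.py)."""
    return request.param


def test_isna_recognizes_nulls(nulls_fixture):
    # pytest runs this once per entry in NULL_VALUES.
    assert pd.isna(nulls_fixture)
```

Any test that names nulls_fixture as an argument is then run once per null value, replacing ad hoc parametrize lists such as [np.nan, pd.NaT, None].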
pandas/tests/indexes/test_base.py
Outdated
@@ -499,25 +483,25 @@ def test_insert(self): | |||
null_index = Index([]) | |||
tm.assert_index_equal(Index(['a']), null_index.insert(0, 'a')) | |||
|
|||
@pytest.mark.parametrize("na_val", [np.nan, pd.NaT, None]) |
e.g. you can use the nulls_fixture from above
Hmm sorry missed this in the last update. I'm assuming that the fixture should NOT include None as a null value so I can break that off into its own test on the next push. Let me know if you do in fact want None to be included there though (will require updates to the other function using the fixture)
I think we should be including None because I think we do convert it for all but object dtypes (where it is left alone). so maybe need 2 fixtures. might be tricky to do this in a general way.
(0, Index(['b', 'c', 'd'], name='idx')), | ||
(-1, Index(['a', 'b', 'c'], name='idx')) | ||
]) | ||
def test_delete(self, pos, exp): |
note that some of the tests here should really be parametrized on index types themselves; there is an open issue on that (e.g. these set type ops, test_delete and such are generally tested by the subclasses), but we need a more general cleanup on that
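One way to read "parametrized on index types themselves" is sketched below: the same delete assertion runs against several kinds of index rather than only a string index. The INDEXES list and the test name are hypothetical, and this is not the cleanup tracked in the open issue.

```python
import pandas as pd
import pytest

# Hypothetical sample of index types to run the same assertion against.
INDEXES = [
    pd.Index(list("abcd")),
    pd.Index([1, 2, 3, 4]),
    pd.date_range("2020-01-01", periods=4),
]


@pytest.mark.parametrize("index", INDEXES)
def test_delete_drops_first(index):
    result = index.delete(0)
    expected = index[1:]

    # Deleting position 0 should leave the remaining elements in order.
    assert result.equals(expected)
```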
Codecov Report
@@ Coverage Diff @@
## master #20624 +/- ##
==========================================
- Coverage 91.85% 91.82% -0.03%
==========================================
Files 153 153
Lines 49310 49310
==========================================
- Hits 45292 45280 -12
- Misses 4018 4030 +12
I've deferred making any changes around the Index vs pd.Index conversation. I'm assuming that we can either do that as one cleanup at the end for the entire module or bundle into a separate change somewhere else but can also include incrementally in this PR if you'd prefer
exp = pd.DatetimeIndex([pd.NaT, pd.NaT]) | ||
assert exp.dtype == 'datetime64[ns]' | ||
|
||
for data in [[pd.NaT, np.nan], [np.nan, pd.NaT], |
I think this was a mistake before with pd.NaT to pair with np.nan instead of the datetime constructor. With the new parametrized test that has been adjusted
hmm, I don't think so, this is a construction with mixed types of nulls.
Right but judging off the original test right below it there is something slightly off. This one only uses np.datetime64('nat') in 2/4 parameters, but the below test uses np.timedelta64('nat') in 4/4 parameters.
FWIW it may not hurt to add more constructor tests here especially if we add None to the nulls_fixture. Will take a deeper look on next pass at this
this is a messy file. prob need to do multiple PR's to get this really nice.
pandas/conftest.py
Outdated
@@ -87,3 +87,11 @@ def join_type(request): | |||
Fixture for trying all types of join operations | |||
""" | |||
return request.param | |||
|
|||
|
|||
@pytest.fixture(params=[numpy.nan, pandas.NaT]) |
need None as well
can you use np and pd
pandas/tests/indexes/test_base.py
Outdated
assert result.tz == idx.tz | ||
@pytest.mark.parametrize("cast_as_obj", [True, False]) | ||
@pytest.mark.parametrize("idx,has_tz", [ | ||
(pd.date_range('2015-01-01 10:00', freq='D', periods=3, |
would prob drop the has_tz, then can make the idx a fixture above (in the test conditional you can directly test if isinstance DTI & tz is not None); then you can add a DTI w/o a tz here as well.
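The conditional described above could be sketched as follows, so that the has_tz parameter is derived from the index instead of being passed alongside it. The helper name has_timezone is hypothetical, and UTC is used here only to keep the example independent of the local timezone database.

```python
import pandas as pd


def has_timezone(idx):
    """Hypothetical helper: True only for a timezone-aware DatetimeIndex."""
    return isinstance(idx, pd.DatetimeIndex) and idx.tz is not None


aware = pd.date_range("2015-01-01 10:00", freq="D", periods=3, tz="UTC")
naive = pd.date_range("2015-01-01 10:00", freq="D", periods=3)
period = pd.period_range("2015-01-01", freq="D", periods=3)
```

Here has_timezone(aware) is true, while the naive DatetimeIndex and the PeriodIndex both return false, so a timezone-naive case can be added to the parameters without touching a separate flag.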
pandas/tests/indexes/test_base.py
Outdated
|
||
@pytest.mark.parametrize("pos", [0, 1]) |
these tests that are strictly for datetime like things can prob just move to a new file: test_datetimelike.py (IOW tests that test all of DTI,TDI,PI but all together). can be future PR (or here).
IOW test_base should be only non-datetimelike tests.
pandas/tests/indexes/test_base.py
Outdated
for tz in [None, 'UTC', 'US/Eastern', 'Asia/Tokyo']: | ||
idx = pd.date_range('2011-01-01', periods=5, tz=tz) | ||
dtype = idx.dtype | ||
@pytest.mark.parametrize("tz", [ |
move to pandas/tests/indexes/datetimes/test_timezones.py (might be a duplicate as well)
|
||
@pytest.mark.parametrize("attr", ['values', 'asi8']) | ||
@pytest.mark.parametrize("klass", [pd.Index, pd.TimedeltaIndex]) | ||
def test_constructor_dtypes_timedelta(self, attr, klass): |
move this to pandas/tests/indexes/timedeltas/test_construction (maybe duplicate)
labels=[[], []]) | ||
assert isinstance(empty, MultiIndex) | ||
@pytest.mark.parametrize("empty,klass", [ | ||
(PeriodIndex([], freq='B'), PeriodIndex), |
might be duplicated elsewhere. am iffy on where to put things like this. maybe worth extracting things from this file and making a test_construction for things like this
Most but not all edits made. I avoided anything that requires moving things out of the module, if only because my suggested plan of attack would be:
Open to suggestions on approach. |
pandas/conftest.py
Outdated
@@ -89,6 +89,14 @@ def join_type(request): | |||
return request.param | |||
|
|||
|
|||
@pytest.fixture(params=[None, np.nan, pd.NaT]) |
This is a good idea. Some more for the list: np.timedelta64('NaT'), np.datetime64('NaT'), float('nan'), np.float('NaN').
Note that np.float('NaN') does not return np.nan (i.e. does not behave like pd.NaT), so this would catch any occurrences in pandas of e.g. if foo is np.nan
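The identity pitfall mentioned above can be shown with the standard library alone. In this sketch math.nan stands in for np.nan as the module-level NaN object, and the two helper names are hypothetical. Separately, the np.float alias was removed in later NumPy releases, so float('nan') is the portable spelling today.

```python
import math


def is_null_by_identity(value, sentinel=math.nan):
    # Fragile: only true when value is the very same object as the sentinel.
    return value is sentinel


def is_null_by_check(value):
    # Robust: true for any float NaN, however it was created.
    return isinstance(value, float) and math.isnan(value)


# A freshly constructed NaN is a different object from math.nan.
other_nan = float("nan")
```

A check written with "is" accepts the sentinel but misses other_nan, which is why including independently constructed NaN values in the fixture would expose such code paths.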
@WillAyd can you rebase? can you expand the nulls fixture as well, to as much as you can while still having this pass?
Added the suggestions from @jorisvandenbossche with the exception of The test is parametrized for a I think this is a corner case for this test so we could definitely still add to the fixture and have the test be responsible for skipping where appropriate, but for now I've just kept out of the fixture |
thanks @WillAyd if you can create an issue for extending the nulls fixture (and issue around that), and an issue about cleaning more testing things (e.g. comments above). |
I came across this module on another change and noticed that a lot of the tests could really use refactoring. There's a ton more to be done with this module but submitting as is so it doesn't get too large.
Can either add other commits on top of this or have this merged (assuming looks OK) and continue down the module in additional PR(s)