Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST: Parametrized index tests #20624

Merged
merged 13 commits into from
Apr 24, 2018
Merged

TST: Parametrized index tests #20624

merged 13 commits into from
Apr 24, 2018

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Apr 6, 2018

I came across this module on another change and noticed that a lot of the tests could really use refactoring. There's a ton more to be done with this module but submitting as is so it doesn't get too large.

Can either add other commits on top of this or have this merged (assuming looks OK) and continue down the module in additional PR(s)

@pep8speaks
Copy link

pep8speaks commented Apr 6, 2018

Hello @WillAyd! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on April 23, 2018 at 15:49 Hours UTC


def test_constructor_from_series_period(self):
idx = pd.period_range('2015-01-01', freq='D', periods=3)
if has_tz:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this could also leverage hasattr instead but I felt the explicitness of the parameter is more useful

tm.assert_index_equal(result, expected)

@pytest.mark.parametrize("klass", [pd.Series, pd.DataFrame])
def test_constructor_from_series_freq(self, klass):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could arguably be split into two separate tests given the size of the conditional. Open to suggestions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes woudl do that

@@ -237,61 +223,63 @@ def test_constructor_int_dtype_float(self, dtype):
result = Index([0., 1., 2., 3.], dtype=dtype)
tm.assert_index_equal(result, expected)

def test_constructor_int_dtype_nan(self):
@pytest.mark.parametrize("dtype,klass_or_raises", [
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how we feel about mixing types in klass_or_raises - could also be split into a boolean flag for raises and the klass, though the test would typically just use one or the other

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha just made this comment


@pytest.mark.parametrize("swap_objs", [True, False])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of a strange parameter but just emulating the existing test - one assertion is done with the datetime at the initial position and another assertion is done with the timedelta at the initial position

# below should coerce
[1., 2., 3.], np.array([1., 2., 3.], dtype=float)
])
def test_constructor_dtypes_to_int64(self, vals):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar to a lot of the tests below and could arguably be built with parameters of "vals,dtype,klass" but I felt it was cleaner and less repetition to just break the tests by dtype

for idx in [Index(np.array([np.timedelta64(1, 'D'), np.timedelta64(
1, 'D')])), Index([timedelta(1), timedelta(1)])]:
assert isinstance(idx, TimedeltaIndex)
@pytest.mark.parametrize("cast_idx", [True, False])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used this in a few tests and am not a huge fan of it, but I wasn't sure of a better way to emulate the existing behavior where dtype=object is in the Index constructor in half of the tests


def test_constructor_empty(self):
def test_constructor_empty_gen(self):
skip_index_keys = ["repeats", "periodIndex", "rangeIndex",
"tuples"]
for key, idx in self.generate_index_types(skip_index_keys):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general some of these magic class methods could probably be replaced with pytest functionality. I haven't looked in too much detail just yet but was planning on reviewing after giving the module a first pass at parametrization without changing class behavior

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes see my comments below

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note there is already a fixture of indices which works, so have 3 methods of specifying things ATM:

  • indices fixture
  • generate_index_types(...)
  • self.indices

needs to clean this and just make fixtures that we can use generally (in conftest) with docs

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks good.

df['date'] = dts
result = DatetimeIndex(df['date'], freq='MS')
assert df['date'].dtype == object
expected.name = 'date'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put some blank lines in between text like this (so its easily readable)

expected = pd.Index(array)
result = pd.Index(ArrayLike(array))
tm.assert_index_equal(result, expected)
expected = pd.Index(array)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you audit for whether we use pd.Index (and similar) or Index (we are not generally consistent), like to be consistent within in a module

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grep " Index(" -r pandas/tests --include="*.py" | wc -l
     891
grep " pd.Index(" -r pandas/tests --include="*.py" | wc -l
     264

May be a more robust way of doing it but assuming directionally accurate first is more widely used. Particular to this module I see numbers of 175 and 48, respectively.

I personally prefer pd.Index for explicitness but am fine to change to Index for module-consistency. lmk

@@ -237,61 +223,63 @@ def test_constructor_int_dtype_float(self, dtype):
result = Index([0., 1., 2., 3.], dtype=dtype)
tm.assert_index_equal(result, expected)

def test_constructor_int_dtype_nan(self):
@pytest.mark.parametrize("dtype,klass_or_raises", [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave these as 2 separate. generally having parameterize where one case raises is not great (you can name the same however, with _errors on one of the test names)

na_list = [na_val, na_val]
exp = klass(na_list)
assert exp.dtype == dtype
tm.assert_index_equal(Index(na_list), exp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to use
result =

tm.assert_index_equal(Index(np.array(na_list)), exp)

@pytest.mark.parametrize("data", [
[pd.NaT, np.nan], [np.nan, pd.NaT], [np.nan, np.datetime64('nat')],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should make this into a fixture (e.g. nulls_fixture) (if you want to add to conftest). note we should change globally (but can do that in another PR)

@@ -499,25 +483,25 @@ def test_insert(self):
null_index = Index([])
tm.assert_index_equal(Index(['a']), null_index.insert(0, 'a'))

@pytest.mark.parametrize("na_val", [np.nan, pd.NaT, None])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. you can use the nulls_fixture from above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm sorry missed this in the last update. I'm assuming that the fixture should NOT include None as a null value so I can break that off into it's own test on the next push. Let me know if you do in fact want None to be included there though (will require updates to the other function using the fixture)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think should be including None because I think we do convert it for all but object dtypes (where it is left alone). so maybe need 2 fixtures. might be tricky to do this in a general way.

(0, Index(['b', 'c', 'd'], name='idx')),
(-1, Index(['a', 'b', 'c'], name='idx'))
])
def test_delete(self, pos, exp):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that some of the tests here should really be parametreized on index types themselves, there is an open issue on that (eg. these set type ops, test_delete and such), are generally tested by the subclasses, but we need a more general cleanup on that

@jreback jreback added Testing pandas testing functions or related to the test suite Indexing Related to indexing on series/frames, not to indexes themselves Dtype Conversions Unexpected or buggy dtype conversions labels Apr 6, 2018
@codecov
Copy link

codecov bot commented Apr 10, 2018

Codecov Report

Merging #20624 into master will decrease coverage by 0.02%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20624      +/-   ##
==========================================
- Coverage   91.85%   91.82%   -0.03%     
==========================================
  Files         153      153              
  Lines       49310    49310              
==========================================
- Hits        45292    45280      -12     
- Misses       4018     4030      +12
Flag Coverage Δ
#multiple 90.21% <ø> (-0.03%) ⬇️
#single 41.9% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/plotting/_converter.py 65.07% <0%> (-1.74%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 31e77b0...6e46b57. Read the comment docs.

Copy link
Member Author

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've deferred making any changes around the Index vs pd.Index conversation. I'm assuming that we can either do that as one cleanup at the end for the entire module or bundle into a separate change somewhere else but can also include incrementally in this PR if you'd prefer

exp = pd.DatetimeIndex([pd.NaT, pd.NaT])
assert exp.dtype == 'datetime64[ns]'

for data in [[pd.NaT, np.nan], [np.nan, pd.NaT],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was a mistake before with pd.NaT to pair with np.nan instead of the datetime constructor. With the new parametrized test that has been adjusted

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I don't think so, this is a construction with mixed types of nulls.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right but judging off the original test right below it there is something slightly off. This one only uses np.datetime64('nat') in 2/4 parameters, but the below test uses np.timedelta64('nat') in 4/4 parameters.

FWIW it may not hurt to add more constructor tests here especially if we add None to the nulls_fixture. Will take a deeper look on next pass at this

@@ -499,25 +483,25 @@ def test_insert(self):
null_index = Index([])
tm.assert_index_equal(Index(['a']), null_index.insert(0, 'a'))

@pytest.mark.parametrize("na_val", [np.nan, pd.NaT, None])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm sorry missed this in the last update. I'm assuming that the fixture should NOT include None as a null value so I can break that off into it's own test on the next push. Let me know if you do in fact want None to be included there though (will require updates to the other function using the fixture)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a messy file. prob need to do multiple PR's to get this really nice.

@@ -87,3 +87,11 @@ def join_type(request):
Fixture for trying all types of join operations
"""
return request.param


@pytest.fixture(params=[numpy.nan, pandas.NaT])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need None as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use np and pd

tm.assert_index_equal(result, expected)

@pytest.mark.parametrize("klass", [pd.Series, pd.DataFrame])
def test_constructor_from_series_freq(self, klass):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes woudl do that

assert result.tz == idx.tz
@pytest.mark.parametrize("cast_as_obj", [True, False])
@pytest.mark.parametrize("idx,has_tz", [
(pd.date_range('2015-01-01 10:00', freq='D', periods=3,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would prob drop the has_tz, then can make the idx a fixture above (in the test conditional you can directly test if isnstance DTI & tz is not None); then you can add a DTI w/o a tz here as well.


@pytest.mark.parametrize("pos", [0, 1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests that are strictly for datetime like things can prob just move to a new file: test_datetimelike.py (IOW tests that test all of DTI,TDI,PI but all together). can be future PR (or here).

IOW test_base should be only non-datetimelike tests.

for tz in [None, 'UTC', 'US/Eastern', 'Asia/Tokyo']:
idx = pd.date_range('2011-01-01', periods=5, tz=tz)
dtype = idx.dtype
@pytest.mark.parametrize("tz", [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to pandas/tests/indexes/datetimes/test_timezones.py (might be a duplicate as well)


@pytest.mark.parametrize("attr", ['values', 'asi8'])
@pytest.mark.parametrize("klass", [pd.Index, pd.TimedeltaIndex])
def test_constructor_dtypes_timedelta(self, attr, klass):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to pandas/tests/indexes/timedeltas/test_construction (maybe duplicate)


def test_constructor_empty(self):
def test_constructor_empty_gen(self):
skip_index_keys = ["repeats", "periodIndex", "rangeIndex",
"tuples"]
for key, idx in self.generate_index_types(skip_index_keys):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note there is already a fixture of indices which works, so have 3 methods of specifying things ATM:

  • indices fixture
  • generate_index_types(...)
  • self.indices

needs to clean this and just make fixtures that we can use generally (in conftest) with docs

labels=[[], []])
assert isinstance(empty, MultiIndex)
@pytest.mark.parametrize("empty,klass", [
(PeriodIndex([], freq='B'), PeriodIndex),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be duplicated elsewhere. am iffy on where to put things like this. maybe worth extracting things from this file and makign a test_construction for things like this

@@ -499,25 +483,25 @@ def test_insert(self):
null_index = Index([])
tm.assert_index_equal(Index(['a']), null_index.insert(0, 'a'))

@pytest.mark.parametrize("na_val", [np.nan, pd.NaT, None])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think should be including None because I think we do convert it for all but object dtypes (where it is left alone). so maybe need 2 fixtures. might be tricky to do this in a general way.

@WillAyd
Copy link
Member Author

WillAyd commented Apr 16, 2018

Most but not all edits made. I avoided anything that requires moving things out of the module, if only because my suggested plan of attack would be:

  1. Parametrize the existing module where appropriate (this covers ~650 of ~2400 lines, so maybe 3 or 4 more similar PRs)
  2. Replace any class methods / attributes with fixtures where appropriate
  3. Move tests into new modules / sub-modules

Open to suggestions on approach.

@@ -89,6 +89,14 @@ def join_type(request):
return request.param


@pytest.fixture(params=[None, np.nan, pd.NaT])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea. Some more for the list: np.timedelta64('NaT'), np.datetime64('NaT'), float('nan'), np.float('NaN').

Note that np.float('NaN') does not return np.nan (i.e. does not behave like pd.NaT), so this would catch any occurrences in pandas of e.g if foo is np.nan

@jreback jreback added this to the 0.23.0 milestone Apr 21, 2018
@jreback
Copy link
Contributor

jreback commented Apr 21, 2018

@WillAyd can you rebase.

can you expand the nulls this as well? to as much as you can while still having this pass?

@WillAyd
Copy link
Member Author

WillAyd commented Apr 23, 2018

Added the suggestions from @jorisvandenbossche with the exception of np.timedelta64('NaT') and np.datetime64('NaT') as they caused a failure with the below test.

https://github.com/WillAyd/pandas/blob/6e46b57bc99e70b96eb6de448ab12ff7930a2d10/pandas/tests/indexes/test_base.py#L272

The test is parametrized for a DatetimeIndex and a TimedeltaIndex and the failure occurs when you try to construct from a list that contains one instance of both (it returns an object-typed Index instead of one or the other).

I think this is a corner case for this test so we could definitely still add to the fixture and have the test be responsible for skipping where appropriate, but for now I've just kept out of the fixture

@jreback jreback merged commit 3bb58ac into pandas-dev:master Apr 24, 2018
@jreback
Copy link
Contributor

jreback commented Apr 24, 2018

thanks @WillAyd

if you can create an issue for extending the nulls fixture (and issue around that), and an issue about cleaning more testing things (e.g. comments above).

@WillAyd WillAyd deleted the idx-clnup branch April 24, 2018 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Indexing Related to indexing on series/frames, not to indexes themselves Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants