Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: SparseSeries init from dict fixes #16906

Closed
wants to merge 8 commits into from

Conversation

kernc
Copy link
Contributor

@kernc kernc commented Jul 13, 2017

  • closes BUG: SparseSeries from dict inconsistency #16905
  • tests added / passed
  • passes git diff upstream/master --name-only -- '*.py' | flake8 --diff (On Windows, git diff upstream/master -u -- "*.py" | flake8 --diff might work as an alternative.)
  • whatsnew entry

@kernc
Copy link
Contributor Author

kernc commented Jul 13, 2017

Tests copied/adapted from tests.series.test_constructors.

@codecov
Copy link

codecov bot commented Jul 13, 2017

Codecov Report

Merging #16906 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16906      +/-   ##
==========================================
- Coverage   90.99%   90.97%   -0.02%     
==========================================
  Files         161      161              
  Lines       49293    49292       -1     
==========================================
- Hits        44854    44844      -10     
- Misses       4439     4448       +9
Flag Coverage Δ
#multiple 88.74% <100%> (-0.01%) ⬇️
#single 40.19% <100%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/sparse/series.py 95.06% <100%> (-0.02%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.76% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 692b5ee...659559c. Read the comment docs.

@jreback jreback added Bug Sparse Sparse Data Type labels Jul 13, 2017

data = A(('col%s' % i, np.random.random()) for i in range(12))
s = SparseSeries(data)
tm.assert_numpy_array_equal(s.values.values, np.array(list(data.values())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use assert_sp_series_equal (you can pass check_list=False) and then add an assert about the column ordering

return dict(zip((constructor(x) for x in dates_as_str), values))

data_datetime64 = create_data(np.datetime64)
data_datetime = create_data(lambda x: datetime.strptime(x, '%Y-%m-%d'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you parameterize this test

expected = SparseSeries([x[1] for x in _d],
index=pd.Index([x[0] for x in _d],
tupleize_cols=False))
ser = ser.reindex(index=expected.index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use result=

@@ -179,6 +179,7 @@ Sparse
^^^^^^


- Bug in instantiating :class:`SparseSeries` from ``dict`` with or without ``index`` (:issue:`16905`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index= kwarg

@jreback jreback added this to the 0.21.0 milestone Jul 13, 2017


def test_constructor_dict():
d = {'a': 0., 'b': 1., 'c': 2.}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might be able to move some of these into from pandas.tests.series.test_api import SharedWithSparse whech we already import (rather than directly copying them).

@@ -179,6 +179,7 @@ Sparse
^^^^^^


- Bug in instantiating :class:`SparseSeries` from ``dict`` with or without ``index=`` kwarg (:issue:`16905`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it doesn't matter whether index is passed in, why mention it in the description?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a reference to the two issues fixed. With, the result was invalid; without, it crashed.

Copy link
Member

@gfyoung gfyoung Jul 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, though ultimately instantiating from dict just didn't work at all though, which could be considered a single bug (also you're only referencing one issue here). Note that without further context, people won't be aware of that difference (whether it was incorrect or crashed), so it is preferable to be concise.

@jreback
Copy link
Contributor

jreback commented Jul 16, 2017

@kernc lmk if you are able to share some test code with Series, can always do a followup.

@pep8speaks
Copy link

pep8speaks commented Jul 16, 2017

Hello @kernc! Thanks for updating the PR.

Line 128:80: E501 line too long (83 > 79 characters)
Line 166:32: E128 continuation line under-indented for visual indent

Comment last updated on July 17, 2017 at 21:25 Hours UTC

@kernc kernc force-pushed the sparse-series-fromdict branch from 5dcec57 to 991b99a Compare July 16, 2017 16:54
@kernc
Copy link
Contributor Author

kernc commented Jul 16, 2017

Was able to share some test code with Series with as little as possible effort. Not sure if OK, though.


result = self.Series(d, index=['b', 'c', 'd', 'a'])
expected = self.Series([1, 2, np.nan, 0], index=['b', 'c', 'd', 'a'])
tm.assert_series_equal(result, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't check sparseness. I might modify assert_series_equal to dispatch to assert_sp_series_equal if both are SparseSeries.

Copy link
Member

@gfyoung gfyoung Jul 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In light of my comment below, I'd rather that we keep sparse equality and Series equality checks separate. Perhaps if we could write a function like:

def _check_series_equal(self, left, right):
     ...

that dispatches to tm.assert_series_equal OR tm.assert_sp_series_equal depending on test class. It would seem a little clearer implementation-wise.

@gfyoung
Copy link
Member

gfyoung commented Jul 16, 2017

Was able to share some test code with Series with as little as possible effort. Not sure if OK, though.

I'm a little hesitant about this code-sharing because the readability decreased IMO. self.Series for me a just a little harder to understand.

@kernc
Copy link
Contributor Author

kernc commented Jul 17, 2017

How else would you share code without self.Series? And if you'd rather not share code, how else would you ensure the two is-a-Series types get roughly the same amount of use case and API coverage?

@jreback
Copy link
Contributor

jreback commented Jul 17, 2017

@kernc this looks fine. ping on green.

that was failing due to introduced dispatch to assert_sp_series_equal
being too strict.
@gfyoung
Copy link
Member

gfyoung commented Jul 17, 2017

How else would you share code without self.Series?

I personally find the name a little confusing because I only think the Series class and not SparseSeries, even though the latter is a subclass of the former. 😄

A name like self.series_klass might have been easier because it doesn't have that confusion of naming.

d = {'a': 0., 'b': 1., 'c': 2.}
result = self.series_klass(d)
expected = self.series_klass(d, index=sorted(d.keys()))
tm.assert_series_equal(result, expected)
Copy link
Member

@gfyoung gfyoung Jul 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think GitHub hid away my comment about this, but I think we should not interlace assert_series_equal and assert_sp_series_equal. It reduces modularity, and "assert_series_equal" is confusing for SparseSeries I find.

I propose that we do the following and define this method:

def assert_series_klass_equal(result, expected):
    klass_name = self.series_klass.__name__

    if klass_name == "Series":
        tm.assert_series_equal(result, expected)
    elif klass_name == "SparseSeries":
        tm.assert_sp_series_equal(result, expected)
    else:
        raise ValueError("Invalid 'series_klass' : {name}".format(name=klass_name))

That way you also don't need to modify assert_series_equal. You can then call this method without worrying what type of Series you are comparing.

@jreback
Copy link
Contributor

jreback commented Jul 18, 2017

@kernc would you be ok with merging #16960, close this, then you can refactor tests in a new PR?

@kernc
Copy link
Contributor Author

kernc commented Jul 19, 2017

Of course.

@kernc kernc closed this Jul 19, 2017
@gfyoung gfyoung modified the milestones: No action, 0.21.0 Jul 19, 2017
@gfyoung gfyoung added the Duplicate Report Duplicate issue or pull request label Jul 19, 2017
@kernc
Copy link
Contributor Author

kernc commented Jul 21, 2017

Continued in #17050.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Duplicate Report Duplicate issue or pull request Sparse Sparse Data Type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: SparseSeries from dict inconsistency
4 participants