DEPR: Deprecate Series/Dataframe.to_dense/to_sparse #26684

Merged
merged 37 commits into from
Jun 19, 2019
Changes from 26 commits
Commits
a71737c
Deprecate Series/Dataframe.to_dense/to_sparse()
VikramjeetD Jun 5, 2019
fc08e93
Update series.py
VikramjeetD Jun 5, 2019
82c713d
Beautify
VikramjeetD Jun 5, 2019
39230d4
Beautify
VikramjeetD Jun 5, 2019
1e7c0e8
Beautify
VikramjeetD Jun 7, 2019
e68826c
Update Deprecated SparseDF/Series tests
VikramjeetD Jun 7, 2019
20b8962
Beautify
VikramjeetD Jun 8, 2019
50d0534
Deprecate NDFrame.to_dense
VikramjeetD Jun 8, 2019
933162d
Beautify
VikramjeetD Jun 8, 2019
dd1e6c2
Silence test time deprecation warnings
VikramjeetD Jun 8, 2019
c7f27fd
Beautify
VikramjeetD Jun 8, 2019
be14520
Propose changes to certain tests
VikramjeetD Jun 8, 2019
b12e447
Propose changes to tests. IGNORE PREV COMMIT.
VikramjeetD Jun 8, 2019
2d4de51
Silence test time deprecation warnings
VikramjeetD Jun 9, 2019
eede9b8
Add tests for Series/DataFrame.to_sparse
VikramjeetD Jun 9, 2019
0b08795
Beautify
VikramjeetD Jun 9, 2019
5182a1f
Modify test time warning silence
VikramjeetD Jun 12, 2019
15909c5
Modify groupby ops to remove NDFrame test warnings and add wcatch for…
VikramjeetD Jun 17, 2019
104c12a
Remove filterwarning from test_hist_method.py
VikramjeetD Jun 17, 2019
e713fb0
Update sparsearray test_arithmetics warning
VikramjeetD Jun 17, 2019
587b14f
Remove filterwarning from test_decimal.py
VikramjeetD Jun 17, 2019
1318676
Beautify
VikramjeetD Jun 17, 2019
9043e03
Merge branch 'master' of https://github.com/IntEll1gent/pandas
VikramjeetD Jun 17, 2019
58c678a
Beautify
VikramjeetD Jun 17, 2019
ca14ac1
Merge branch 'master' of https://github.com/IntEll1gent/pandas
VikramjeetD Jun 17, 2019
0c8f287
Update test warnings
VikramjeetD Jun 17, 2019
871ccff
Merge remote-tracking branch 'upstream/master'
VikramjeetD Jun 17, 2019
a8f6c56
Update pandas/core/generic.py
VikramjeetD Jun 17, 2019
72aaca5
Merge branch 'master' of https://github.com/IntEll1gent/pandas
VikramjeetD Jun 17, 2019
6a6e333
Update test time warnings and rectify df/series.to_sparse double warn…
VikramjeetD Jun 17, 2019
4e67856
Update more test time warnings
VikramjeetD Jun 17, 2019
4a3181b
Update test warnings
VikramjeetD Jun 17, 2019
a546a89
Update test warnings
VikramjeetD Jun 17, 2019
5fdb2f8
Revert minor changes
VikramjeetD Jun 17, 2019
a627828
Change location of Series/df.test_deprecated_to_sparse
VikramjeetD Jun 18, 2019
3d36430
Remove SDF/SS.to_dense depr:class already deprecated
VikramjeetD Jun 18, 2019
9f888c5
Add whatsnew entry
VikramjeetD Jun 18, 2019
15 changes: 10 additions & 5 deletions pandas/core/frame.py
@@ -1889,6 +1889,8 @@ def to_sparse(self, fill_value=None, kind='block'):
"""
Convert to SparseDataFrame.

.. deprecated:: 0.25.0

Implement the sparse version of the DataFrame meaning that any data
matching a specific value it's omitted in the representation.
The sparse DataFrame allows for a more efficient storage.
@@ -1939,6 +1941,9 @@ def to_sparse(self, fill_value=None, kind='block'):
>>> type(sdf) # doctest: +SKIP
<class 'pandas.core.sparse.frame.SparseDataFrame'>
"""
warnings.warn("DataFrame.to_sparse is deprecated and will be removed "
"in a future version", FutureWarning, stacklevel=2)

from pandas.core.sparse.api import SparseDataFrame
return SparseDataFrame(self._series, index=self.index,
Contributor

Won't this also trigger the SDF warnings? (Should this just be changed to create a DF here?)

Contributor Author

VikramjeetD Jun 7, 2019

Do you mean the SDF deprecation warning? If that's the case, yes, but this was requested in the issue that this PR addresses (#26557).
Also, this might be helpful as a gentle reminder if someone has already hit the SDF warning through a different route. Either way, we could decide what's best and update accordingly.

Member

Yes, it already triggers the SparseDataFrame warning. But, it is still good to explicitly deprecate this method as well (in the docs, and by giving a clearer warning message, although the other one will also still be present).

I don't think we should change the behaviour, since we are deprecating it (it would also be a backwards incompatible change, as DataFrame with sparse and SparseDataFrame are not fully interchangeable).

Contributor Author

So I think we can agree to keep both the warnings and resolve this conversation?

Contributor

Actually, I would suppress the SDF warning here, as the user-facing to_sparse warning is already good enough.

columns=self.columns, default_kind=kind,
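
The review thread above weighs keeping both warnings against suppressing the class-level one inside the deprecated method. A minimal sketch of the suppression idea, using only the standard library, might look like the following. `LegacyContainer`, `Frame`, and `to_legacy` are hypothetical stand-ins (not pandas names), and this is not necessarily how the PR ended up implementing it.

```python
import warnings


class LegacyContainer:
    """Hypothetical stand-in for a deprecated class such as SparseDataFrame."""

    def __init__(self, data):
        warnings.warn("LegacyContainer is deprecated", FutureWarning,
                      stacklevel=2)
        self.data = data


class Frame:
    """Hypothetical stand-in for DataFrame."""

    def __init__(self, data):
        self.data = data

    def to_legacy(self):
        # Emit the method-level deprecation once, attributed to the caller.
        warnings.warn("Frame.to_legacy is deprecated and will be removed "
                      "in a future version", FutureWarning, stacklevel=2)
        # Silence only the class-level warning raised by the constructor.
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", "LegacyContainer",
                                    FutureWarning)
            return LegacyContainer(self.data)


# Record everything that escapes to the caller.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = Frame([1, 2, 3]).to_legacy()

messages = [str(w.message) for w in caught]
print(messages)
```

With this arrangement the caller sees only the method-level message; the filter is scoped to the `with` block, so constructing `LegacyContainer` directly elsewhere would still warn.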
@@ -2282,7 +2287,7 @@ def info(self, verbose=None, buf=None, max_cols=None, memory_usage=None,
 text_col 5 non-null object
 float_col 5 non-null float64
 dtypes: float64(1), int64(1), object(1)
-memory usage: 200.0+ bytes
+memory usage: 248.0+ bytes

Prints a summary of columns count and its dtypes but not per column
information:
@@ -2292,7 +2297,7 @@ def info(self, verbose=None, buf=None, max_cols=None, memory_usage=None,
 RangeIndex: 5 entries, 0 to 4
 Columns: 3 entries, int_col to float_col
 dtypes: float64(1), int64(1), object(1)
-memory usage: 200.0+ bytes
+memory usage: 248.0+ bytes

Pipe output of DataFrame.info to buffer instead of sys.stdout, get
buffer content and writes to a text file:
@@ -2494,7 +2499,7 @@ def memory_usage(self, index=True, deep=False):
 4 1 1.0 1.0+0.0j 1 True

 >>> df.memory_usage()
-Index 80
+Index 128
int64 40000
float64 40000
complex128 80000
@@ -2513,7 +2518,7 @@ def memory_usage(self, index=True, deep=False):
 The memory footprint of `object` dtype columns is ignored by default:

 >>> df.memory_usage(deep=True)
-Index 80
+Index 128
int64 40000
float64 40000
complex128 80000
@@ -2525,7 +2530,7 @@ def memory_usage(self, index=True, deep=False):
 many repeated values.

 >>> df['object'].astype('category').memory_usage(deep=True)
-5168
+5216
"""
result = Series([c.memory_usage(index=False, deep=deep)
for col, c in self.iteritems()], index=self.columns)
5 changes: 5 additions & 0 deletions pandas/core/generic.py
@@ -1964,11 +1964,16 @@ def to_dense(self):
"""
Return dense representation of NDFrame (as opposed to sparse).

.. deprecated:: 0.25.0

Returns
-------
%(klass)s
Dense %(klass)s.
"""
warnings.warn("NDFrame.to_dense is deprecated "
"and will be removed in a future version",
FutureWarning, stacklevel=2)
# compat
return self

6 changes: 3 additions & 3 deletions pandas/core/groupby/ops.py
@@ -630,9 +630,9 @@ def _aggregate_series_fast(self, obj, func):
         group_index, _, ngroups = self.group_info

         # avoids object / Series creation overhead
-        dummy = obj._get_values(slice(None, 0)).to_dense()
+        dummy = obj._get_values(slice(None, 0))
         indexer = get_group_index_sorter(group_index, ngroups)
-        obj = obj._take(indexer).to_dense()
+        obj = obj._take(indexer)
         group_index = algorithms.take_nd(
             group_index, indexer, allow_fill=False)
         grouper = reduction.SeriesGrouper(obj, func, group_index, ngroups,
@@ -879,7 +879,7 @@ def apply(self, f):
 class SeriesSplitter(DataSplitter):

     def _chop(self, sdata, slice_obj):
-        return sdata._get_values(slice_obj).to_dense()
+        return sdata._get_values(slice_obj)


class FrameSplitter(DataSplitter):
11 changes: 8 additions & 3 deletions pandas/core/series.py
@@ -1575,6 +1575,8 @@ def to_sparse(self, kind='block', fill_value=None):
"""
Convert Series to SparseSeries.

.. deprecated:: 0.25.0

Parameters
----------
kind : {'block', 'integer'}, default 'block'
@@ -1586,6 +1588,9 @@ def to_sparse(self, kind='block', fill_value=None):
SparseSeries
Sparse representation of the Series.
"""

warnings.warn("Series.to_sparse is deprecated and will be removed "
"in a future version", FutureWarning, stacklevel=2)
from pandas.core.sparse.series import SparseSeries

values = SparseArray(self, kind=kind, fill_value=fill_value)
@@ -4010,7 +4015,7 @@ def memory_usage(self, index=True, deep=False):
 --------
 >>> s = pd.Series(range(3))
 >>> s.memory_usage()
-104
+152

Not including the index gives the size of the rest of the data, which
is necessarily smaller:
@@ -4024,9 +4029,9 @@ def memory_usage(self, index=True, deep=False):
 >>> s.values
 array(['a', 'b'], dtype=object)
 >>> s.memory_usage()
-96
+144
 >>> s.memory_usage(deep=True)
-212
+260
"""
v = super().memory_usage(deep=deep)
if index:
20 changes: 20 additions & 0 deletions pandas/core/sparse/frame.py
@@ -242,6 +242,11 @@ def _init_spmatrix(self, data, index, columns, dtype=None,
def to_coo(self):
return SparseFrameAccessor(self).to_coo()

def __repr__(self):
with warnings.catch_warnings():
warnings.filterwarnings("ignore", "Sparse")
return super().__repr__()

def __getstate__(self):
# pickling
return dict(_typ=self._typ, _subtyp=self._subtyp, _data=self._data,
@@ -277,6 +282,21 @@ def _unpickle_sparse_frame_compat(self, state):

@Appender(SparseFrameAccessor.to_dense.__doc__)
def to_dense(self):
"""
.. deprecated:: 0.25.0
Use Dataframe.sparse.to_dense() instead
"""

warning_message = """\
SparseDataFrame.to_dense is deprecated and will be removed in a future version

Use Dataframe.sparse.to_dense() instead

>>> df = pd.DataFrame({"A": pd.SparseArray([0, 1, 0])})
>>> df.sparse.to_dense()
"""
warnings.warn(warning_message, FutureWarning, stacklevel=2)

return SparseFrameAccessor(self).to_dense()

def _apply_columns(self, func):
16 changes: 11 additions & 5 deletions pandas/core/sparse/series.py
@@ -214,10 +214,12 @@ def as_sparse_array(self, kind=None, fill_value=None, copy=False):
fill_value=fill_value, kind=kind, copy=copy)

     def __repr__(self):
-        series_rep = Series.__repr__(self)
-        rep = '{series}\n{index!r}'.format(series=series_rep,
-                                           index=self.sp_index)
-        return rep
+        with warnings.catch_warnings():
+            warnings.filterwarnings("ignore", "Sparse")
+            series_rep = Series.__repr__(self)
+            rep = '{series}\n{index!r}'.format(series=series_rep,
+                                               index=self.sp_index)
+            return rep

def _reduce(self, op, name, axis=0, skipna=True, numeric_only=None,
filter_type=None, **kwds):
@@ -429,11 +431,15 @@ def _set_values(self, key, value):
def to_dense(self):
"""
Convert SparseSeries to a Series.

.. deprecated:: 0.25.0
Returns
-------
s : Series
"""
warnings.warn("SparseSeries.to_dense is deprecated "
"and will be removed in a future version",
FutureWarning, stacklevel=2)

return Series(self.values.to_dense(), index=self.index,
name=self.name)

1 change: 1 addition & 0 deletions pandas/tests/arrays/sparse/test_arithmetics.py
@@ -9,6 +9,7 @@


@pytest.mark.filterwarnings("ignore:Sparse:FutureWarning")
@pytest.mark.filterwarnings("ignore:Series.to_sparse:FutureWarning")
class TestSparseArrayArithmetics:

_base = np.array
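
The second `filterwarnings` mark appears to be needed because the message part of a warning filter is a regular expression matched against the start of the warning text, so a filter on `Sparse` does not cover a message that begins with `Series.to_sparse`. A small standard-library sketch of that matching rule follows; the helper `count_emitted` and the example messages are illustrative assumptions, not code from the PR.

```python
import warnings


def count_emitted(message, pattern):
    """Return how many warnings get past an "ignore" filter on `pattern`."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Inserted in front of the "always" filter, so it is consulted first.
        warnings.filterwarnings("ignore", pattern, FutureWarning)
        warnings.warn(message, FutureWarning)
    return len(caught)


# Message starts with "Sparse": the filter applies and nothing is emitted.
sparse_class = count_emitted("SparseSeries is deprecated", "Sparse")

# Message starts with "Series": the "Sparse" filter does not match.
to_sparse = count_emitted("Series.to_sparse is deprecated", "Sparse")

# A filter anchored on the method name does match.
to_sparse_named = count_emitted("Series.to_sparse is deprecated",
                                "Series.to_sparse")

print(sparse_class, to_sparse, to_sparse_named)
```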
17 changes: 17 additions & 0 deletions pandas/tests/frame/test_deprecations.py
@@ -0,0 +1,17 @@
import numpy as np
import pytest

import pandas as pd
from pandas.util import testing as tm


@pytest.mark.filterwarnings("ignore:Sparse:FutureWarning")
def test_deprecated_to_sparse():
df = pd.DataFrame({"A": [1, np.nan, 3]})
sparse_df = pd.SparseDataFrame({"A": [1, np.nan, 3]})

# Deprecated 0.25.0
with tm.assert_produces_warning(FutureWarning,
check_stacklevel=False):
result = df.to_sparse()
tm.assert_frame_equal(result, sparse_df)
13 changes: 13 additions & 0 deletions pandas/tests/generic/test_generic.py
@@ -918,3 +918,16 @@ def test_axis_classmethods(self, box):
assert obj._get_axis_name(v) == box._get_axis_name(v)
assert obj._get_block_manager_axis(v) == \
box._get_block_manager_axis(v)

def test_deprecated_to_dense(self):
# Deprecated 0.25.0

df = pd.DataFrame({"A": [1, 2, 3]})
with tm.assert_produces_warning(FutureWarning):
result = df.to_dense()
tm.assert_frame_equal(result, df)

ser = pd.Series([1, 2, 3])
with tm.assert_produces_warning(FutureWarning):
result = ser.to_dense()
tm.assert_series_equal(result, ser)
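
The new warnings in this PR pass `stacklevel=2`, which attributes the warning to the line that called the deprecated method rather than to the `warnings.warn` call itself. The following standard-library sketch illustrates the effect; `deprecated_api` and `user_code` are hypothetical names introduced only for this example.

```python
import warnings


def deprecated_api():
    # stacklevel=2 attributes the warning to whoever called deprecated_api().
    warnings.warn("deprecated_api is deprecated", FutureWarning, stacklevel=2)


def user_code():
    deprecated_api()  # the warning should be reported on this line


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    user_code()

# Line recorded on the warning versus the line of the call inside user_code().
reported_line = caught[0].lineno
call_line = user_code.__code__.co_firstlineno + 1
print(reported_line == call_line)
```

When a deprecated method is reached through additional internal frames, the reported location no longer points at user code, which is presumably why several tests in this PR pass `check_stacklevel=False`.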
10 changes: 8 additions & 2 deletions pandas/tests/io/json/test_pandas.py
@@ -1018,13 +1018,19 @@ def test_sparse(self):
df = pd.DataFrame(np.random.randn(10, 4))
df.loc[:8] = np.nan

-        sdf = df.to_sparse()
+        # GH 26557: DEPR
+        with tm.assert_produces_warning(FutureWarning,
Member

I think here you can also use a filterwarning instead (since the test already has a filterwarning for the main Sparse deprecation as well).

Contributor Author

Will check on that, but we still have to come to a unanimous decision on whether to keep the double deprecation warning for Series/DataFrame.to_sparse. None of this will be needed if we decide to stick with only one warning for the main deprecation.

Contributor

> Will check on that, but we still have to come to a unanimous decision whether to keep the double deprecation warning for Series/DataFrame.to_sparse. None of this will be needed if we decide to stick with only 1 warning for the main deprecation.

We should not keep the double warning at all.

+                                        check_stacklevel=False):
+            sdf = df.to_sparse()
         expected = df.to_json()
         assert expected == sdf.to_json()

         s = pd.Series(np.random.randn(10))
         s.loc[:8] = np.nan
-        ss = s.to_sparse()
+        # GH 26557: DEPR
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            ss = s.to_sparse()

         expected = s.to_json()
         assert expected == ss.to_json()
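
The reviewers ask that `to_sparse` not emit a double warning. One way a test could pin that down is to record warnings and assert that exactly one is produced. The sketch below uses only the standard library; `assert_single_warning`, `to_sparse_like`, and `double_warning` are hypothetical helpers for illustration and are not part of `pandas.util.testing`.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def assert_single_warning(category):
    """Fail unless the block emits exactly one warning of `category`."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        yield caught
    matching = [w for w in caught if issubclass(w.category, category)]
    assert len(matching) == 1, (
        "expected 1 warning, got {}".format(len(matching)))


def to_sparse_like():
    # Hypothetical deprecated function: only the method-level warning fires.
    warnings.warn("to_sparse_like is deprecated", FutureWarning, stacklevel=2)
    return "sparse"


def double_warning():
    # Hypothetical function reproducing the double-warning situation.
    warnings.warn("method is deprecated", FutureWarning, stacklevel=2)
    warnings.warn("class is deprecated", FutureWarning, stacklevel=2)


with assert_single_warning(FutureWarning) as caught:
    value = to_sparse_like()

try:
    with assert_single_warning(FutureWarning):
        double_warning()
except AssertionError:
    double_detected = True
else:
    double_detected = False

print(value, len(caught), double_detected)
```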
30 changes: 24 additions & 6 deletions pandas/tests/io/test_packers.py
@@ -566,15 +566,24 @@ def test_sparse_series(self):

         s = tm.makeStringSeries()
         s[3:5] = np.nan
-        ss = s.to_sparse()
+        # GH 26557: DEPR
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            ss = s.to_sparse()
         self._check_roundtrip(ss, tm.assert_series_equal,
                               check_series_type=True)

-        ss2 = s.to_sparse(kind='integer')
+        # GH 26557: DEPR
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            ss2 = s.to_sparse(kind='integer')
         self._check_roundtrip(ss2, tm.assert_series_equal,
                               check_series_type=True)

-        ss3 = s.to_sparse(fill_value=0)
+        # GH 26557: DEPR
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            ss3 = s.to_sparse(fill_value=0)
         self._check_roundtrip(ss3, tm.assert_series_equal,
                               check_series_type=True)

@@ -583,16 +592,25 @@ def test_sparse_frame(self):
         s = tm.makeDataFrame()
         s.loc[3:5, 1:3] = np.nan
         s.loc[8:10, -2] = np.nan
-        ss = s.to_sparse()
+        # GH 26557: DEPR
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            ss = s.to_sparse()

         self._check_roundtrip(ss, tm.assert_frame_equal,
                               check_frame_type=True)

-        ss2 = s.to_sparse(kind='integer')
+        # GH 26557: DEPR
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            ss2 = s.to_sparse(kind='integer')
         self._check_roundtrip(ss2, tm.assert_frame_equal,
                               check_frame_type=True)

-        ss3 = s.to_sparse(fill_value=0)
+        # GH 26557: DEPR
+        with tm.assert_produces_warning(FutureWarning,
+                                        check_stacklevel=False):
+            ss3 = s.to_sparse(fill_value=0)
         self._check_roundtrip(ss3, tm.assert_frame_equal,
                               check_frame_type=True)
