Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Float conversion makes fillna lossy #24537

Closed
Tracked by #3
jbrockmendel opened this issue Jan 1, 2019 · 9 comments · Fixed by #50671
Closed
Tracked by #3

Float conversion makes fillna lossy #24537

jbrockmendel opened this issue Jan 1, 2019 · 9 comments · Fixed by #50671
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions

Comments

@jbrockmendel
Copy link
Member

For int64 data near the int64 implementation bounds, astype('float64') or ensure_float64 is lossy. The motivating case is PeriodArray.fillna

dti = pd.date_range(pd.Timestamp.max - pd.Timedelta(nanoseconds=10), periods=5, freq='ns')
pi = dti.to_period('ns')
parr = pi._data
parr[2] = pd.NaT

>>> parr.fillna(method='pad')
<PeriodArray>
['NaT', 'NaT', 'NaT', 'NaT', 'NaT']
Length: 5, dtype: period[N]
@jreback
Copy link
Contributor

jreback commented Jan 1, 2019

In [1]: dti = pd.date_range(pd.Timestamp.max - pd.Timedelta(nanoseconds=10), periods=5, freq='ns')
   ...: pi = dti.to_period('ns')
   ...: parr = pi._data
   ...: parr[2] = pd.NaT
   ...: 

In [2]: parr
Out[2]: 
<PeriodArray>
['2262-04-11 23:47:16.854775797', '2262-04-11 23:47:16.854775798',
                           'NaT', '2262-04-11 23:47:16.854775800',
 '2262-04-11 23:47:16.854775801']
Length: 5, dtype: period[N]

In [4]: Series(parr).fillna(parr[0])
Out[4]: 
0    2262-04-11 23:47:16.854775797
1    2262-04-11 23:47:16.854775798
2    2262-04-11 23:47:16.854775797
3    2262-04-11 23:47:16.854775800
4    2262-04-11 23:47:16.854775801
dtype: period[N]

Series goes thru the correct path, wonder what its dispatching too

actually no, you are right this is lossy.

@jreback jreback added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Jan 1, 2019
@jbrockmendel
Copy link
Member Author

(Internet is down, typing with thumbs for a while)

I’ve poked at this a bit for interpolate; if the int64 vals fall inside int32 bounds then we are OK. Otherwise need to cast to float128 to be assured lossless.

@mroeschke
Copy link
Member

This looks okay on master now. Could use a test

In [20]: parr.fillna(method='pad')
Out[20]:
<PeriodArray>
['2262-04-11 23:47:16.854775797', '2262-04-11 23:47:16.854775798',
 '2262-04-11 23:47:16.854775798', '2262-04-11 23:47:16.854775800',
 '2262-04-11 23:47:16.854775801']
Length: 5, dtype: period[N]

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Jun 25, 2021
@KianYang-Lee
Copy link

Hi @mroeschke, I would like to contribute. Can you provide a little guideline on what to test on? Thanks

@mroeschke
Copy link
Member

@KianYang-Lee would need a test to validate that the code snippet in the original post returns the result in my previous comment.

@KianYang-Lee
Copy link

OK taking this. will come up with the test and result soon

@jackgoldsmith4
Copy link
Contributor

@KianYang-Lee are you still working on this?

@HoWeiChin
Copy link

@jackgoldsmith4
may I take it instead?

@KianYang-Lee
Copy link

No, I'm not. Got caught up with work. Sorry and Please proceed

@KianYang-Lee are you still working on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants