DOC: Downcasting neither warned against nor discussed #50217

Closed
1 task done
mcp292 opened this issue Dec 12, 2022 · 4 comments
Labels: Docs, Needs Info


@mcp292

mcp292 commented Dec 12, 2022

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/user_guide/basics.html#dtypes

Documentation problem

Upcasting is discussed, but not downcasting.

Suggested fix for documentation

It is fine not to implement a warning when downcasting causes data loss, but since no warning exists, the documentation should tell users that they assume that risk.

In short, the documentation should discuss downcasting behavior as a counterpart to the existing upcasting section.
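
As an illustration of the kind of behavior this request is about, here is a minimal sketch, assuming "downcasting" means an explicit `astype` to a narrower float dtype. The Series contents and variable names are made up for the example; the resulting `float16` value matches the one quoted later in this thread.

```python
import pandas as pd

# Hypothetical data, chosen only to illustrate the point.
s64 = pd.Series([0.0, 10.25, 11.11], dtype="float64")

# Explicit narrowing cast. It completes silently, but values that are
# not exactly representable in float16 are rounded to the nearest one.
s16 = s64.astype("float16")

original = float(s64.iloc[2])
narrowed = float(s16.iloc[2])
print(original, narrowed)
print(original == narrowed)
```

Values such as 10.25 that happen to be exactly representable in `float16` come through unchanged, which is part of why the loss can go unnoticed.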


Resulting from #50214.

mcp292 added the Docs and Needs Triage labels on Dec 12, 2022
@MarcoGorelli
Member

hi - what do you mean by "downcasting" a column?

could you please provide a simple example, like:

  • here's what I ran
  • here's what happened
  • here's what I expected

MarcoGorelli added the Needs Info label and removed the Needs Triage label on Dec 21, 2022
@mcp292
Author

mcp292 commented Dec 21, 2022

Please see the issue linked in the original post and let me know if that's what you're looking for.

@MarcoGorelli
Member

Which downcasting are you referring to?

From #50125 (comment), as far as I can tell, it's just what numpy does:

In [28]: arr[2]
Out[28]: array([11.11], dtype=float16)

In [29]: arr[2].item()
Out[29]: 11.109375
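
The two outputs above differ only in how the same stored value is displayed. A small sketch of this, assuming NumPy is imported as `np` and using a standalone scalar rather than the thread's `arr` (whose `data` is not shown here):

```python
import numpy as np

x = np.float16(11.11)  # nearest representable float16 to 11.11

# str() prints a short decimal form for the float16 value, while
# .item() converts to a Python float and exposes the stored value.
short_form = str(x)
exact_value = x.item()
print(short_form, exact_value)
```

Converting `exact_value` back to `float16` gives the same scalar, so nothing changes between the two views; only the printed precision does.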

And regarding the inequality, this reproduces in pure numpy, so isn't an issue in pandas

In [30]: arr = np.array(data, dtype=np.float16)

In [31]: arr[arr <= 99.99]
Out[31]:
array([  0.   ,  10.25 ,  11.11 ,   5.555,  50.56 ,  70.75 ,  99.   ,
       100.   , 100.   , 100.   ], dtype=float16)

In [32]: arr
Out[32]:
array([  0.   ,  10.25 ,  11.11 ,   5.555,  50.56 ,  70.75 ,  99.   ,
       100.   , 100.   , 100.   ], dtype=float16)
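
One way to see why the `100.` entries survive the `<= 99.99` filter is that the threshold itself has no exact `float16` representation. The sketch below uses a made-up two-element array and explicit casts, so it does not rely on how NumPy promotes the Python scalar inside `arr <= 99.99`; that the comparison above is carried out in `float16` is an assumption consistent with the output shown.

```python
import numpy as np

# Hypothetical two-element array standing in for the thread's data.
arr = np.array([99.0, 100.0], dtype=np.float16)

# 99.99 is not representable in float16; the nearest value is 100.0.
threshold16 = np.float16(99.99)

# Comparing in float16: 100.0 <= 100.0, so both elements pass.
mask16 = arr <= threshold16

# Comparing in float64 keeps the threshold at 99.99, so 100.0 is excluded.
mask64 = arr.astype(np.float64) <= 99.99

print(float(threshold16), mask16.tolist(), mask64.tolist())
```

Upcasting before the comparison is one way to get the `float64` result; whether that is appropriate depends on the use case.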

@MarcoGorelli
Member

> And regarding the inequality, this reproduces in pure numpy, so isn't an issue in pandas

As per the above, pandas did not do any downcasting here

Closing then, but thanks for the report
