-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix Series/DataFrame.rank(pct=True) with more than 2**24 rows #23688
Conversation
Hello @jschendel! Thanks for submitting the PR.
|
FWIW good to have this typing consistent, though I'm surprised it's required given the Cython docs say that a float gets mapped to a double: http://docs.cython.org/en/latest/src/tutorial/caveats.html @jbrockmendel any insights? Does that seem like a bug in Cython? |
Codecov Report
@@ Coverage Diff @@
## master #23688 +/- ##
=======================================
Coverage 92.24% 92.24%
=======================================
Files 161 161
Lines 51318 51318
=======================================
Hits 47339 47339
Misses 3979 3979
Continue to review full report at Codecov.
|
thanks! |
@WillAyd no idea; we recently changed all usages of "double" and "double_t" to "float64_t" largely so I didn't have to keep double-checking that they mean the same thing. Maybe @scoder can offer some insight? |
* upstream/master: (25 commits) DOC: Delete trailing blank lines in docstrings. (pandas-dev#23651) DOC: Change release and whatsnew (pandas-dev#21599) DOC: Fix format of the See Also descriptions (pandas-dev#23654) DOC: update pandas.core.groupby.DataFrameGroupBy.resample docstring. (pandas-dev#20374) ENH: Allow export of mixed columns to Stata strl (pandas-dev#23692) CLN: Remove unnecessary code (pandas-dev#23696) Pin flake8-rst version (pandas-dev#23699) Implement _most_ of the EA interface for DTA/TDA (pandas-dev#23643) CI: raise clone depth limit on CI BUG: Fix Series/DataFrame.rank(pct=True) with more than 2**24 rows (pandas-dev#23688) REF: Move Excel names parameter handling to CSV (pandas-dev#23690) DOC: Accessing files from a S3 bucket. (pandas-dev#23639) Fix errorbar visualization (pandas-dev#23674) DOC: Surface / doc mangle_dupe_cols in read_excel (pandas-dev#23678) DOC: Update is_sparse docstring (pandas-dev#19983) BUG: Fix read_excel w/parse_cols & empty dataset (pandas-dev#23661) Add to_flat_index method to MultiIndex (pandas-dev#22866) CLN: Move to_excel to generic.py (pandas-dev#23656) TST: IntervalTree.get_loc_interval should return platform int (pandas-dev#23660) CI: Allow to compile docs with ipython 7.11 pandas-dev#22990 (pandas-dev#23655) ...
…fixed * upstream/master: DOC: Delete trailing blank lines in docstrings. (pandas-dev#23651) DOC: Change release and whatsnew (pandas-dev#21599) DOC: Fix format of the See Also descriptions (pandas-dev#23654) DOC: update pandas.core.groupby.DataFrameGroupBy.resample docstring. (pandas-dev#20374) ENH: Allow export of mixed columns to Stata strl (pandas-dev#23692) CLN: Remove unnecessary code (pandas-dev#23696) Pin flake8-rst version (pandas-dev#23699) Implement _most_ of the EA interface for DTA/TDA (pandas-dev#23643) CI: raise clone depth limit on CI BUG: Fix Series/DataFrame.rank(pct=True) with more than 2**24 rows (pandas-dev#23688) REF: Move Excel names parameter handling to CSV (pandas-dev#23690) DOC: Accessing files from a S3 bucket. (pandas-dev#23639) Fix errorbar visualization (pandas-dev#23674) DOC: Surface / doc mangle_dupe_cols in read_excel (pandas-dev#23678) DOC: Update is_sparse docstring (pandas-dev#19983) BUG: Fix read_excel w/parse_cols & empty dataset (pandas-dev#23661) Add to_flat_index method to MultiIndex (pandas-dev#22866) CLN: Move to_excel to generic.py (pandas-dev#23656) TST: IntervalTree.get_loc_interval should return platform int (pandas-dev#23660)
Note how that page says "Python's float type", not "C's float type". Whenever you say "cdef" in Cython, what follows is a C declaration. We specifically document that Python types like "int" and "float" are not (directly) usable in type declarations because they are shadowed by the more useful C types with the same name. There is no advantage at all in declaring a variable with those Python types, but there is good value in doing that with the C types. In short, if you want a specific C type, name it. |
In theory, they do not. The C standard does not guarantee specific behaviour for the double type, only that the precision is at least twice as high as for the single precision "float" type, i.e. it gives you minimum precision guarantees. However, as long as your code is compiled in an IEEE-754 floating point environment, which applies to pretty much all relevant system types these days, the behaviour will be exactly that of a 64-bit IEEE-754 double precision binary floating point number. Now, in practice the Anyway, explicit is better than implicit. If you want exact 64-bit float semantics, saying so in your code is probably a good idea. If you can live with C double semantics, and that is what CPython does internally, for example, saying so is probably also a good idea. |
@scoder thanks for clarifying. Have I mentioned recently how nice it is to not have to worry about these things in python and 90+% of the time in cython? @chris-b1 @jreback this is above my pay grade. Is there any chance that we need to revert parts of #23583 to use |
I might not have enough insight into the details here, but I don't think reverting is necessary. It's a bit more verbose that way, but it probably also reflects what your code does. Specifically, if you are programming against NumPy, then NumPy's data type API is more relevant than CPython's internals or C's double type here, so matching |
git diff upstream/master -u -- "*.py" | flake8 --diff