-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN: use f-strings where possible #49229
Conversation
@@ -345,14 +345,14 @@ def _str_match( | |||
self, pat: str, case: bool = True, flags: int = 0, na: Scalar | None = None | |||
): | |||
if not pat.startswith("^"): | |||
pat = "^" + pat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont think these are an improvement. stick to places that currently use .format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this apply to all string concatenation? I think e.g. the changes in sql.py
(https://github.com/pandas-dev/pandas/pull/49229/files#diff-9268174bfb15f08ef2267375665a85fecf201999902542f6fc9c0d3fadfb4553 if GitHub feels like linking correctly) read much better as an f-string, for one?
EDIT: I would also like to point out that f-strings can be quite a lot faster than string concatenation, and it could easily compound in a library like Pandas:
Benchmark 1: python3 -S ex1.py
Time (mean ± σ): 2.516 s ± 0.024 s [User: 2.433 s, System: 0.011 s]
Range (min … max): 2.484 s … 2.573 s 10 runs
Benchmark 2: python3 -S ex2.py
Time (mean ± σ): 2.050 s ± 0.064 s [User: 1.970 s, System: 0.012 s]
Range (min … max): 1.967 s … 2.194 s 10 runs
Summary
'python3 -S ex2.py' ran
1.23 ± 0.04 times faster than 'python3 -S ex1.py'
where ex1
is timing lambda: "^" + pat + "$"
and ex2
is timing lambda: f"^{pat}$"
.
4884ca1
to
b1bb71a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your PR
There's some failing tests, if it's not an automated check that gets it right the first time (rather than one that requires manual edits) then I think we should pass on this one, 120 files are too many to review manually
I remember reading that flynt's rewrites were sometimes unsafe, let's not risk it
811033a
to
944d303
Compare
Hi @MarcoGorelli, thanks for the review. I remade this PR to limit the scope to non-test files – the test failure was caused by In addition, I took extra care to only mechanically run safe transforms, and then applied other select fixes by hand in the subsequent commits. The PR now touches 47 files instead of 120, and the changes are all practically 1 line here and there. |
afc1dea
to
be7cf22
Compare
This was done by hand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm all for automation, but if a tool is too aggressive and requires manual fixups, that's a bit of a red flag and I don't think we should be using it, sorry. This is still failing various tests
I'd suggest looking at #48855 if you're interested in linting issues
Also, we already have pyupgrade for rewriting f-strings when it's safe to do so Closing for now then, but thanks for your PR |
@MarcoGorelli I would ask you to reconsider. This PR isn't really done by an automated tool anymore. I used |
The issue with adding mostly cosmetic changes without an automated tool is that there's no way to ensure contributors won't new code that goes against these changes, and then we'll be here again in a month's time making the same changes manually But if there's a case where this makes a performance difference at the macro level, then happy to take that- e.g. the change you mentioned here #49229 (comment) |
These aren't really only cosmetic changes.
As far as I understand, all PRs to Pandas are reviewed by humans too. Again, these changes have been made by a human (yours truly), not an automated tool. Would the context of the review be different if I had not originally mentioned a tool? |
These kinds of changes should be made by automated tools without requiring manual fixups, so that
If it's done manually, then we'll be doing it manually again in a month's time, and then again, and so on |
I'm sorry, but can you explain what "these kinds of changes" are? These changes hadn't, and couldn't have, been made automatically by a tool in CI because no tool currently exists that could do the semantic/dataflow analysis to see whether a concatenation can be safely replaced with an f-string. Fortunately, humans can do that, and as said in my comment after the first test failures, I manually, by hand, vetted each change recommended by |
Ah, darnit, apparently trying to rebase/force-push a closed PR destroys things and this can't be reopened anymore. Considering the discussion above, would you be open to re-reviewing a PR if I open a new one, @MarcoGorelli? I honestly believe these changes are objectively good and could have been done earlier by humans, but it just hadn't happened. |
I just meant mostly cosmetic / stylistic changes - if any of these delivers a measurable performance improvement, then sure, feel free to submit a PR If there's no performance improvement and it can't be rewritten/checked automatically, then I'm not sure there's much reason to make the change |
@MarcoGorelli Would https://twitter.com/raymondh/status/1205969258800275456 (found via the https://pylint.pycqa.org/en/latest/user_guide/messages/convention/consider-using-f-string.html doc page) be enough proof that f-strings will deliver performance improvements even in the simple cases? |
that tweet shows savings of a few nanoseconds -if there's some pandas operation where concatenating strings compounds and there's a noticeable performance improvement overall, then that's great - your sql.py example might be a good example of this. if it's just an error message that'd only be printed once, then I'd say this isn't worth the churn |
Well, to put the tweet in other terms, it shows improvements of 32% to 300% for a simple operation. What's the downside of "churn" for an improvement and how do Pandas developers quantify when it's worth it or not? Also, I'd like to point out that this isn't the first PR to just change things to f-strings because it's possible; has something changed since #29547? |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.This PR replaces a bunch of string concatenations and other non-modern formatting operations with f-strings.
The first commit was done mechanically with
flynt
, followed up by further manual touch-ups thatflynt
wasn't smart enough to touch.This does not touch a bunch of concatenations in
pandas/io/stata.py
, since those are covered by cln commits in #49228.