Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: use f-strings where possible #49229

Closed
wants to merge 5 commits into from
Closed

CLN: use f-strings where possible #49229

wants to merge 5 commits into from

Conversation

akx
Copy link
Contributor

@akx akx commented Oct 21, 2022

  • Tests added and passed if fixing a bug or adding a new feature
    • No new tests, behavior hasn't changed (except possibly become a bit faster)
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
    • Nothing new.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
    • No bug or feature, just cleaning.

This PR replaces a bunch of string concatenations and other non-modern formatting operations with f-strings.

The first commit was done mechanically with flynt, followed up by further manual touch-ups that flynt wasn't smart enough to touch.

This does not touch a bunch of concatenations in pandas/io/stata.py, since those are covered by cln commits in #49228.

pandas/_version.py Outdated Show resolved Hide resolved
@@ -345,14 +345,14 @@ def _str_match(
self, pat: str, case: bool = True, flags: int = 0, na: Scalar | None = None
):
if not pat.startswith("^"):
pat = "^" + pat
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think these are an improvement. stick to places that currently use .format

Copy link
Contributor Author

@akx akx Oct 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this apply to all string concatenation? I think e.g. the changes in sql.py (https://github.com/pandas-dev/pandas/pull/49229/files#diff-9268174bfb15f08ef2267375665a85fecf201999902542f6fc9c0d3fadfb4553 if GitHub feels like linking correctly) read much better as an f-string, for one?

EDIT: I would also like to point out that f-strings can be quite a lot faster than string concatenation, and it could easily compound in a library like Pandas:

Benchmark 1: python3 -S ex1.py
  Time (mean ± σ):      2.516 s ±  0.024 s    [User: 2.433 s, System: 0.011 s]
  Range (min … max):    2.484 s …  2.573 s    10 runs

Benchmark 2: python3 -S ex2.py
  Time (mean ± σ):      2.050 s ±  0.064 s    [User: 1.970 s, System: 0.012 s]
  Range (min … max):    1.967 s …  2.194 s    10 runs

Summary
  'python3 -S ex2.py' ran
    1.23 ± 0.04 times faster than 'python3 -S ex1.py'

where ex1 is timing lambda: "^" + pat + "$" and ex2 is timing lambda: f"^{pat}$".

@akx akx force-pushed the f-strings branch 2 times, most recently from 4884ca1 to b1bb71a Compare October 22, 2022 13:55
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR

There's some failing tests, if it's not an automated check that gets it right the first time (rather than one that requires manual edits) then I think we should pass on this one, 120 files are too many to review manually

I remember reading that flynt's rewrites were sometimes unsafe, let's not risk it

@akx akx force-pushed the f-strings branch 2 times, most recently from 811033a to 944d303 Compare October 27, 2022 15:47
@akx
Copy link
Contributor Author

akx commented Oct 27, 2022

Hi @MarcoGorelli, thanks for the review. I remade this PR to limit the scope to non-test files – the test failure was caused by flynt's aggressive string concatenation mode changing things it shouldn't have there.

In addition, I took extra care to only mechanically run safe transforms, and then applied other select fixes by hand in the subsequent commits.

The PR now touches 47 files instead of 120, and the changes are all practically 1 line here and there.

@akx akx force-pushed the f-strings branch 2 times, most recently from afc1dea to be7cf22 Compare November 1, 2022 15:10
@akx akx requested a review from MarcoGorelli November 1, 2022 15:10
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all for automation, but if a tool is too aggressive and requires manual fixups, that's a bit of a red flag and I don't think we should be using it, sorry. This is still failing various tests

I'd suggest looking at #48855 if you're interested in linting issues

@MarcoGorelli
Copy link
Member

Also, we already have pyupgrade for rewriting f-strings when it's safe to do so

Closing for now then, but thanks for your PR

@akx
Copy link
Contributor Author

akx commented Nov 2, 2022

@MarcoGorelli I would ask you to reconsider. This PR isn't really done by an automated tool anymore. I used flynt as a guide, as it were, and checked and improved the changes it suggested by hand.

@MarcoGorelli
Copy link
Member

The issue with adding mostly cosmetic changes without an automated tool is that there's no way to ensure contributors won't new code that goes against these changes, and then we'll be here again in a month's time making the same changes manually

But if there's a case where this makes a performance difference at the macro level, then happy to take that- e.g. the change you mentioned here #49229 (comment)

@akx
Copy link
Contributor Author

akx commented Nov 2, 2022

The issue with adding mostly cosmetic changes

These aren't really only cosmetic changes.

without an automated tool is that there's no way to ensure contributors won't new code that goes against these changes

As far as I understand, all PRs to Pandas are reviewed by humans too.
That reviewing human would probably go "Hey, couldn't you use f"^{foo}" instead of "^" + str(foo)" (or similar).

Again, these changes have been made by a human (yours truly), not an automated tool. Would the context of the review be different if I had not originally mentioned a tool?

@MarcoGorelli
Copy link
Member

These kinds of changes should be made by automated tools without requiring manual fixups, so that

  • we can add this to CI
  • we don't need to keep leaving comments in PR reviews about them

If it's done manually, then we'll be doing it manually again in a month's time, and then again, and so on

@akx
Copy link
Contributor Author

akx commented Nov 2, 2022

I'm sorry, but can you explain what "these kinds of changes" are?

These changes hadn't, and couldn't have, been made automatically by a tool in CI because no tool currently exists that could do the semantic/dataflow analysis to see whether a concatenation can be safely replaced with an f-string.

Fortunately, humans can do that, and as said in my comment after the first test failures, I manually, by hand, vetted each change recommended by flynt.

@akx
Copy link
Contributor Author

akx commented Nov 2, 2022

Ah, darnit, apparently trying to rebase/force-push a closed PR destroys things and this can't be reopened anymore.

Considering the discussion above, would you be open to re-reviewing a PR if I open a new one, @MarcoGorelli? I honestly believe these changes are objectively good and could have been done earlier by humans, but it just hadn't happened.

@MarcoGorelli
Copy link
Member

I just meant mostly cosmetic / stylistic changes - if any of these delivers a measurable performance improvement, then sure, feel free to submit a PR

If there's no performance improvement and it can't be rewritten/checked automatically, then I'm not sure there's much reason to make the change

@akx
Copy link
Contributor Author

akx commented Nov 2, 2022

@MarcoGorelli Would https://twitter.com/raymondh/status/1205969258800275456 (found via the https://pylint.pycqa.org/en/latest/user_guide/messages/convention/consider-using-f-string.html doc page) be enough proof that f-strings will deliver performance improvements even in the simple cases?

@MarcoGorelli
Copy link
Member

that tweet shows savings of a few nanoseconds -if there's some pandas operation where concatenating strings compounds and there's a noticeable performance improvement overall, then that's great - your sql.py example might be a good example of this. if it's just an error message that'd only be printed once, then I'd say this isn't worth the churn

@akx
Copy link
Contributor Author

akx commented Nov 2, 2022

Well, to put the tweet in other terms, it shows improvements of 32% to 300% for a simple operation.

What's the downside of "churn" for an improvement and how do Pandas developers quantify when it's worth it or not?

Also, I'd like to point out that this isn't the first PR to just change things to f-strings because it's possible; has something changed since #29547?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants