CLN: use f-strings where possible #49229

akx · 2022-10-21T13:28:04Z

Tests added and passed if fixing a bug or adding a new feature
- No new tests, behavior hasn't changed (except possibly become a bit faster)
All code checks passed.
Added type annotations to new arguments/methods/functions.
- Nothing new.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
- No bug or feature, just cleaning.

This PR replaces a bunch of string concatenations and other non-modern formatting operations with f-strings.

The first commit was done mechanically with flynt, followed by manual touch-ups for cases that flynt wasn't smart enough to handle.

This does not touch a bunch of concatenations in pandas/io/stata.py, since those are covered by cln commits in #49228.

pandas/_version.py

jbrockmendel · 2022-10-21T16:23:33Z

pandas/core/arrays/string_arrow.py

@@ -345,14 +345,14 @@ def _str_match(
        self, pat: str, case: bool = True, flags: int = 0, na: Scalar | None = None
    ):
        if not pat.startswith("^"):
-            pat = "^" + pat


I don't think these are an improvement. Stick to places that currently use .format.

akx · 2022-10-22

Would this apply to all string concatenation? I think e.g. the changes in sql.py (https://github.com/pandas-dev/pandas/pull/49229/files#diff-9268174bfb15f08ef2267375665a85fecf201999902542f6fc9c0d3fadfb4553 if GitHub feels like linking correctly) read much better as an f-string, for one?

EDIT: I would also like to point out that f-strings can be quite a lot faster than string concatenation, and it could easily compound in a library like Pandas:

    Benchmark 1: python3 -S ex1.py
      Time (mean ± σ):     2.516 s ± 0.024 s    [User: 2.433 s, System: 0.011 s]
      Range (min … max):   2.484 s … 2.573 s    10 runs

    Benchmark 2: python3 -S ex2.py
      Time (mean ± σ):     2.050 s ± 0.064 s    [User: 1.970 s, System: 0.012 s]
      Range (min … max):   1.967 s … 2.194 s    10 runs

    Summary
      'python3 -S ex2.py' ran
        1.23 ± 0.04 times faster than 'python3 -S ex1.py'

where ex1 is timing lambda: "^" + pat + "$" and ex2 is timing lambda: f"^{pat}$".
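For anyone wanting to reproduce the comparison, here is a minimal sketch using the standard-library `timeit` module. The bodies of `ex1.py` and `ex2.py` are not shown above, so the value of `pat` and the iteration count are assumptions, and absolute timings will vary by machine and Python version.

```python
import timeit

pat = "abc"  # assumed sample pattern; the original scripts are not shown


def concat() -> str:
    # ex1-style: plain string concatenation
    return "^" + pat + "$"


def fstring() -> str:
    # ex2-style: the same string built with an f-string
    return f"^{pat}$"


# Both spellings must produce the same string before timing means anything.
assert concat() == fstring() == "^abc$"

n = 200_000  # assumed iteration count
t_concat = timeit.timeit(concat, number=n)
t_fstring = timeit.timeit(fstring, number=n)
print(f"concat:   {t_concat:.3f} s")
print(f"f-string: {t_fstring:.3f} s")
```

Timing a function call adds the same call overhead to both variants, so the ratio printed here will likely look smaller than one measured on the bare expressions.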

MarcoGorelli

Thanks for your PR

There are some failing tests. If this isn't an automated check that gets it right the first time (rather than one that requires manual edits), then I think we should pass on this one; 120 files are too many to review manually.

I remember reading that flynt's rewrites were sometimes unsafe, let's not risk it

akx · 2022-10-27T15:50:18Z

Hi @MarcoGorelli, thanks for the review. I remade this PR to limit the scope to non-test files – the test failure was caused by flynt's aggressive string concatenation mode changing things it shouldn't have there.

In addition, I took extra care to only mechanically run safe transforms, and then applied other select fixes by hand in the subsequent commits.

The PR now touches 47 files instead of 120, and the changes are all practically 1 line here and there.

This was done by hand.

MarcoGorelli

I'm all for automation, but if a tool is too aggressive and requires manual fixups, that's a bit of a red flag and I don't think we should be using it, sorry. This is still failing various tests

I'd suggest looking at #48855 if you're interested in linting issues

MarcoGorelli · 2022-11-02T08:47:15Z

Also, we already have pyupgrade for rewriting f-strings when it's safe to do so

Closing for now then, but thanks for your PR

akx · 2022-11-02T10:19:29Z

@MarcoGorelli I would ask you to reconsider. This PR isn't really done by an automated tool anymore. I used flynt as a guide, as it were, and checked and improved the changes it suggested by hand.

MarcoGorelli · 2022-11-02T10:36:27Z

The issue with adding mostly cosmetic changes without an automated tool is that there's no way to ensure contributors won't add new code that goes against these changes, and then we'll be here again in a month's time making the same changes manually.

But if there's a case where this makes a performance difference at the macro level, then happy to take that- e.g. the change you mentioned here #49229 (comment)

akx · 2022-11-02T10:41:21Z

> The issue with adding mostly cosmetic changes

These aren't really only cosmetic changes.

> without an automated tool is that there's no way to ensure contributors won't add new code that goes against these changes

As far as I understand, all PRs to Pandas are reviewed by humans too.
That reviewing human would probably go "Hey, couldn't you use f"^{foo}" instead of "^" + str(foo)" (or similar).

Again, these changes have been made by a human (yours truly), not an automated tool. Would the context of the review be different if I had not originally mentioned a tool?
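The kind of review suggestion described in this comment can be sketched as follows. The variable `foo` and its value are hypothetical; the point is only that the two spellings give the same string for an ordinary value.

```python
foo = 42  # hypothetical non-string value

# Old style: explicit str() call plus concatenation
old_style = "^" + str(foo)

# New style: the f-string formats the value itself
new_style = f"^{foo}"

# The !s conversion forces str() explicitly, for types whose
# __format__ might differ from __str__
explicit = f"^{foo!s}"

assert old_style == new_style == explicit == "^42"
```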

MarcoGorelli · 2022-11-02T10:50:41Z

These kinds of changes should be made by automated tools without requiring manual fixups, so that:

- we can add this to CI
- we don't need to keep leaving comments in PR reviews about them

If it's done manually, then we'll be doing it manually again in a month's time, and then again, and so on

akx · 2022-11-02T10:53:43Z

I'm sorry, but can you explain what "these kinds of changes" are?

These changes hadn't, and couldn't have, been made automatically by a tool in CI because no tool currently exists that could do the semantic/dataflow analysis to see whether a concatenation can be safely replaced with an f-string.

Fortunately, humans can do that, and as said in my comment after the first test failures, I manually, by hand, vetted each change recommended by flynt.
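To illustrate why the operand types matter here, the sketch below uses two hypothetical helper functions (not taken from pandas) to show that `a + b` and `f"{a}{b}"` agree only when both operands are already strings.

```python
def concat(a, b):
    # The meaning of + depends on the operand types
    return a + b


def as_fstring(a, b):
    # An f-string always formats both operands to text
    return f"{a}{b}"


# Identical when both operands are strings
assert concat("ab", "cd") == as_fstring("ab", "cd") == "abcd"

# Different when they are numbers: addition vs. text formatting
assert concat(1, 2) == 3
assert as_fstring(1, 2) == "12"

# Mixed types: concatenation raises, the f-string quietly succeeds
try:
    concat("n=", 1)
except TypeError:
    raised = True
else:
    raised = False

assert raised
assert as_fstring("n=", 1) == "n=1"
```

A purely syntactic rewrite cannot tell these cases apart without knowing the types involved, which is why each change needs either type information or a human check.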

akx · 2022-11-02T10:58:47Z

Ah, darnit, apparently trying to rebase/force-push a closed PR destroys things and this can't be reopened anymore.

Considering the discussion above, would you be open to re-reviewing a PR if I open a new one, @MarcoGorelli? I honestly believe these changes are objectively good and could have been done earlier by humans, but it just hadn't happened.

MarcoGorelli · 2022-11-02T12:31:41Z

I just meant mostly cosmetic / stylistic changes - if any of these delivers a measurable performance improvement, then sure, feel free to submit a PR

If there's no performance improvement and it can't be rewritten/checked automatically, then I'm not sure there's much reason to make the change

akx · 2022-11-02T13:06:23Z

@MarcoGorelli Would https://twitter.com/raymondh/status/1205969258800275456 (found via the https://pylint.pycqa.org/en/latest/user_guide/messages/convention/consider-using-f-string.html doc page) be enough proof that f-strings will deliver performance improvements even in the simple cases?

MarcoGorelli · 2022-11-02T13:13:46Z

That tweet shows savings of a few nanoseconds. If there's some pandas operation where concatenating strings compounds and there's a noticeable performance improvement overall, then that's great; your sql.py example might be a good example of this. If it's just an error message that'd only be printed once, then I'd say this isn't worth the churn.

akx · 2022-11-02T13:57:42Z

Well, to put the tweet in other terms, it shows improvements of 32% to 300% for a simple operation.

What's the downside of "churn" for an improvement and how do Pandas developers quantify when it's worth it or not?

Also, I'd like to point out that this isn't the first PR to just change things to f-strings because it's possible; has something changed since #29547?

akx force-pushed the f-strings branch from 56f2933 to cc7aa1f Compare October 21, 2022 14:03

jbrockmendel reviewed Oct 21, 2022

pandas/_version.py (outdated, resolved)

jbrockmendel reviewed Oct 21, 2022

mroeschke added the Clean label Oct 21, 2022

akx force-pushed the f-strings branch 2 times, most recently from 4884ca1 to b1bb71a Compare October 22, 2022 13:55

MarcoGorelli requested changes Oct 27, 2022

akx force-pushed the f-strings branch 2 times, most recently from 811033a to 944d303 Compare October 27, 2022 15:47

akx force-pushed the f-strings branch 2 times, most recently from afc1dea to be7cf22 Compare November 1, 2022 15:10

akx requested a review from MarcoGorelli November 1, 2022 15:10

akx added 5 commits November 2, 2022 07:13

CLN: run flynt to use f-strings (mechanical)

fef8350

CLN: apply manual f-string fixes inspired by flynt -a

9108900

CLN: apply manual f-string fixes inspired by flynt -tc

ea3c120

CLN: replace variations of "..." + str(...) with f-strings

a84ff3a

This was done by hand.

CLN: replace joins on static arguments with f-strings

2e91643
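As an illustration of what "joins on static arguments" in the last commit title refers to, here is a minimal sketch. The variable names and values are hypothetical and not taken from the pandas diff.

```python
name = "col"  # hypothetical value

# A join over a fixed, known set of pieces...
joined = "".join(["[", name, "]"])

# ...can be written directly as an f-string
formatted = f"[{name}]"

assert joined == formatted == "[col]"

# join remains the right tool when the number of pieces is dynamic
parts = ["a", "b", "c"]
dynamic = ", ".join(parts)
assert dynamic == "a, b, c"
```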

akx force-pushed the f-strings branch from be7cf22 to 2e91643 Compare November 2, 2022 05:13

MarcoGorelli requested changes Nov 2, 2022

MarcoGorelli closed this Nov 2, 2022

This was referenced Nov 4, 2022

STYLE: fix pylint consider-using-f-string issues #49515

Merged

STYLE/PERF: use f-strings instead of ''.join on static elements #49517

Merged

STYLE/PERF: replace string concatenations with f-strings in core #49518

Merged
