-
Notifications
You must be signed in to change notification settings - Fork 917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Raise duplicate column error in DataFrame.rename
#10120
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-22.04 #10120 +/- ##
================================================
+ Coverage 10.37% 10.42% +0.04%
================================================
Files 119 119
Lines 20149 20609 +460
================================================
+ Hits 2091 2148 +57
- Misses 18058 18461 +403
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor question:
new_names = ( | ||
new_col_names = [ | ||
mapper.get(col_name, col_name) for col_name in self.keys() | ||
) | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not immediately obvious to me - why opt for a list instead of a tuple here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't change a tuple to a list, it changes a generator to a list. There's no equivalent to a list comprehension for a tuple because the syntax (x for x in ...)
doesn't create a tuple, it creates a generator. The reason this change is necessary is that you cannot query the length of a generator, so the error check below won't work.
Aside: If you're not familiar with generators, they are what you get when you write a function that uses yield
instead of return
. The purpose of generators is not to materialize the entire set of data at once, but to instead produce the data one at a time. One of the most obvious uses of this is to save memory, but there are also lots of other reasons you might prefer a generator to a list such as wanting to produce an arbitrary number of elements from an infinite sequence. Here's a simple example illustrating the difference:
>>> def f(x):
... print(f"Called f({x})")
... return x**2
>>> for _ in (f(x) for x in range(3)):
... print("Hello")
...
Called f(0)
Hello
Called f(1)
Hello
Called f(2)
Hello
>>> for _ in [f(x) for x in range(3)]:
... print("Hello")
...
Called f(0)
Called f(1)
Called f(2)
Hello
Hello
Hello
@gpucibot merge |
Fixes: #10117
This PR adds a duplicate column validation check in
ColumnAccessor.rename_levels