Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Raise duplicate column error in DataFrame.rename #10120

Merged
merged 1 commit into from
Jan 27, 2022

Conversation

galipremsagar
Copy link
Contributor

Fixes: #10117

This PR adds a duplicate column validation check in ColumnAccessor.rename_levels

@galipremsagar galipremsagar added bug Something isn't working 3 - Ready for Review Ready for review by team Python Affects Python cuDF API. 4 - Needs cuDF (Python) Reviewer non-breaking Non-breaking change labels Jan 25, 2022
@galipremsagar galipremsagar requested a review from a team as a code owner January 25, 2022 17:57
@galipremsagar galipremsagar self-assigned this Jan 25, 2022
@codecov
Copy link

codecov bot commented Jan 25, 2022

Codecov Report

Merging #10120 (b1dfe93) into branch-22.04 (e24fa8f) will increase coverage by 0.04%.
The diff coverage is 0.00%.

Impacted file tree graph

@@               Coverage Diff                @@
##           branch-22.04   #10120      +/-   ##
================================================
+ Coverage         10.37%   10.42%   +0.04%     
================================================
  Files               119      119              
  Lines             20149    20609     +460     
================================================
+ Hits               2091     2148      +57     
- Misses            18058    18461     +403     
Impacted Files Coverage Δ
python/cudf/cudf/__init__.py 0.00% <ø> (ø)
python/cudf/cudf/_fuzz_testing/io.py 0.00% <0.00%> (ø)
python/cudf/cudf/_fuzz_testing/orc.py 0.00% <ø> (ø)
python/cudf/cudf/_fuzz_testing/parquet.py 0.00% <ø> (ø)
python/cudf/cudf/_fuzz_testing/utils.py 0.00% <0.00%> (ø)
python/cudf/cudf/api/types.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/_base_index.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/__init__.py 0.00% <ø> (ø)
python/cudf/cudf/core/column/column.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/decimal.py 0.00% <0.00%> (ø)
... and 81 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a552afb...b1dfe93. Read the comment docs.

Copy link
Member

@charlesbluca charlesbluca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor question:

Comment on lines -527 to +530
new_names = (
new_col_names = [
mapper.get(col_name, col_name) for col_name in self.keys()
)
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not immediately obvious to me - why opt for a list instead of a tuple here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't change a tuple to a list, it changes a generator to a list. There's no equivalent to a list comprehension for a tuple because the syntax (x for x in ...) doesn't create a tuple, it creates a generator. The reason this change is necessary is that you cannot query the length of a generator, so the error check below won't work.

Aside: If you're not familiar with generators, they are what you get when you write a function that uses yield instead of return. The purpose of generators is not to materialize the entire set of data at once, but to instead produce the data one at a time. One of the most obvious uses of this is to save memory, but there are also lots of other reasons you might prefer a generator to a list such as wanting to produce an arbitrary number of elements from an infinite sequence. Here's a simple example illustrating the difference:

>>> def f(x):                                                                                                                                                                                     
...     print(f"Called f({x})")                                                                       
...     return x**2
>>> for _ in (f(x) for x in range(3)):
...     print("Hello")                                                                                
... 
Called f(0)
Hello
Called f(1)
Hello
Called f(2)
Hello
>>> for _ in [f(x) for x in range(3)]:                                                                
...     print("Hello")
... 
Called f(0)
Called f(1)
Called f(2)
Hello
Hello
Hello

@vyasr
Copy link
Contributor

vyasr commented Jan 27, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit dcc0bf5 into rapidsai:branch-22.04 Jan 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] DataFrame columns rename. Different behaviour with regards Pandas
3 participants