[REVIEW] Raise duplicate column error in `DataFrame.rename` #10120

galipremsagar · 2022-01-25T17:57:48Z

This PR adds a duplicate column validation check in ColumnAccessor.rename_levels

codecov · 2022-01-25T19:32:21Z

Codecov Report

Merging #10120 (b1dfe93) into branch-22.04 (e24fa8f) will increase coverage by 0.04%.
The diff coverage is 0.00%.

@@               Coverage Diff                @@
##           branch-22.04   #10120      +/-   ##
================================================
+ Coverage         10.37%   10.42%   +0.04%     
================================================
  Files               119      119              
  Lines             20149    20609     +460     
================================================
+ Hits               2091     2148      +57     
- Misses            18058    18461     +403

Impacted Files	Coverage Δ
python/cudf/cudf/__init__.py	`0.00% <ø> (ø)`
python/cudf/cudf/_fuzz_testing/io.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/_fuzz_testing/orc.py	`0.00% <ø> (ø)`
python/cudf/cudf/_fuzz_testing/parquet.py	`0.00% <ø> (ø)`
python/cudf/cudf/_fuzz_testing/utils.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/api/types.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/_base_index.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/column/__init__.py	`0.00% <ø> (ø)`
python/cudf/cudf/core/column/column.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/column/decimal.py	`0.00% <0.00%> (ø)`
... and 81 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a552afb...b1dfe93. Read the comment docs.

charlesbluca

One minor question:

charlesbluca · 2022-01-26T20:25:24Z

python/cudf/cudf/core/column_accessor.py

-                new_names = (
+                new_col_names = [
                    mapper.get(col_name, col_name) for col_name in self.keys()
-                )
+                ]


It's not immediately obvious to me - why opt for a list instead of a tuple here?

This doesn't change a tuple to a list, it changes a generator to a list. There's no equivalent to a list comprehension for a tuple because the syntax (x for x in ...) doesn't create a tuple, it creates a generator. The reason this change is necessary is that you cannot query the length of a generator, so the error check below won't work.

Aside: If you're not familiar with generators, they are what you get when you write a function that uses yield instead of return. The purpose of generators is not to materialize the entire set of data at once, but to instead produce the data one at a time. One of the most obvious uses of this is to save memory, but there are also lots of other reasons you might prefer a generator to a list such as wanting to produce an arbitrary number of elements from an infinite sequence. Here's a simple example illustrating the difference:

>>> def f(x): ... print(f"Called f({x})") ... return x**2 >>> for _ in (f(x) for x in range(3)): ... print("Hello") ... Called f(0) Hello Called f(1) Hello Called f(2) Hello >>> for _ in [f(x) for x in range(3)]: ... print("Hello") ... Called f(0) Called f(1) Called f(2) Hello Hello Hello

vyasr · 2022-01-27T01:27:00Z

@gpucibot merge

add duplicate column name validation

b1dfe93

galipremsagar added bug Something isn't working 3 - Ready for Review Ready for review by team Python Affects Python cuDF API. 4 - Needs cuDF (Python) Reviewer non-breaking Non-breaking change labels Jan 25, 2022

galipremsagar requested a review from a team as a code owner January 25, 2022 17:57

galipremsagar self-assigned this Jan 25, 2022

galipremsagar requested review from brandon-b-miller and marlenezw January 25, 2022 17:57

charlesbluca approved these changes Jan 26, 2022

View reviewed changes

vyasr approved these changes Jan 27, 2022

View reviewed changes

rapids-bot bot merged commit dcc0bf5 into rapidsai:branch-22.04 Jan 27, 2022

galipremsagar removed 3 - Ready for Review Ready for review by team 4 - Needs cuDF (Python) Reviewer labels Jan 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[REVIEW] Raise duplicate column error in `DataFrame.rename` #10120

[REVIEW] Raise duplicate column error in `DataFrame.rename` #10120

galipremsagar commented Jan 25, 2022

codecov bot commented Jan 25, 2022 •

edited

Loading

charlesbluca left a comment

charlesbluca Jan 26, 2022

vyasr Jan 27, 2022

vyasr commented Jan 27, 2022

[REVIEW] Raise duplicate column error in DataFrame.rename #10120

[REVIEW] Raise duplicate column error in DataFrame.rename #10120

Conversation

galipremsagar commented Jan 25, 2022

codecov bot commented Jan 25, 2022 • edited Loading

Codecov Report

charlesbluca left a comment

Choose a reason for hiding this comment

charlesbluca Jan 26, 2022

Choose a reason for hiding this comment

vyasr Jan 27, 2022

Choose a reason for hiding this comment

vyasr commented Jan 27, 2022

[REVIEW] Raise duplicate column error in `DataFrame.rename` #10120

[REVIEW] Raise duplicate column error in `DataFrame.rename` #10120

codecov bot commented Jan 25, 2022 •

edited

Loading