BUG: Fixed pd.unique on array of tuples #16543

TomAugspurger · 2017-05-30T22:05:37Z

jreback

lgtm.

jreback · 2017-05-30T22:48:59Z

doc/source/whatsnew/v0.20.2.txt

@@ -39,7 +39,7 @@ Bug Fixes

 - Bug in using ``pathlib.Path`` or ``py.path.local`` objects with io functions (:issue:`16291`)
 - Bug in ``DataFrame.update()`` with ``overwrite=False`` and ``NaN values`` (:issue:`15593`)
-
+- Bug in :func:`pd.unique` on an array of tuples (:issue:`16519`)


I think this has to be :func:`unique`?

You're correct.

jreback · 2017-05-30T22:49:36Z

pandas/tests/test_algos.py

@@ -929,6 +929,22 @@ def test_unique_index(self):
            tm.assert_numpy_array_equal(case.duplicated(),
                                        np.array([False, False, False]))

+    @pytest.mark.parametrize('arr, unique', [
+        ([(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0), (1, 1)],


also add an example of this in pd.unique itself.
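
For reference, a minimal sketch of the kind of example being requested, reusing the input from the parametrized test above. It is not the example that was eventually added to the docstring. The object array is built by hand (an assumption made for portability across pandas versions) so that each element is a tuple rather than a row of a 2-D array.

```python
import numpy as np
import pandas as pd

# Input taken from the parametrized test case in this PR.
pairs = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0), (1, 1)]

# Build a 1-D object array whose elements are the tuples themselves.
# Element-wise assignment avoids NumPy turning the list into a 2-D array.
arr = np.empty(len(pairs), dtype=object)
for i, pair in enumerate(pairs):
    arr[i] = pair

# pd.unique keeps the first occurrence of each tuple, in order of appearance.
result = pd.unique(arr)
print(result)
```

With the fix applied, each distinct tuple should appear once in the result, in the order it was first seen.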

chris-b1 · 2017-05-30T22:56:58Z

I think you could back out this change from #16434

https://github.com/TomAugspurger/pandas/blob/35da5b9e60993b722760d52dc8dcc0a14129748e/pandas/core/algorithms.py#L391

jreback · 2017-05-30T23:00:46Z

I agree with @chris-b1's comment; yes, that looks right.

codecov · 2017-05-30T23:31:11Z

Codecov Report

Merging #16543 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #16543   +/-   ##
=======================================
  Coverage   90.79%   90.79%           
=======================================
  Files         161      161           
  Lines       51063    51063           
=======================================
  Hits        46365    46365           
  Misses       4698     4698

Flag	Coverage Δ
#multiple	`88.63% <100%> (ø)`	⬆️
#single	`40.15% <0%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/algorithms.py	`94.41% <100%> (ø)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e60dc4c...35da5b9. Read the comment docs.

codecov · 2017-05-30T23:31:29Z

Codecov Report

Merging #16543 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #16543   +/-   ##
=======================================
  Coverage   90.75%   90.75%           
=======================================
  Files         161      161           
  Lines       51074    51074           
=======================================
  Hits        46353    46353           
  Misses       4721     4721

Flag	Coverage Δ
#multiple	`88.59% <100%> (ø)`	⬆️
#single	`40.16% <0%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/algorithms.py	`94.41% <100%> (ø)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 03d44f3...eb7c18f. Read the comment docs.

TomAugspurger · 2017-05-31T02:12:19Z

> I think you could back out this change from #16434

The regression test from #16434 fails if I revert the change. The difference being

(Pdb++) values  # this is with the fix reverted
array([[1, 'a']], dtype=object)
(Pdb++) lib.list_to_object_array(list([(1, 'a')]))  # this is the fix from 16434
array([(1, 'a')], dtype=object)

So an array of lists vs. an array of tuples. Is that correct?
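
The difference in the two pdb outputs can be reproduced with NumPy alone. The sketch below does not call the internal `lib.list_to_object_array` helper; the manual loop is only a stand-in, assumed to produce the same kind of 1-D result.

```python
import numpy as np

values = [(1, "a")]

# Letting NumPy infer the shape unpacks the tuple into a row: shape (1, 2).
as_2d = np.array(values, dtype=object)

# Assigning element by element keeps the tuple intact: shape (1,).
as_1d = np.empty(len(values), dtype=object)
for i, value in enumerate(values):
    as_1d[i] = value

print(as_2d.shape, as_1d.shape)
print(type(as_1d[0]))
```

Only the second form contains the tuple as a single element, which is what a comparison against tuple values needs.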

jreback · 2017-05-31T02:15:33Z

You may just need to call _ensure_arraylike there instead of the isinstance check.

TomAugspurger · 2017-05-31T02:36:54Z

Using _ensure_arraylike failed on an empty array, e.g. pd.Series([1, 2]).isin([]), since the empty array comes back as float dtype instead of object dtype, so the hashing fails later on. I can handle that case if you want, or just leave the isinstance checks. Not sure which is cleaner, really.
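
A short sketch of the dtype issue described above, using only public NumPy and pandas calls. It does not reproduce the internal `_ensure_arraylike` code path; it only shows that an empty list defaults to a float array, and that the released `isin` handles the empty case.

```python
import numpy as np
import pandas as pd

# An empty list carries no type information, so NumPy defaults to float64.
empty_default = np.asarray([])
# Requesting object dtype explicitly gives an empty object array instead.
empty_object = np.asarray([], dtype=object)
print(empty_default.dtype, empty_object.dtype)

# The public API handles the empty case: nothing is a member of an empty set.
mask = pd.Series([1, 2]).isin([])
print(mask.tolist())
```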

jreback · 2017-05-31T10:37:44Z

@TomAugspurger I added a commit. It should fix this up, I think.

jreback · 2017-05-31T11:18:09Z

@TomAugspurger looks like this broke it. OK, just revert my commit and merge your changes. This is a very touchy area. I think we are doing the right things in the tests, but it's just tricky to get exactly right.

jreback · 2017-05-31T12:00:35Z

@TomAugspurger rebased to remove my commit.

TomAugspurger · 2017-05-31T14:28:17Z

@jreback are you able to restart appveyor jobs? One of the network tests failed on the first job.

jreback · 2017-06-01T10:31:02Z

thanks!

TomAugspurger added the Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Blocker (blocking issue or pull request for an upcoming release), and Needs Backport labels May 30, 2017

TomAugspurger added this to the 0.20.2 milestone May 30, 2017

jreback approved these changes May 30, 2017

TomAugspurger force-pushed the unique-tuples branch from 35da5b9 to 8871863 Compare May 31, 2017 02:13

BUG: Fixed pd.unique on array of tuples

658f1ab

Closes pandas-dev#16519

jreback force-pushed the unique-tuples branch from 91a0e5f to 0a973df Compare May 31, 2017 12:00

fixup! BUG: Fixed pd.unique on array of tuples

eb7c18f

TomAugspurger force-pushed the unique-tuples branch from 0a973df to eb7c18f Compare May 31, 2017 21:55

jreback merged commit 9d7afa7 into pandas-dev:master Jun 1, 2017

TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request Jun 1, 2017

BUG: Fixed pd.unique on array of tuples (pandas-dev#16543)

8da0e93

(cherry picked from commit 9d7afa7)

TomAugspurger added a commit that referenced this pull request Jun 4, 2017

BUG: Fixed pd.unique on array of tuples (#16543)

544fb11

(cherry picked from commit 9d7afa7)

TomAugspurger removed the Needs Backport label Jun 4, 2017

TomAugspurger deleted the unique-tuples branch June 4, 2017 20:29

chris-b1 mentioned this pull request Jun 8, 2017

Series.isin fails (errors) for categoricals #16639

Closed

Kiv pushed a commit to Kiv/pandas that referenced this pull request Jun 11, 2017

BUG: Fixed pd.unique on array of tuples (pandas-dev#16543)

98ed54d

stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017

BUG: Fixed pd.unique on array of tuples (pandas-dev#16543)

4040016
