Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fixed pd.unique on array of tuples #16543

Merged
merged 2 commits into from
Jun 1, 2017

Conversation

TomAugspurger
Copy link
Contributor

Closes #16519

@TomAugspurger TomAugspurger added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Blocker Blocking issue or pull request for an upcoming release Needs Backport labels May 30, 2017
@TomAugspurger TomAugspurger added this to the 0.20.2 milestone May 30, 2017
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

@@ -39,7 +39,7 @@ Bug Fixes

- Bug in using ``pathlib.Path`` or ``py.path.local`` objects with io functions (:issue:`16291`)
- Bug in ``DataFrame.update()`` with ``overwrite=False`` and ``NaN values`` (:issue:`15593`)

- Bug in :func:`pd.unique` on an array of tuples (:issue:`16519`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think has to be :func:`unique` ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct.

@@ -929,6 +929,22 @@ def test_unique_index(self):
tm.assert_numpy_array_equal(case.duplicated(),
np.array([False, False, False]))

@pytest.mark.parametrize('arr, unique', [
([(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0), (1, 1)],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add an example of this in pd.unique itself.

@chris-b1
Copy link
Contributor

@jreback
Copy link
Contributor

jreback commented May 30, 2017

I with agree @chris-b1 comment, yes that looks right.

@codecov
Copy link

codecov bot commented May 30, 2017

Codecov Report

Merging #16543 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #16543   +/-   ##
=======================================
  Coverage   90.79%   90.79%           
=======================================
  Files         161      161           
  Lines       51063    51063           
=======================================
  Hits        46365    46365           
  Misses       4698     4698
Flag Coverage Δ
#multiple 88.63% <100%> (ø) ⬆️
#single 40.15% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/algorithms.py 94.41% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e60dc4c...35da5b9. Read the comment docs.

@codecov
Copy link

codecov bot commented May 30, 2017

Codecov Report

Merging #16543 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #16543   +/-   ##
=======================================
  Coverage   90.79%   90.79%           
=======================================
  Files         161      161           
  Lines       51063    51063           
=======================================
  Hits        46365    46365           
  Misses       4698     4698
Flag Coverage Δ
#multiple 88.63% <100%> (ø) ⬆️
#single 40.15% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/algorithms.py 94.41% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e60dc4c...35da5b9. Read the comment docs.

@codecov
Copy link

codecov bot commented May 30, 2017

Codecov Report

Merging #16543 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #16543   +/-   ##
=======================================
  Coverage   90.75%   90.75%           
=======================================
  Files         161      161           
  Lines       51074    51074           
=======================================
  Hits        46353    46353           
  Misses       4721     4721
Flag Coverage Δ
#multiple 88.59% <100%> (ø) ⬆️
#single 40.16% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/algorithms.py 94.41% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 03d44f3...eb7c18f. Read the comment docs.

@TomAugspurger
Copy link
Contributor Author

I think you could back out this change from #16434

The regression test from #16434 fails if I revert the change. The difference being

(Pdb++) values  # this is with the fix reverted
array([[1, 'a']], dtype=object)
(Pdb++) lib.list_to_object_array(list([(1, 'a')]))  # this is the fix from 16434
array([(1, 'a')], dtype=object)

So an array of lists vs. an array of tuples. Is that correct?

@jreback
Copy link
Contributor

jreback commented May 31, 2017

you may just need to call
_ensure_arraylike there instead of the isinstance check

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented May 31, 2017

Using _ensure_arraylike failed on an empty array pd.Series([1, 2]).isin([]) since it's a float instead of object dtype, so the hashing fails later on. I can handle that case if you want, or just leave the isinstance checks. Not sure which is cleaner really.

@jreback
Copy link
Contributor

jreback commented May 31, 2017

@TomAugspurger added a commit. should fix up I think.

@jreback
Copy link
Contributor

jreback commented May 31, 2017

@TomAugspurger looks this broke it. ok just revert my commit and merge your changes. This is a very touchy area. I we are doing the right things in the tests, but just tricky to get exactly right.

@jreback
Copy link
Contributor

jreback commented May 31, 2017

@TomAugspurger rebased to remove my commit.

@TomAugspurger
Copy link
Contributor Author

@jreback are you able to restart appveyor jobs? One of the network tests failed on the first job.

@jreback jreback merged commit 9d7afa7 into pandas-dev:master Jun 1, 2017
@jreback
Copy link
Contributor

jreback commented Jun 1, 2017

thanks!

TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request Jun 1, 2017
TomAugspurger added a commit that referenced this pull request Jun 4, 2017
@TomAugspurger TomAugspurger deleted the unique-tuples branch June 4, 2017 20:29
Kiv pushed a commit to Kiv/pandas that referenced this pull request Jun 11, 2017
stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Blocker Blocking issue or pull request for an upcoming release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants