Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1406 Fix bug in ak.lookup #1407

Merged
merged 2 commits into from
May 19, 2022
Merged

1406 Fix bug in ak.lookup #1407

merged 2 commits into from
May 19, 2022

Conversation

reuster986
Copy link
Collaborator

Closes #1406

Previously, the ak.lookup() function, which treats key-value pairs as a function to be applied over args, implicitly assumed that the user-supplied keys were in the order output by ak.unique() and ak.GroupBy(). This is the most common usage, but it is not inherent to the semantics of the function, and when a user (legitimately) supplied keys that were unique but unordered, ak.lookup() would return the wrong answer.

This PR fixes that behavior by performing ak.GroupBy(keys) to ensure the anticipated ordering. Because this can be an expensive step, it also adds a keys_from_unique keyword arg as a shortcut for when keys come from unique/GroupBy.

This PR also adds a unit test for ak.lookup in the join_test.py file (because lookup is a special case of a join).

Copy link
Contributor

@mhmerrill mhmerrill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ok to me.

@mhmerrill mhmerrill merged commit 6122a56 into master May 19, 2022
Copy link
Member

@stress-tess stress-tess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment about more informative testing.

I checked and couldn't find anything, but I also wanted to make sure that there weren't any places where we use lookup that should be updated to set the keys_from_unique flag

# Simple lookup with int keys
# Also test shortcut for unique-ordered keys
res = ak.lookup(keys, values, args, fillvalue=-1, keys_from_unique=True)
self.assertTrue((res.to_ndarray() == ans).all())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's not consistent across our tests but i prefer

self.assertListEqual(res.to_ndarray().tolist(), ans.tolist())

because it gives more precise information about where the lists are different

for example

>       self.assertTrue((ak.array([1,2,3]) == ak.array([1,1,3])).all())
E       AssertionError: False is not true

vs

>       self.assertListEqual(ak.array([1,2,3]).to_ndarray().tolist(), ak.array([1,1,3]).to_ndarray().tolist())
E       AssertionError: Lists differ: [1, 2, 3] != [1, 1, 3]
E       
E       First differing element 1:
E       2
E       1
E       
E       - [1, 2, 3]
E       ?     ^
E       
E       + [1, 1, 3]
E       ?     ^

@stress-tess stress-tess deleted the fix-lookup branch October 12, 2022 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ak.lookup gives incorrect result when keys are not ordered
4 participants