Concatenate dictionary of objects along axis=1 #15623
Original change here: rapidsai#3188. Why were we casting to "float64" in the old test case? Maybe related to this comment? rapidsai#3188 (comment)
to the reviewer(s): please take a look here: 0736e4e#diff-1d997d95893af6a665d5803d179f472c673a7d4e1a0a04a305bd2f1c4a66d957L237 any idea why we were casting to
/okay to test
My best guess is that this is a historical artifact and is no longer necessary. I think your change to remove that cast is the right move -- thanks! I triggered CI and we can see if it passes tests. If it fails, we can investigate further.
it weirds me out that the test has been passing all this time, but i'm a noob around these parts so i trust you!
Thanks for resurrecting this! This code is really fiddly, so great job unpicking things. I have a few suggestions and questions for cleanup.
/ok to test
3e8bbc9 to a55ca67
/ok to test
Sorry, I made a sequence of mess-ups, but hopefully we've got there.
/ok to test
@wence- do we prefer
you can test the execution time by setting a breakpoint on line 427 and running this:
import time
# time the set().union(...) version
before = time.perf_counter()
set().union(*(map(type, obj._data.keys()) for obj in objs))
after_changed = time.perf_counter() - before
# time the set-comprehension version
b = time.perf_counter()
{type(name) for o in objs for name in o._data.keys()}
a_changed = time.perf_counter() - b
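For anyone who wants to reproduce this comparison without a debugger session, here is a minimal sketch using `timeit`. `FakeFrame` is a hypothetical stand-in that only exposes a `_data` mapping of column names; it is not a cudf class, and absolute timings on real cudf objects will differ.

```python
import timeit


class FakeFrame:
    """Hypothetical stand-in: mimics only the `_data` mapping keyed by column name."""

    def __init__(self, names):
        self._data = {name: None for name in names}


# Column names of mixed types, spread across several objects.
objs = [
    FakeFrame(["a", "b", 0, 1]),
    FakeFrame([("x", "y"), "c"]),
    FakeFrame([2.5]),
]


def union_of_maps(objs):
    # Build one iterator of types per object, then union them all.
    return set().union(*(map(type, obj._data.keys()) for obj in objs))


def set_comprehension(objs):
    # Flatten all names into a single set comprehension.
    return {type(name) for o in objs for name in o._data.keys()}


# Both spellings should yield the same set of column-name types.
assert union_of_maps(objs) == set_comprehension(objs)

for fn in (union_of_maps, set_comprehension):
    elapsed = timeit.timeit(lambda: fn(objs), number=10_000)
    print(f"{fn.__name__}: {elapsed:.4f}s for 10,000 calls")
```

Repeating each call many times with `timeit` gives a steadier number than a single `perf_counter` pair, which is easily dominated by noise for an operation this small.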
Ah sorry. Too much time writing Haskell. Please go back to your approach.
sure, making a commit now and merging in latest changes
/ok to test
@pytest.mark.parametrize(
    "d",
    [
        {"first": cudf.DataFrame({"A": [1, 2], "B": [3, 4]})},
It's best if we can avoid creating instances of GPU objects (cudf.DataFrame) in the parametrize arguments. Those are executed at test collection time rather than at test runtime, and make the test suite slow to launch due to a large number of small host-device copies. Let's defer construction using a pattern like this:
@pytest.mark.parametrize(
    "d",
    [
        {"first": {"A": [1, 2], "B": [3, 4]}},
        # ...
    ],
)
def test_concat_dictionary(d, axis):
    # Convert dict-of-dicts to dict-of-DataFrames to avoid raw GPU objects in the parameters
    d = {k: cudf.DataFrame(v) for k, v in d.items()}
    result = cudf.concat(d, axis=axis)
    expected = cudf.from_pandas(
        pd.concat({k: df.to_pandas() for k, df in d.items()}, axis=axis)
    )
    assert_eq(expected, result)
will do! let me know if you'd like me to clean up the other tests in this file in the same manner
i passed in the reference to each class for the tests, let me know if this causes weirdness during test collection and i'll make a simple map.
Yes, that’s totally fine. No need to refactor the rest of the file in this PR. A separate PR would be welcome if you’re interested.
e9c7239 to b5b9116
/ok to test
/merge
Thanks @er-eis! I think you mentioned you had a follow-up PR planned? Please feel free to open an issue documenting any next steps that are needed, even if you don't have time to contribute a PR.
Description
Note: This work is heavily based on amanlai's PR raised here; I wasn't able to base my branch on amanlai's because that branch was deleted.
Checklist
^ need me to create an entry in the CHANGELOG.md?