[REVIEW] Add Series.diff() via Numba kernel #1456

Merged (10 commits) on May 3, 2019
@beckernick commented Apr 18, 2019

This PR adds diff functionality to numeric cudf.Series, analogous to pandas Series.diff.

Summary of Changes

  • Adds a diff method to cudf.Series, matching the pandas API
  • Adds a gpu_diff Numba kernel in cudautils for out-of-place differencing
  • Adds tests for forward and backward differencing of numeric dtypes
  • Cells that pandas diff would set to NaN because of the period (the leading rows for a positive period, the trailing rows for a negative one) are currently set to -1 in cudf diff
  • Requires that the column contain no null values
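
To make the boundary behaviour described above concrete, here is a minimal CPU sketch of the semantics in plain Python. It is not the gpu_diff kernel from this PR: the function name diff_reference and its fill parameter are hypothetical and exist only for illustration. It assumes the input has no nulls, as the PR requires.

```python
def diff_reference(values, periods=1, fill=-1):
    """CPU sketch of out-of-place differencing (hypothetical helper).

    out[i] = values[i] - values[i - periods] where that index exists;
    positions without a partner get `fill` (-1 here, where pandas
    would produce NaN).
    """
    n = len(values)
    out = [fill] * n              # out-of-place: the input is never modified
    for i in range(n):            # one iteration per element
        j = i - periods           # index of the element to subtract
        if 0 <= j < n:
            out[i] = values[i] - values[j]
    return out


forward = diff_reference([1, 3, 6, 10], periods=1)    # leading row is filled
backward = diff_reference([1, 3, 6, 10], periods=-1)  # trailing row is filled
print(forward, backward)
```

One consequence of using -1 as the placeholder is that a filled cell cannot be told apart from a genuine difference of -1, so callers need to track the period themselves to know which rows are placeholders.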

This could serve as a stopgap until diff is implemented in libcudf (#1271), since its performance on columns of millions of rows is similar to that of standard binary operations. Timings for a column of 10 million rows:

import cudf
import numpy as np

nelem = int(1e7)
df = cudf.DataFrame({'a': np.random.sample(nelem)})

%timeit df.a.diff(2)
4.33 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.a + 1
5.47 ms ± 40.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

@beckernick changed the title from "[WIP] Add Series.diff() via Numba kernel" to "[REVIEW] Add Series.diff() via Numba kernel" Apr 18, 2019
@beckernick added the labels "3 - Ready for Review" and "Python" Apr 19, 2019
@beckernick requested a review from a team as a code owner May 2, 2019 14:27
@kkraus14 added the label "5 - Ready to Merge" and removed the labels "3 - Ready for Review" and "4 - Needs cuDF Reviewer" May 3, 2019
@kkraus14 merged commit 7f3d5fe into rapidsai:branch-0.7 May 3, 2019