[FEA] Time series resampling (df.resample) #8416
Labels: feature request (New feature or request), Python (Affects Python cuDF API)

On Jun 1, 2021, beckernick added the "feature request" and "Needs Triage" labels; shwina added the "Python" label and removed "Needs Triage".
There has been some related discussion in #6255.

No, Spark does not have built-in resampling like this.
rapids-bot bot pushed a commit that referenced this issue on Nov 13, 2021:
Closes #6255, #8416

This PR implements two related features:

1. Grouping by a frequency via the `freq=` argument to `cudf.Grouper`
2. Time-series resampling via the `.resample()` API

Either operation results in a `_Resampler` object that represents the data resampled into "bins" of a particular frequency. The following operations are supported on resampled data:

1. Aggregations such as `min()` and `max()`, performed bin-wise
2. `ffill()` and `bfill()`: forward and backward filling in the case of upsampling data
3. `asfreq()`: returns the resampled data as a Series or DataFrame

These are all best understood by example. First, we create a time series with 1-minute intervals:

```python
>>> index = cudf.date_range(start="2001-01-01", periods=10, freq="1T")
>>> sr = cudf.Series(range(10), index=index)
>>> sr
2001-01-01 00:00:00    0
2001-01-01 00:01:00    1
2001-01-01 00:02:00    2
2001-01-01 00:03:00    3
2001-01-01 00:04:00    4
2001-01-01 00:05:00    5
2001-01-01 00:06:00    6
2001-01-01 00:07:00    7
2001-01-01 00:08:00    8
2001-01-01 00:09:00    9
dtype: int64
```

Downsampling to 3-minute intervals, followed by a "sum" aggregation:

```python
>>> sr.resample("3T").sum()  # equivalently, sr.groupby(cudf.Grouper(freq="3T")).sum()
2001-01-01 00:00:00     3
2001-01-01 00:03:00    12
2001-01-01 00:06:00    21
2001-01-01 00:09:00     9
dtype: int64
```

Upsampling to 30-second intervals:

```python
>>> sr.resample("30s").asfreq()
2001-01-01 00:00:00    0.0
2001-01-01 00:00:30    NaN
2001-01-01 00:01:00    1.0
2001-01-01 00:01:30    NaN
2001-01-01 00:02:00    2.0
2001-01-01 00:02:30    NaN
2001-01-01 00:03:00    3.0
2001-01-01 00:03:30    NaN
2001-01-01 00:04:00    4.0
2001-01-01 00:04:30    NaN
2001-01-01 00:05:00    5.0
2001-01-01 00:05:30    NaN
2001-01-01 00:06:00    6.0
2001-01-01 00:06:30    NaN
2001-01-01 00:07:00    7.0
2001-01-01 00:07:30    NaN
2001-01-01 00:08:00    8.0
2001-01-01 00:08:30    NaN
2001-01-01 00:09:00    9.0
Freq: 30S, dtype: float64
```

Upsampling to 30-second intervals, followed by a forward fill:

```python
>>> sr.resample("30s").ffill()
2001-01-01 00:00:00    0
2001-01-01 00:00:30    0
2001-01-01 00:01:00    1
2001-01-01 00:01:30    1
2001-01-01 00:02:00    2
2001-01-01 00:02:30    2
2001-01-01 00:03:00    3
2001-01-01 00:03:30    3
2001-01-01 00:04:00    4
2001-01-01 00:04:30    4
2001-01-01 00:05:00    5
2001-01-01 00:05:30    5
2001-01-01 00:06:00    6
2001-01-01 00:06:30    6
2001-01-01 00:07:00    7
2001-01-01 00:07:30    7
2001-01-01 00:08:00    8
2001-01-01 00:08:30    8
2001-01-01 00:09:00    9
Freq: 30S, dtype: int64
```

Authors:
- Ashwin Srinath (https://github.com/shwina)
- Michael Wang (https://github.com/isVoid)

Approvers:
- https://github.com/brandon-b-miller
- Vyas Ramasubramani (https://github.com/vyasr)
- Benjamin Zaitlen (https://github.com/quasiben)

URL: #9178
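The PR above lists `bfill()` among the supported operations but does not show it. As a sketch of the expected backward-fill semantics, here is the equivalent pandas behavior (the cuDF API mirrors pandas here; the exact data is hypothetical):

```python
import pandas as pd

# A small per-minute series (hypothetical data, for illustration only)
index = pd.date_range(start="2001-01-01", periods=3, freq="1min")
sr = pd.Series([0, 1, 2], index=index)

# Upsample to 30-second bins; each gap is filled from the *next* valid value,
# the mirror image of ffill() shown in the PR description above
upsampled = sr.resample("30s").bfill()
print(upsampled)
# 00:00:00 -> 0, 00:00:30 -> 1, 00:01:00 -> 1, 00:01:30 -> 2, 00:02:00 -> 2
```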
In pandas, I can "resample" time series data (converting the frequency, i.e. up- or downsampling the data while keeping track of the associated values) with the convenient `resample` API. This feature request comes courtesy of this Stack Overflow post.
In the following example, per-minute data is aggregated into three minute bins and the associated values are summed.
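The example referred to did not survive in this copy of the issue; a minimal pandas sketch of the described operation (per-minute data summed into 3-minute bins, with hypothetical values) would look like:

```python
import pandas as pd

# Nine per-minute observations (hypothetical values, for illustration)
index = pd.date_range(start="2001-01-01", periods=9, freq="1min")
sr = pd.Series(range(9), index=index)

# Downsample into 3-minute bins, summing the values that fall in each bin
result = sr.resample("3min").sum()
print(result)
# Bins: 0+1+2 = 3, 3+4+5 = 12, 6+7+8 = 21
```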