[FEA] Add window functions #740
Comments
We have first and last support already in v0.5.
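For reference, a minimal sketch of what first/last-per-group could look like with the Groupby op; the column names and the exact parameter names (`groupby_cols`, `sort_cols`, `aggs`) are assumptions based on the v0.5 docs, not code from this thread:

```python
# Hypothetical sketch: first/last item per session with NVTabular's Groupby op.
# Column names and parameters are illustrative assumptions.
import nvtabular as nvt

groupby_features = ["session_id", "item_id", "timestamp"] >> nvt.ops.Groupby(
    groupby_cols=["session_id"],          # partition key
    sort_cols=["timestamp"],              # order rows within each partition
    aggs={"item_id": ["first", "last"]},  # keep the first and last item per session
)
workflow = nvt.Workflow(groupby_features)
```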
As in #734, the list.take method could possibly be used for a first-order solution if something beyond first/last is needed.
Indeed, we got the aggregation functions we needed for session-based recommendation within the closed issue #641, which introduced the …
For session-based recommendation, when the session id is not provided in the dataset, we use the idle time between user interactions to split the sessions (usually a maximum of 30 min between two consecutive interactions within a session). I understand that I could use ops.DifferenceLag() partitioned by user id to get the elapsed time between user interaction timestamps. But I am not sure how I could use this new "delta time" feature to generate the same session id for consecutive interactions whose delta time is below the threshold, or to split the sessions into lists as we do with nvt.ops.Groupby(). I don't know if this use case fits an aggregation or a window function; if not, I can open a separate issue for it.
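One common way to turn that delta-time feature into session ids, outside of NVTabular with plain cuDF/pandas, is a cumulative sum over a "new session" flag. This is only a sketch under assumed column names (`user_id`, `timestamp` in seconds), not an existing NVTabular op:

```python
# Sketch: derive a session id from idle time. Written against the pandas API;
# on older cuDF the groupby shift may need the manual masking trick from this thread.
SESSION_GAP = 30 * 60  # 30 minutes of idle time starts a new session

df = df.sort_values(["user_id", "timestamp"])
prev_ts = df.groupby("user_id")["timestamp"].shift(1)    # previous event per user
delta = df["timestamp"] - prev_ts                        # idle time ("delta time")
new_session = prev_ts.isna() | (delta > SESSION_GAP)     # first event or long gap
df["session_id"] = new_session.cumsum()                  # running count = session id
```

The resulting `session_id` column could then feed the Groupby/list aggregations discussed above.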
Regarding window functions (not specific to session-based recommendation), it is a common feature engineering practice to use lead and lag features for time series and recommender systems in general. As an example, KGMON used this technique in the Booking.com challenge to have, for each training row, the last 5 cities in the sequence (e.g. shift(5), shift(4), shift(3), shift(2), shift(1), partitioned by trip), like in this example with cuDF:

```python
def shift_feature(df, groupby_col, col, offset, nan=-1, colname=''):
    # Shift the column, then reset values that crossed a group boundary
    df[colname] = df[col].shift(offset)
    df.loc[df[groupby_col] != df[groupby_col].shift(offset), colname] = nan

shift_feature(raw, 'utrip_id_', 'city_id_', 1, NUM_CITIES, f'city_id_lag{1}')
shift_feature(raw, 'utrip_id_', 'city_id_', 2, NUM_CITIES, f'city_id_lag{2}')
...
```

I have used this approach with cuDF to remove consecutive repeated user interactions on the same item, as in the following example:

```python
# Sort the dataframe by session and timestamp, to detect consecutive repetitions
interactions_df = interactions_df.sort_values(['session_id', 'timestamp'])
interactions_df['item_id_past'] = interactions_df['item_id'].shift(1)
interactions_df['session_id_past'] = interactions_df['session_id'].shift(1)

# Keep only interactions that are not consecutive repetitions within the same session
interactions_df = interactions_df[~((interactions_df['session_id'] == interactions_df['session_id_past']) & \
                                    (interactions_df['item_id'] == interactions_df['item_id_past']))]
```

In both cases, we did a hack on cuDF compared to the shift() available in Pandas (via groupby().shift()), which supports partitioning by column as in the example of this FEA.
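For comparison, this is roughly what the partitioned lag looks like when a groupby-aware shift is available (pandas supports it; recent cuDF releases do as well). A sketch of the pandas idiom, reusing the `raw`, `utrip_id_`, `city_id_` and `NUM_CITIES` names from the example above, not an NVTabular op:

```python
# Groupby-aware lag features: one column per offset, partitioned by trip.
for offset in range(1, 6):
    raw[f'city_id_lag{offset}'] = (
        raw.groupby('utrip_id_')['city_id_']
           .shift(offset)
           .fillna(NUM_CITIES)  # same fill value as in the masking hack above
    )
```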