[FEA] Session-based recommendation support #355

gabrielspmoreira · 2020-10-15T22:50:50Z

Motivation

Many industry use cases face the user cold-start problem, where the user is either not logged in or has very few, sparse interactions. Furthermore, in some domains a user's preferences can change a lot between sessions. Session-based recommendation has been a popular approach in industry for dealing with user cold-start: it leverages the sequence of items within the user's session to provide contextual recommendations. This is especially relevant for GDPR compliance, as you do not need the user's past interactions to provide personalized recommendations.

Requirements:

RQ01 - List Aggregation Sorted by Timestamp

This requirement was extracted to issue #641, with updated specs.

RQ02 - Temporal dataset split

This requirement was extracted to issue #642, with updated specs.

RQ03 - Export sessions to parquet format

After grouping user interactions into sessions, each row of the exported dataset will be one session. The columns that were aggregated as lists should be exported as array columns in the parquet file.
Note that, as session lengths vary, the array columns can have a different length for each row (session).
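
To make the target layout concrete, here is a minimal sketch in plain Python of the "one row per session" shape described above. It is not NVTabular code: the function name `group_into_sessions` and the column names `session_id`, `item_id`, and `timestamp` are assumptions made for illustration, and the parquet write itself is left out.

```python
from collections import defaultdict


def group_into_sessions(interactions):
    """Collapse interaction rows into one row per session.

    Hypothetical helper: list-valued columns are ordered by timestamp,
    so their length differs from session to session.
    """
    by_session = defaultdict(list)
    for row in interactions:
        by_session[row["session_id"]].append(row)

    sessions = []
    for session_id, rows in sorted(by_session.items()):
        rows.sort(key=lambda r: r["timestamp"])
        sessions.append(
            {
                "session_id": session_id,
                # These two would become array columns in the parquet file.
                "item_ids": [r["item_id"] for r in rows],
                "timestamps": [r["timestamp"] for r in rows],
            }
        )
    return sessions


# Illustrative data only.
interactions = [
    {"session_id": "s1", "item_id": 10, "timestamp": 3},
    {"session_id": "s2", "item_id": 7, "timestamp": 1},
    {"session_id": "s1", "item_id": 11, "timestamp": 2},
    {"session_id": "s1", "item_id": 12, "timestamp": 5},
]
sessions = group_into_sessions(interactions)
```

Here session `s1` ends up with three items and `s2` with one, which is the variable-length case the exported array columns have to handle.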

RQ04 - List column support by NVT DataLoader

Our data loaders for TF and PyTorch must be able to read the exported parquet files, where each row represents a session and features are represented as list columns.
Note that, as session lengths vary, the array columns can have a different length for each row (session).

gabrielspmoreira · 2021-03-04T13:35:47Z

Regarding RQ04, our PyTorch dataloader already supports reading list columns from parquet files, but not as a first-class citizen.
I have extended the PyTorch dataloader to return a SparseTensor representation built from the internal NVT representation of list columns, as described in issues #500 and #499.
Those improvements should be integrated into the next version of our PyTorch dataloader to better support multi-hot features as well as session-based / sequence-based recommendation.
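
As a rough illustration of the idea, the sketch below converts a batch of variable-length lists into a sparse (COO-style) form and then into a padded dense batch. It is plain Python, not the dataloader code. The flat `values` plus row `offsets` layout is an assumption about how ragged list columns are commonly stored; this thread does not spell out the internal NVT representation. The function names are hypothetical.

```python
def ragged_to_coo(values, offsets):
    """Turn flat values + row offsets into COO indices, values, and a dense shape.

    Assumed layout: row i owns values[offsets[i]:offsets[i + 1]].
    """
    num_rows = len(offsets) - 1
    indices = []
    max_len = 0
    for row in range(num_rows):
        length = offsets[row + 1] - offsets[row]
        max_len = max(max_len, length)
        # Each element gets a (row, position-in-row) coordinate.
        indices.extend((row, col) for col in range(length))
    return indices, list(values), (num_rows, max_len)


def coo_to_padded(indices, values, shape, pad=0):
    """Scatter COO entries into a dense batch, padding short rows with `pad`."""
    dense = [[pad] * shape[1] for _ in range(shape[0])]
    for (row, col), value in zip(indices, values):
        dense[row][col] = value
    return dense


# Two sessions: [11, 10, 12] and [7].
values = [11, 10, 12, 7]
offsets = [0, 3, 4]
indices, vals, shape = ragged_to_coo(values, offsets)
padded = coo_to_padded(indices, vals, shape)
```

The sparse form keeps only the real elements, while the padded form is what a sequence model typically consumes. Which padding value is safe depends on how item ids are encoded.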

gabrielspmoreira · 2021-03-08T19:15:29Z

To make it easier to tackle those requirements, we have extracted RQ1 to #641 and RQ2 to #642, so that they can be developed independently.

benfred · 2021-04-12T23:46:40Z

RQ1, RQ2, and RQ3 are all in v0.5.0. The dataloader support will be in v0.5.1 (#500).

benfred · 2021-06-03T22:02:53Z

RQ04 is handled by #793

This was referenced Mar 8, 2021

[FEA] Sequential / Session-based recommendation and time series support - Group by sorting values by timestamp #641

Closed

[FEA] Partition output parquet files by a column #642

Closed

benfred added the session-based label Mar 9, 2021

benfred added the P0 label May 4, 2021

viswa-nvidia added this to the NVTabular v0.6 milestone May 4, 2021

benfred assigned jperez999 Jun 3, 2021

benfred closed this as completed Jun 3, 2021
