[FEA] Session-based recommendation support #355

Closed
gabrielspmoreira opened this issue Oct 15, 2020 · 4 comments
@gabrielspmoreira
Member

gabrielspmoreira commented Oct 15, 2020

Motivation

Many industry use cases face the user cold-start problem, where the user might not be logged in or might have very few, sparse interactions. Furthermore, in some domains a user’s preferences can change considerably from one session to the next. Session-based recommendation has been a popular approach in industry for dealing with user cold-start: it leverages the sequence of items within the user’s session to provide contextual recommendations. This is especially relevant for GDPR compliance, as you do not need the user’s past interactions to provide personalized recommendations.

Requirements:

RQ01 - List Aggregation Sorted by Timestamp

This requirement was extracted to issue #641, with updated specs

RQ02 - Temporal dataset split

This requirement was extracted to issue #642, with updated specs

RQ03 - Export sessions to parquet format

After grouping user interactions into sessions, each row of the exported dataset will be one session. The columns that were aggregated as lists should be exported as array columns in the parquet file.
Note that, because session lengths vary, the array columns can have a different length in each row (session).
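
To make the intended row layout concrete, here is a minimal pure-Python sketch of the grouping step. It is not NVTabular code: the `interactions` data, the column names and the `group_into_sessions` helper are all hypothetical, and writing the result to parquet is left out. It only shows what "one row per session, with list columns of varying length" would look like.

```python
from collections import defaultdict

# Hypothetical interaction log: (session_id, timestamp, item_id).
interactions = [
    ("s1", 30, "item_c"),
    ("s2", 12, "item_x"),
    ("s1", 10, "item_a"),
    ("s1", 20, "item_b"),
    ("s2", 15, "item_y"),
]


def group_into_sessions(rows):
    """Return one dict per session, with list-valued columns ordered by timestamp."""
    by_session = defaultdict(list)
    for session_id, timestamp, item_id in rows:
        by_session[session_id].append((timestamp, item_id))

    sessions = []
    for session_id in sorted(by_session):
        # Order the events of this session by timestamp.
        events = sorted(by_session[session_id], key=lambda event: event[0])
        sessions.append(
            {
                "session_id": session_id,
                "timestamps": [t for t, _ in events],
                "item_ids": [i for _, i in events],
            }
        )
    return sessions


sessions = group_into_sessions(interactions)
for row in sessions:
    print(row)
```

Each dict corresponds to one exported row. The `timestamps` and `item_ids` lists would become the array columns, and their lengths differ between `s1` and `s2`.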

RQ04 - List column support by NVT DataLoader

Our data loaders for TF and PyTorch must be able to read the exported parquet files, where each row represents a session and features are represented as list columns.
P.S. Because session lengths vary, the array columns can have a different length in each row (session).
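
One common way for a data loader to cope with rows of different lengths is to pad each batch to its longest session and keep a mask of the real positions. The sketch below illustrates that idea in plain Python. It is an assumption about how this could be handled, not a description of the NVT data loaders, and `pad_batch` is a hypothetical helper.

```python
def pad_batch(list_column, pad_value=0):
    """Pad variable-length rows to the longest row and return (padded, mask).

    mask holds 1 for real values and 0 for padding.
    """
    lengths = [len(row) for row in list_column]
    max_len = max(lengths, default=0)
    padded = [list(row) + [pad_value] * (max_len - len(row)) for row in list_column]
    mask = [[1] * n + [0] * (max_len - n) for n in lengths]
    return padded, mask


# Hypothetical batch of three sessions with 3, 1 and 2 item ids.
item_ids = [[5, 8, 2], [7], [3, 9]]
padded, mask = pad_batch(item_ids)
print(padded)
print(mask)
```

The padded lists can be turned into a dense tensor by either framework, and the mask lets a model ignore the padded positions.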

@gabrielspmoreira
Member Author

Regarding RQ04, our PyTorch dataloader already supports reading list columns in parquet files, but not as a first-class citizen.
I have extended the PyTorch dataloader to return a SparseTensor representation built from the internal NVT representation of list columns, as described in issues #500 and #499.
Those improvements should be integrated into the next version of our PyTorch dataloader to better support multi-hot features as well as session-based / sequence-based recommendation.
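
As a rough illustration of that conversion, the sketch below assumes a list column is stored as a flat list of values plus row offsets. This layout is an assumption; the thread does not describe the internal representation. The hypothetical `offsets_to_coo` helper derives the (row, position) indices, values and dense shape that a COO-style sparse tensor needs.

```python
def offsets_to_coo(values, offsets):
    """Convert flat values plus row offsets into COO-style sparse components.

    offsets has one entry per row plus a final entry equal to len(values).
    Returns (indices, values, dense_shape).
    """
    n_rows = len(offsets) - 1
    indices = []
    max_len = 0
    for row in range(n_rows):
        row_len = offsets[row + 1] - offsets[row]
        max_len = max(max_len, row_len)
        # Every value in the row gets a (row, position-within-row) index.
        indices.extend((row, pos) for pos in range(row_len))
    return indices, list(values), (n_rows, max_len)


# Hypothetical column with two sessions: [10, 11, 12] and [20, 21].
values = [10, 11, 12, 20, 21]
offsets = [0, 3, 5]
indices, vals, shape = offsets_to_coo(values, offsets)
print(indices, vals, shape)
```

In PyTorch these three pieces could be passed to `torch.sparse_coo_tensor` after transposing the index pairs into a 2 x nnz layout.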

@gabrielspmoreira
Member Author

To make these requirements easier to tackle, we have extracted RQ01 to #641 and RQ02 to #642, so that they can be developed independently.

@benfred
Member

benfred commented Apr 12, 2021

RQ01, RQ02, and RQ03 are all in v0.5.0. The dataloader support will be in v0.5.1 (#500).

@benfred benfred added the P0 label May 4, 2021
@viswa-nvidia viswa-nvidia added this to the NVTabular v0.6 milestone May 4, 2021
@benfred
Member

benfred commented Jun 3, 2021

RQ04 is handled by #793

@benfred benfred closed this as completed Jun 3, 2021