Motivation
Many industry use cases face the user cold-start problem, where the user is either not logged in or has very few, sparse interactions. Furthermore, in some domains a user's preferences can change considerably between sessions. Session-based recommendation has been a popular approach in industry for dealing with user cold-start: it leverages the sequence of items within the user's session to provide contextual recommendations. This is especially relevant for GDPR compliance, as you do not need the user's past interactions to provide personalized recommendations.
Requirements:
RQ01 - List Aggregation Sorted by Timestamp
This requirement was extracted to issue #641, with updated specs
RQ02 - Temporal dataset split
This requirement was extracted to issue #642, with updated specs
RQ03 - Export sessions to parquet format
After grouping user interactions into sessions, each row of the exported dataset will be one session. The columns that were aggregated as lists should be exported as array columns in the parquet file.
Note that, because session lengths vary, the array columns can have a different length in each row (session).
RQ04 - List column support by NVT DataLoader
Our data loaders for TF and PyTorch must be able to read the exported parquet files, where each row represents a session and features are represented as list columns.
P.S. Because session lengths vary, the array columns can have a different length in each row (session).
Regarding RQ04, our PyTorch dataloader already supports reading list columns in parquet files, but not as a first-class citizen.
I have extended the PyTorch dataloader to return a SparseTensor representation built from the internal NVT representation of list columns, as described in issues #500 and #499.
Those improvements should be integrated into the next version of our PyTorch dataloader to better support multi-hot features as well as session-based / sequence-based recommendation.
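To make the idea of a sparse representation for list columns concrete, here is a minimal pure-Python sketch. It assumes the ragged column is stored as a flat list of values plus row offsets with a trailing end offset, which is an assumption for illustration and not necessarily the exact internal NVT layout. The helper name `ragged_to_coo` is hypothetical.

```python
def ragged_to_coo(values, offsets):
    """Convert flat values plus row offsets into COO indices, values and shape.

    Assumes offsets has one entry per row plus a final end offset,
    so row i spans values[offsets[i]:offsets[i + 1]].
    """
    num_rows = len(offsets) - 1
    indices = []
    max_len = 0
    for row in range(num_rows):
        row_len = offsets[row + 1] - offsets[row]
        max_len = max(max_len, row_len)
        # Position within the session becomes the column index.
        for col in range(row_len):
            indices.append((row, col))
    return indices, list(values), (num_rows, max_len)


# Two sessions of lengths 3 and 2 stored as one flat values list.
values = [11, 12, 10, 21, 20]
offsets = [0, 3, 5]
indices, coo_values, shape = ragged_to_coo(values, offsets)
```

The resulting indices, values and shape could then be handed to a sparse tensor constructor such as `torch.sparse_coo_tensor`, after transposing the indices into the `(2, nnz)` layout that function expects.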