Is your feature request related to a problem? Please describe.
For a realistic temporal evaluation protocol, it is common to split user interactions/sessions by time windows (e.g. hours, days, weeks, months), so that models can be trained on one time window and evaluated on a future time window.
So, NVTabular should be able to export the parquet files partitioned by a column (which can be a feature extracted from a timestamp column by a LambdaOp, like hour, day, week or month).
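For illustration only, a minimal sketch of deriving such a partition column with a LambdaOp; the column name "timestamp" and the specific day-level transformation are assumptions, not something specified in this request:

```python
# Illustrative sketch (column name "timestamp" is a placeholder): derive a
# day-level feature from a timestamp column via LambdaOp, so it could later
# serve as the partition column when exporting parquet files.
import nvtabular as nvt
from nvtabular import ops

# cuDF datetime accessor: .dt.hour / .dt.day / .dt.month, etc.
day_feature = ["timestamp"] >> ops.LambdaOp(lambda col: col.dt.day)

workflow = nvt.Workflow(day_feature)
```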
Describe the solution you'd like
When the workflow exports the parquet files, they should be organized in folders named after the partition column values (e.g. "interaction_date=2021-03-05").
That is, each folder should contain only the parquet files whose rows share the partition-column value encoded in the folder name.
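For context, this is the Hive-style layout that Dask's existing partition_on option already produces; a minimal sketch with placeholder column and path names (not an NVTabular API):

```python
# Illustrative sketch (not NVTabular API): Dask's to_parquet supports
# Hive-style partitioning via partition_on, producing the layout described
# above, e.g.
#   output_dir/interaction_date=2021-03-05/<part>.parquet
#   output_dir/interaction_date=2021-03-06/<part>.parquet
import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame(
    {
        "user_id": [1, 2, 3],
        "item_id": [10, 20, 30],
        "interaction_date": ["2021-03-05", "2021-03-05", "2021-03-06"],
    }
)
ddf = dd.from_pandas(pdf, npartitions=1)

# One sub-directory is created per distinct value of the partition column.
ddf.to_parquet("output_dir", partition_on=["interaction_date"])
```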
Additional context
With PySpark, this can be accomplished as follows:
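A minimal sketch of that PySpark behavior (the SparkSession setup and the input/output paths are placeholders):

```python
# Sketch of PySpark's partitioned parquet export (paths are placeholders).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partitioned-export").getOrCreate()
df = spark.read.parquet("interactions.parquet")

# partitionBy() writes one folder per value of the partition column,
# e.g. interaction_date=2021-03-05/
df.write.partitionBy("interaction_date").parquet("output_partitioned")
```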
P.S. This issue was extracted from #355, which is broader in scope, so that it can be implemented independently.