
group by / create batch of max x rows function #46

Open
chris-twiner opened this issue Aug 14, 2023 · 1 comment
Comments

@chris-twiner (Contributor)

Per pyspark group and batch of x rows, and another issue I can't find (probably deleted) that wanted API calls bucketed, there seems to be a gap for a partition-id function with a capped counter: a stateful expression that increments every x rows and resets the count on each new partition.

This would allow chunking, but ideally the chunks would also be mappable, so perhaps it could be combined with collect_set and a custom UDF.
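As a rough illustration of the requested semantics (not an implementation against this library's API), the sketch below assigns a batch id to each row in plain Python: the counter increments per row within a partition, every `max_rows` rows start a new batch, and the count resets whenever the partition changes. The function name `assign_batches` and the `(partition, row)` tuple layout are hypothetical, chosen just for the example.

```python
from itertools import groupby


def assign_batches(rows, partition_of, max_rows):
    """Yield (row, (partition, batch_id)) pairs.

    batch_id increments every `max_rows` rows and resets to 0
    when the partition key changes. Rows are assumed to arrive
    grouped by partition (as they would within a Spark partition).
    """
    for partition, group in groupby(rows, key=partition_of):
        for i, row in enumerate(group):
            # Integer division caps each batch at max_rows rows.
            yield row, (partition, i // max_rows)


# Usage: partition key is the first tuple element, batches of at most 2 rows.
rows = [("a", 1), ("a", 2), ("a", 3), ("b", 4)]
batched = list(assign_batches(rows, partition_of=lambda r: r[0], max_rows=2))
# → [(("a", 1), ("a", 0)), (("a", 2), ("a", 0)),
#    (("a", 3), ("a", 1)), (("b", 4), ("b", 0))]
```

In Spark terms this would correspond to a stateful expression keyed on the physical partition id; grouping by the resulting `(partition, batch_id)` key would then give chunks that can be collected (e.g. via `collect_set` or `collect_list`) and mapped with a UDF.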

@chris-twiner (Contributor Author)
