Support re-partitioning large data multiple times and each time with a different seed. #7911
Labels: reliability
This is a follow-up to #7794 to support the case when the build-side data is really, really big.
Currently we re-partition the data 16 ways by default. If the build side is very large, say 64 GiB, then each build-side batch would be about 4 GiB in size. This is larger than our target batch size, so it would likely result in us using too much GPU memory to do the computation.
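To make the arithmetic concrete, here is a minimal sketch (not the plugin's actual code) of how a partition count could be derived from the observed build-side size instead of a fixed default. The 1 GiB target used in the example call is a hypothetical value, not the plugin's configured target:

```scala
// A minimal sketch: how many partitions are needed so that each
// build-side batch fits under the target batch size. `targetBatchSize`
// stands in for whatever target size is configured.
def partitionsNeeded(totalBuildBytes: Long, targetBatchSize: Long): Int = {
  // Round up so every partition lands at or below the target size.
  math.max(1, math.ceil(totalBuildBytes.toDouble / targetBatchSize).toInt)
}

// 64 GiB of build-side data with a hypothetical 1 GiB target needs
// 64 partitions, not the default 16 (which leaves ~4 GiB per batch).
partitionsNeeded(64L << 30, 1L << 30)  // => 64
```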
The idea is that if we do hit a situation where one or more of the build-side batches are still too large, we would try to re-partition them again, but with a different seed. By that point we would also have seen all of the data, so we could make a better guess at the number of partitions to use on the second pass. (This is fairly minor because the number of partitions is configurable, but I would prefer that we not crash even in bad situations.)
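A hedged sketch of what that retry loop could look like, assuming hypothetical helpers `sizeOf` and `hashPartition(batch, numParts, seed)`; the real plugin code would differ:

```scala
// Recursively re-partition any batch that is still over the target
// size, using a different seed on each pass so keys that collided into
// one partition before get spread out differently.
def repartitionUntilFits[T](
    batch: T,
    targetBytes: Long,
    sizeOf: T => Long,
    hashPartition: (T, Int, Long) => Seq[T],
    seed: Long = 0L,
    maxRetries: Int = 8): Seq[T] = {
  if (sizeOf(batch) <= targetBytes || maxRetries <= 0) {
    // Small enough, or we give up (e.g. many duplicates of one key,
    // which no amount of re-hashing can split).
    Seq(batch)
  } else {
    // We have now seen the data, so size the next partition count from
    // the actual bytes instead of the default of 16.
    val numParts =
      math.max(2, math.ceil(sizeOf(batch).toDouble / targetBytes).toInt)
    // A new seed changes the hash assignment on the retry.
    hashPartition(batch, numParts, seed + 1).flatMap { part =>
      repartitionUntilFits(part, targetBytes, sizeOf, hashPartition,
        seed + 1, maxRetries - 1)
    }
  }
}
```

The `maxRetries` bound matters: if a single join key dominates the data, re-partitioning with any seed keeps putting all of its rows in one partition, so the loop must eventually stop and fall back rather than recurse forever.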