[FEA] Support SplitAndRetry for GpuFastSampleExec #8313
Labels
feature request
New feature or request
reliability
Features to improve reliability or bugs that severely impact the reliability of the plugin
Is your feature request related to a problem? Please describe.
GpuFastSampleExec is a much faster, but not 100% Spark-compatible, implementation of GpuSampleExec. We should add SplitAndRetry support to it. It is off by default, so this should probably be lower priority.
Technically, the current implementation is not deterministic with respect to how the data is batched, but none of the sampling implementations are agnostic to the order of the rows, and most of the time the order of the rows is not guaranteed, so it is probably good enough. With that context, we can probably just put the split/retry blocks around the main part of the code. It should mostly do the right thing. If we really want to, we could also checkpoint/restore the index that is used to help generate the seed for the filtering.
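As a rough illustration of the checkpoint/restore idea, here is a minimal, self-contained sketch in Python. It is not the plugin's code: `SplitAndRetryOOM`, `IndexedSampler`, `sample_with_split_retry`, and the `max_rows` memory limit are all hypothetical names made up for this example, and the assumption that the seed is derived as `seed + index` is only a stand-in for however the real index feeds into the seed.

```python
import random


class SplitAndRetryOOM(Exception):
    """Hypothetical signal: the batch was too large, split it and retry."""


class IndexedSampler:
    """Hypothetical sampler whose per-batch seed is base seed + running index."""

    def __init__(self, fraction, seed):
        self.fraction = fraction
        self.seed = seed
        self.index = 0    # advances once per attempted batch
        self._saved = 0   # last checkpointed value of index

    def checkpoint(self):
        self._saved = self.index

    def restore(self):
        self.index = self._saved

    def sample(self, batch, max_rows):
        # The index is consumed before the work is done, so a failed
        # attempt leaves it advanced unless the caller restores it.
        rng = random.Random(self.seed + self.index)
        self.index += 1
        if len(batch) > max_rows:  # simulated out-of-memory condition
            raise SplitAndRetryOOM()
        return [row for row in batch if rng.random() < self.fraction]


def sample_with_split_retry(sampler, batch, max_rows):
    """Wrap the main sampling call in a split/retry loop."""
    pending = [batch]
    out = []
    while pending:
        current = pending.pop(0)
        sampler.checkpoint()
        try:
            out.extend(sampler.sample(current, max_rows))
        except SplitAndRetryOOM:
            sampler.restore()  # undo the index consumed by the failed attempt
            if len(current) <= 1:
                raise          # cannot split any further
            mid = len(current) // 2
            # Retry the two halves, in order, before any other pending work.
            pending[:0] = [current[:mid], current[mid:]]
    return out
```

In this sketch, restoring the index means failed attempts do not shift the seeds used by later batches. The sampled rows can still differ from a run that never had to split, because the same rows end up in differently sized batches, which matches the caveat above about batching not being deterministic.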