
[CPU] ML Portion for GPU-BDB Queries #248

Open
VibhuJawa opened this issue Mar 16, 2022 · 2 comments

Comments

VibhuJawa (Member) commented Mar 16, 2022

The queries below rely on cuML models for the ML portion on GPU. Depending on performance, we need to decide between a distributed (dask-ml) and a non-distributed (scikit-learn) implementation for the ML portion of these queries on CPU. I suggest benchmarking both and choosing the one that gives the best performance. (A minimal sketch of the CPU-side swap appears after the query list below.)

Query-05 GPU: cuml.LogisticRegression

  1. Non-distributed CPU: sklearn.linear_model.LogisticRegression
  2. Distributed CPU: dask_ml.linear_model.LogisticRegression

Query-20 GPU: cuml.cluster.KMeans

  1. Non-distributed CPU: sklearn.cluster.KMeans
  2. Distributed CPU: dask_ml.cluster.KMeans

Query-25 GPU: cuml.cluster.KMeans

  1. Non-distributed CPU: sklearn.cluster.KMeans
  2. Distributed CPU: dask_ml.cluster.KMeans

Query-26 GPU: cuml.cluster.KMeans

  1. Non-distributed CPU: sklearn.cluster.KMeans
  2. Distributed CPU: dask_ml.cluster.KMeans

Query-28 GPU: cuml.dask.naive_bayes

  1. Distributed CPU: dask_ml.naive_bayes
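
For reference, here is a minimal sketch of what the CPU-side swap looks like for the LogisticRegression and KMeans cases. The stand-in data (`X`, `y`) and all hyperparameters below are assumptions for illustration only; the actual gpu-bdb queries build their features in the ETL portion and use query-specific settings.

```python
# Minimal sketch of the CPU-side model swap (illustrative only).
# X, y and all hyperparameters are placeholders, not the query's real inputs.
import numpy as np
import dask.array as da
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from dask_ml.linear_model import LogisticRegression as DaskLogisticRegression
from dask_ml.cluster import KMeans as DaskKMeans

X = np.random.rand(1000, 10)
y = (np.random.rand(1000) > 0.5).astype(int)
dX = da.from_array(X, chunks=(250, 10))   # row-wise partitions for dask-ml
dy = da.from_array(y, chunks=(250,))

# Query-05 style: non-distributed vs distributed logistic regression
LogisticRegression(max_iter=100).fit(X, y)
DaskLogisticRegression().fit(dX, dy)

# Query-20/25/26 style: non-distributed vs distributed KMeans
KMeans(n_clusters=8).fit(X)
DaskKMeans(n_clusters=8).fit(dX)
```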

CC: @DaceT, @randerzander

Related PRs:

#243

#244

ChrisJar (Contributor) commented May 3, 2022

For query 5, it appears that using sklearn as a direct replacement for cuML is slightly faster than adjusting the query to use dask-ml (a rough timing sketch is included after the tables below):

| Run | sklearn | dask-ml |
| --- | --- | --- |
| 1 | 1731.960032 | 1976.639929 |
| 2 | 1713.143504 | 1890.307189 |
| 3 | 1692.447222 | 1819.198046 |
| 4 | 1679.160072 | 1800.853525 |
| 5 | 1663.727669 | 1791.983971 |
| Avg | 1696.0877 | 1855.796532 |

Edit:
Here are the times running on a DGX-2:

| Run | sklearn | dask-ml |
| --- | --- | --- |
| 1 | 605.7754374 | 712.7177153 |
| 2 | 609.4057972 | 703.8873169 |
| 3 | 592.3652494 | 705.2219992 |
| 4 | 589.4770317 | 704.7177913 |
| 5 | 589.8500378 | 698.2876835 |
| Avg | 597.3747107 | 704.9665012 |

Edit 2:
Here are the times running on two DGX-1s (TCP):

| Run | sklearn | dask-ml |
| --- | --- | --- |
| 1 | 865.8754275 | 984.3859689 |
| 2 | 833.6778433 | 968.5142105 |
| 3 | 814.666688 | 939.6765635 |
| 4 | 823.4441831 | 925.5529888 |
| 5 | 806.8892348 | 929.7718291 |
| Avg | 828.9106753 | 949.5803122 |
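
For context, a rough sketch of how such a side-by-side timing could be wired up is below. This is not the harness used for the numbers above; the data shapes, hyperparameters, and single-machine scheduler are assumptions for illustration.

```python
# Rough timing sketch (not the actual benchmark harness used above).
# Data shapes and hyperparameters are placeholders.
import time
import numpy as np
import dask.array as da
from sklearn.linear_model import LogisticRegression
from dask_ml.linear_model import LogisticRegression as DaskLogisticRegression

X = np.random.rand(100_000, 20)
y = (np.random.rand(100_000) > 0.5).astype(int)
dX = da.from_array(X, chunks=(10_000, 20))
dy = da.from_array(y, chunks=(10_000,))

t0 = time.perf_counter()
LogisticRegression(max_iter=100).fit(X, y)
print(f"sklearn fit: {time.perf_counter() - t0:.3f}")

t0 = time.perf_counter()
DaskLogisticRegression().fit(dX, dy)
print(f"dask-ml fit: {time.perf_counter() - t0:.3f}")
```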

VibhuJawa (Member, Author) commented:
@ChrisJar, thanks for sharing these benchmarks. Do you have thoughts on how this might change if we scale to 10K? Not saying we should prioritize that, just wondering if you have any thoughts on that front.
