Kernel ridge regression #4492
Conversation
At 14800 rows, cuml takes 0.81s and sklearn takes 10.78s, a 13.4x speedup.

```python
import time
import numpy as np
import pandas as pd
from cuml import KernelRidge as cuKernelRidge
from sklearn.kernel_ridge import KernelRidge as sklKernelRidge
from sklearn.metrics import mean_squared_error as mse
import seaborn as sns
import matplotlib.pyplot as plt
from tqdm import tqdm

sns.set()
rows_all = np.arange(100, 15000, 300)
# rows_all = np.arange(100, 500, 300)
cols_all = [100]
iterations = 5
rs = np.random.RandomState(2)
estimators = {"sklearn": sklKernelRidge(), "cuml": cuKernelRidge()}
df = pd.DataFrame()
use_cache = False

if not use_cache:
    for n_rows in tqdm(rows_all):
        for n_cols in cols_all:
            X = rs.normal(size=(n_rows, n_cols))
            y = rs.normal(size=n_rows)
            for name, alg in estimators.items():
                # warmup
                alg.fit(X[0:10], y[0:10])
                for i in range(iterations):
                    start = time.perf_counter()
                    alg.fit(X, y)
                    pred = alg.predict(X)
                    time_taken = time.perf_counter() - start
                    if "cupy" in str(type(pred)):
                        pred = pred.get()
                    df = df.append(
                        {
                            "Algorithm": name,
                            "n_rows": n_rows,
                            "n_cols": n_cols,
                            "MSE": mse(y, pred),
                            "Time": time_taken,
                            "Iteration": i,
                        },
                        ignore_index=True,
                    )

if use_cache:
    df = pd.read_pickle("kernel_rr.pkl")
else:
    df.to_pickle("kernel_rr.pkl")

int_cols = ["n_rows", "n_cols", "Iteration"]
df[int_cols] = df[int_cols].astype(int)

sns.lineplot(x="n_rows", y="Time", hue="Algorithm", data=df)
plt.yscale("log")
plt.xticks(rotation=45)
plt.title(
    "Kernel ridge regression time (linear kernel, {} features, float64)".format(
        cols_all[-1]
    )
)
plt.savefig("kernel_ridge_time.png")
plt.clf()

sns.barplot(x="n_rows", y="MSE", hue="Algorithm", data=df)
plt.xticks(rotation=45)
plt.title(
    "Kernel ridge regression MSE (linear kernel, {} features, float64)".format(
        cols_all[-1]
    )
)
plt.savefig("kernel_ridge_mse.png")

sklearn_largest_time = df[
    (df["n_rows"] == df["n_rows"].max()) & (df["Algorithm"] == "sklearn")
]["Time"].mean()
cuml_largest_time = df[
    (df["n_rows"] == df["n_rows"].max()) & (df["Algorithm"] == "cuml")
]["Time"].mean()
print(
    "At {} rows, cuml takes {}s, sklearn takes {}s, speedup is {}.".format(
        df["n_rows"].max(),
        cuml_largest_time,
        sklearn_largest_time,
        sklearn_largest_time / cuml_largest_time,
    )
)
```
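One note if rerunning this script on current pandas: `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0. A minimal sketch of the equivalent pattern (the values shown are placeholders):

```python
import pandas as pd

# Collect result rows in a plain list during the benchmark loop...
records = []
records.append({"Algorithm": "cuml", "n_rows": 100, "n_cols": 100,
                "MSE": 0.0, "Time": 0.0, "Iteration": 0})
# ...then build the DataFrame once at the end.
df = pd.DataFrame.from_records(records)
```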
Force-pushed from baba26f to c3bd2c4.
This should be on the 22.04 board, not 22.02.
I'm happy to see more kernel-based methods in cuml. This is a really nice port from scikit-learn and I'm thinking the new API for building custom kernels might even be useful for pairwise distances in general (maybe w/ an option to turn symmetry on and off).
```python
pairwise_kernels(X, Y, metric='linear')
```

```python
@cuda.jit(device=True)
def custom_rbf_kernel(x, y, gamma=None):
```
I very much like the ability to quickly build custom kernels. Have you done any profiling / benchmarking of this against the `cuml.metrics.pairwise_distances` API? I'm mostly curious to know the gap between the two, and whether there's a perf hit for the different memory access patterns.
It would be interesting to see the difference, but for now it doesn't really matter, as the matrix inversion dominates computation time: the kernel could be 5 times slower than a native CUDA version and we wouldn't see any real difference in end-to-end time.
The bigger disadvantage of this approach for me has been JIT compile time. It's in the range of a few hundred ms, which I think is reasonable.
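For intuition on why the solve dominates, here is a rough timing sketch (not from the PR; the sizes and ridge term are arbitrary) comparing the O(n^2 d) kernel construction against the O(n^3) solve for the dual coefficients:

```python
import time
import cupy as cp

n, d = 8000, 100
rs = cp.random.RandomState(0)
X = rs.normal(size=(n, d))
y = rs.normal(size=n)

start = time.perf_counter()
K = X @ X.T  # linear kernel: O(n^2 * d)
cp.cuda.runtime.deviceSynchronize()
kernel_time = time.perf_counter() - start

start = time.perf_counter()
# Ridge solve (K + alpha * I) w = y: O(n^3), dominates end-to-end time
dual_coef = cp.linalg.solve(K + 1.0 * cp.eye(n), y)
cp.cuda.runtime.deviceSynchronize()
solve_time = time.perf_counter() - start

print(f"kernel: {kernel_time:.3f}s, solve: {solve_time:.3f}s")
```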
No worries. My question isn't about this algorithm in particular. It's been on our todo list for quite a while to see how performant it would be to allow users to implement custom pairwise distance measures in Numba.
```python
return (X, Y)
```

```python
@given(kernel_arg_strategy(), array_strategy())
```
Love the use of hypothesis here. I'm hoping we will start using it more in cuml.
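For readers unfamiliar with hypothesis: a strategy like the `array_strategy()` referenced in the quoted test might look roughly like this (a sketch only; the PR's actual definitions may differ):

```python
import numpy as np
from hypothesis import strategies as st
from hypothesis.extra.numpy import arrays

@st.composite
def array_strategy(draw):
    # Draw a shared feature dimension and independent row counts,
    # so X and Y are compatible inputs for a pairwise kernel.
    n_cols = draw(st.integers(min_value=1, max_value=5))
    n_rows_x = draw(st.integers(min_value=2, max_value=20))
    n_rows_y = draw(st.integers(min_value=2, max_value=20))
    elements = st.floats(min_value=-10.0, max_value=10.0)
    X = draw(arrays(np.float64, (n_rows_x, n_cols), elements=elements))
    Y = draw(arrays(np.float64, (n_rows_y, n_cols), elements=elements))
    return (X, Y)
```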
Have addressed review comments. I changed the kernel implementations to build off existing primitives more, which had a couple of side effects. The JIT compilation overhead went away for most of the kernels, taking the overall test time from 30s down to 10s. The estimator also became much less accurate for float32 inputs, because before I was able to force intermediate calculations to double precision. Accordingly, the tolerance has been significantly loosened for float32 tests. The cosine kernel still uses the custom kernel path, as implementing it the sklearn way is just very inaccurate and caused me to fail some tests. Chi^2 kernels also still use the custom kernel path, as I can't immediately see how to get this from existing primitives. I might benchmark the custom versions against the newer versions later if I get time, but this is more a matter of curiosity.
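To illustrate the float32 side effect: a kernel built from primitives typically uses the expansion ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, which suffers catastrophic cancellation in single precision. A minimal sketch of that idea (a hypothetical helper, not the PR's actual code):

```python
import cupy as cp

def rbf_from_primitives(X, Y, gamma):
    # Squared distances via the norm expansion; fast (one GEMM),
    # but prone to cancellation in float32.
    XX = (X * X).sum(axis=1)[:, None]   # ||x||^2, shape (n, 1)
    YY = (Y * Y).sum(axis=1)[None, :]   # ||y||^2, shape (1, m)
    sq_dists = XX + YY - 2.0 * (X @ Y.T)
    cp.maximum(sq_dists, 0.0, out=sq_dists)  # clip negatives from rounding
    return cp.exp(-gamma * sq_dists)
```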
Benchmarks comparing custom kernel performance against the implementation using primitives. The custom kernel implementation falls off considerably at higher dimensions due to poor memory access patterns. It is still faster than sklearn.

```python
import cupy as cp
import numpy as np
from numba import cuda
from cuml.metrics import pairwise_kernels
from sklearn.metrics.pairwise import pairwise_kernels as skl_pairwise_kernels
import math
import time
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm

sns.set()
df = pd.DataFrame()
for col in tqdm(range(10, 110, 10)):
    rs = np.random.RandomState(259)
    X = rs.normal(size=(20000, col))
    X_device = cp.array(X)

    # warmup
    K = pairwise_kernels(X_device[0:10], metric='rbf')
    start = time.perf_counter()
    K = pairwise_kernels(X_device, metric='rbf')
    cp.cuda.runtime.deviceSynchronize()
    standard_time = time.perf_counter() - start
    df = df.append(
        {"Algorithm": 'rbf', "n_rows": X.shape[0], "n_cols": X.shape[1],
         "Time": standard_time}, ignore_index=True)

    @cuda.jit(device=True)
    def custom_rbf_kernel(x, y, gamma=None):
        if gamma is None:
            gamma = 1.0 / len(x)
        sum = 0.0
        for i in range(len(x)):
            sum += (x[i] - y[i]) ** 2
        return math.exp(-gamma * sum)

    start = time.perf_counter()
    K = skl_pairwise_kernels(X, metric='rbf')
    skl_time = time.perf_counter() - start
    df = df.append(
        {"Algorithm": 'rbf_skl', "n_rows": X.shape[0], "n_cols": X.shape[1],
         "Time": skl_time}, ignore_index=True)

    # warmup
    K = pairwise_kernels(X_device[0:10], metric=custom_rbf_kernel)
    start = time.perf_counter()
    K = pairwise_kernels(X_device, metric=custom_rbf_kernel)
    cp.cuda.runtime.deviceSynchronize()
    custom_time = time.perf_counter() - start
    df = df.append(
        {"Algorithm": 'rbf_custom', "n_rows": X.shape[0], "n_cols": X.shape[1],
         "Time": custom_time}, ignore_index=True)

print(df)
sns.lineplot(x='n_cols', y='Time', hue='Algorithm', data=df)
plt.yscale('log')
plt.title('Pairwise kernel time 20,000 rows, varying cols')
plt.savefig("custom_kernels.png")
```
```python
# Copyright (c) 2019-2022, NVIDIA CORPORATION.
```
Just noticed this: we should remove 2019 since this is a new file.
```python
z += x[i] * y[i]
x_norm += x[i] * x[i]
y_norm += y[i] * y[i]
return z / math.sqrt(x_norm * y_norm)
```
I believe this is how the `pairwise_distances` are computing the cosine as well, with the exception that it's `2 - [a.dot(b) / (sqrt(x_l2_norm) * sqrt(y_l2_norm))]` (and sqrt(a) * sqrt(b) = sqrt(a * b)). It looks like you are doing this as well. Are you saying there's a numerical issue that might be causing incorrect values?
The sklearn version here (https://github.com/scikit-learn/scikit-learn/blob/9f85c9d44965b764f40169ef2917e5f7a798684f/sklearn/metrics/pairwise.py#L1265), when ported using cupy and using cuml's normalize function, seemed to be numerically unstable to me. This is why I kept the custom kernel version. I can look more into it if necessary.
I'm just wondering if correcting the cosine distance from `cuml.metrics.pairwise_distances` back to a similarity might help eliminate the JIT overhead from this one as well. If not, we can always look further into it in the future. Thanks for changing the other ones!
Ah, I see. I have changed that one to use cosine distance too.
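For context, the two cosine paths discussed in this thread look roughly like the following; this is a simplified sketch, not the PR's actual code, and the function names are hypothetical:

```python
import math
import cupy as cp
from numba import cuda

def cosine_normalize_then_gemm(X, Y):
    # sklearn-style path: normalize rows once, then a single matrix product.
    Xn = X / cp.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / cp.linalg.norm(Y, axis=1, keepdims=True)
    return Xn @ Yn.T

@cuda.jit(device=True)
def cosine_per_pair(x, y):
    # Custom-kernel path (as in the quoted diff above): accumulate the dot
    # product and both norms in one pass, dividing only once at the end.
    z = 0.0
    x_norm = 0.0
    y_norm = 0.0
    for i in range(len(x)):
        z += x[i] * y[i]
        x_norm += x[i] * x[i]
        y_norm += y[i] * y[i]
    return z / math.sqrt(x_norm * y_norm)
```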
Changes LGTM
rerun tests
Codecov Report

```
@@           Coverage Diff            @@
##           branch-22.04   #4492   +/- ##
===============================================
  Coverage              ?   85.74%
===============================================
  Files                 ?      239
  Lines                 ?    19588
  Branches              ?        0
===============================================
  Hits                  ?    16796
  Misses                ?     2792
  Partials              ?        0
```
@gpucibot merge
Sklearn reference implementation: https://github.com/scikit-learn/scikit-learn/blob/7e1e6d09b/sklearn/kernel_ridge.py#L16

I've tried to avoid touching the C++/CUDA layer so far. Pairwise kernels are implemented based on a numba kernel for now. I've also used cupy's lapack wrapper to access cuSolver. The implementation of `pairwise_kernels` here can be reused to very easily implement kernel PCA (a sketch follows below).

Todo:

- [x] Single target fit/predict
- [x] Standard kernels implemented
- [x] Support custom kernels
- [x] Support sample weights
- [ ] ~~Support CSR X matrix. Maybe too difficult for this PR.~~
- [x] Multi-target fit/predict
- [x] Change .py files to .pyx and move to correct places.
- [x] Benchmarking on reasonably large files
- [x] Tests take less than 20s
- [x] Ensure correct handling of input/output array types (I think I need to be using CumlArray and maybe some decorators)
- [x] Documentation

Authors:
- Rory Mitchell (https://github.com/RAMitchell)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)
- Micka (https://github.com/lowener)

URL: rapidsai#4492
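As a rough illustration of the kernel PCA claim, here is a minimal sketch built on `pairwise_kernels` using the standard centered-Gram eigendecomposition. The function is hypothetical (not part of the PR), and it assumes `pairwise_kernels` forwards keyword kernel parameters as sklearn's version does:

```python
import cupy as cp
from cuml.metrics import pairwise_kernels

def kernel_pca_fit_transform(X, n_components, metric="rbf", **kernel_params):
    # Gram matrix from the new pairwise_kernels API.
    K = pairwise_kernels(X, metric=metric, **kernel_params)
    n = K.shape[0]
    # Double-center the Gram matrix: K_c = (I - 1/n) K (I - 1/n).
    one_n = cp.full((n, n), 1.0 / n)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns eigenvalues in ascending order; flip to descending.
    eigvals, eigvecs = cp.linalg.eigh(K_c)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    # Project onto the top components, scaled by sqrt(eigenvalue).
    return eigvecs[:, :n_components] * cp.sqrt(
        cp.maximum(eigvals[:n_components], 0.0))
```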