
Scikeras GPU training with Pipelines #250

Open · ansar-sa opened this issue Sep 16, 2021 · 2 comments

@ansar-sa
Is it possible for SciKeras to train models that are part of sklearn Pipelines on a GPU? If so, is there any sample code for this?

@adriangb (Owner) commented Sep 16, 2021

It's possible! Here's an example that is a mashup of this notebook and this tutorial:

from time import time

import numpy as np
import tensorflow as tf
from scikeras.wrappers import KerasClassifier
from tensorflow import keras
from tensorflow.keras import layers


device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    print(
        '\n\nThis error most likely means that this notebook is not '
        'configured to use a GPU.  Change this in Notebook Settings via the '
        'command palette (cmd/ctrl-shift-P) or the Edit menu.\n\n')
    raise SystemError('GPU device not found')


# Model / data parameters
num_classes = 10
input_shape = (28, 28, 1)

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)


# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)


def build_model() -> keras.Model:
    model = keras.Sequential(
        [
            keras.Input(shape=input_shape),
            layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
            layers.MaxPooling2D(pool_size=(2, 2)),
            layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
            layers.MaxPooling2D(pool_size=(2, 2)),
            layers.Flatten(),
            layers.Dropout(0.5),
            layers.Dense(num_classes, activation="softmax"),
        ]
    )
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    return model


def test():
    batch_size = 128
    epochs = 15
    clf = KerasClassifier(
        build_model,
        fit__batch_size=batch_size,
        epochs=epochs,
        verbose=False
    )
    clf.fit(x_train[:batch_size*10], y_train[:batch_size*10])


def cpu():
    with tf.device('/cpu:0'):
        test()


def gpu():
    with tf.device('/device:GPU:0'):
        test()


# We run each op once to warm up; see: https://stackoverflow.com/a/45067900
cpu()
gpu()

# Now time one full training run on each device.
print('CPU (s):')
start = time()
cpu()
cpu_time = time() - start
print(cpu_time)
print('GPU (s):')
start = time()
gpu()
gpu_time = time() - start
print(gpu_time)
print('GPU speedup over CPU: {}x'.format(int(cpu_time/gpu_time)))

I ran this in Colab and got a 9x speedup.

Do note that for small models or small batch sizes you can actually get worse performance on a GPU than on a CPU (even with plain Keras), so make sure you will actually benefit from using a GPU.

@adriangb (Owner) commented:

Sorry, I just saw the pipeline part of the question. Your model itself, or any transformers you build on top of Keras, can run on a GPU, but SciKeras can't somehow load the entire pipeline onto the GPU. Further, unless only your final model is on the GPU, you may find that performance is severely impacted by the cost of pushing and pulling data between pipeline steps.
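
For a concrete picture, here is a minimal sketch (not from the thread) of a Keras model wrapped by SciKeras as the final step of an sklearn Pipeline. The synthetic data, the StandardScaler step, and the build_clf function are illustrative assumptions; the scaler runs as ordinary NumPy on the CPU, and only the Keras model's training and inference run on the device selected via tf.device:

import tensorflow as tf
from scikeras.wrappers import KerasClassifier
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from tensorflow import keras


def build_clf(meta) -> keras.Model:
    # SciKeras fills `meta` with dataset info such as n_features_in_ and n_classes_.
    model = keras.Sequential(
        [
            keras.Input(shape=(meta["n_features_in_"],)),
            keras.layers.Dense(64, activation="relu"),
            keras.layers.Dense(meta["n_classes_"], activation="softmax"),
        ]
    )
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    return model


X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=10, n_classes=3, random_state=0
)

pipe = Pipeline(
    [
        ("scale", StandardScaler()),  # plain NumPy transform, stays on the CPU
        ("clf", KerasClassifier(build_clf, epochs=5, verbose=False)),
    ]
)

# Assumes a GPU is present (as checked earlier in this thread); only the Keras
# model's fit/predict are placed on it.
with tf.device('/device:GPU:0'):
    pipe.fit(X, y)
    print(pipe.score(X, y))

Because the transformer outputs come back as NumPy arrays on the host, each boundary between a CPU step and the GPU model implies a host-to-device copy, which is the bandwidth cost mentioned above.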
