
Scikeras GPU training with Pipelines #250

Open · ansar-sa opened this issue Sep 16, 2021 · 2 comments

@ansar-sa
Is it possible for SciKeras to train models that are part of sklearn Pipelines on a GPU? If so, is there any sample code for this?

@adriangb (Owner) commented Sep 16, 2021

It's possible! Here's an example that is a mashup of this notebook and this tutorial:

from time import time

import numpy as np
import tensorflow as tf
from scikeras.wrappers import KerasClassifier
from tensorflow import keras
from tensorflow.keras import layers


device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    print(
        '\n\nThis error most likely means that this notebook is not '
        'configured to use a GPU.  Change this in Notebook Settings via the '
        'command palette (cmd/ctrl-shift-P) or the Edit menu.\n\n')
    raise SystemError('GPU device not found')


# Model / data parameters
num_classes = 10
input_shape = (28, 28, 1)

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)


# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)


def build_model() -> keras.Model:
    model = keras.Sequential(
        [
            keras.Input(shape=input_shape),
            layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
            layers.MaxPooling2D(pool_size=(2, 2)),
            layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
            layers.MaxPooling2D(pool_size=(2, 2)),
            layers.Flatten(),
            layers.Dropout(0.5),
            layers.Dense(num_classes, activation="softmax"),
        ]
    )
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    return model


def test():
    batch_size = 128
    epochs = 15
    clf = KerasClassifier(
        build_model,
        fit__batch_size=batch_size,
        epochs=epochs,
        verbose=False
    )
    clf.fit(x_train[:batch_size*10], y_train[:batch_size*10])


def cpu():
    with tf.device('/cpu:0'):
        test()


def gpu():
    with tf.device('/device:GPU:0'):
        test()


# We run each op once to warm up; see: https://stackoverflow.com/a/45067900
cpu()
gpu()

# Now time one full training run on each device.
print('CPU (s):')
start = time()
cpu()
cpu_time = time() - start
print(cpu_time)
print('GPU (s):')
start = time()
gpu()
gpu_time = time() - start
print(gpu_time)
print('GPU speedup over CPU: {}x'.format(int(cpu_time/gpu_time)))

I ran this in Colab and got a 9x speedup.

Do note that for small models or small batch sizes you can actually get worse performance on a GPU than on a CPU (even with plain Keras), so make sure you will actually benefit from using a GPU.

@adriangb (Owner) commented:

Sorry, I just saw the pipeline part of the question. Your model itself, or any transformers you build on top of Keras, can run on a GPU, but SciKeras can't somehow load the entire pipeline onto the GPU. Further, unless only your final model is on the GPU, you may find that performance is severely impacted by the cost of pushing and pulling data between pipeline steps.
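
For a concrete picture, here is a minimal sketch (not from the thread) of a Keras model wrapped by SciKeras as the final step of an sklearn Pipeline. The synthetic data, the StandardScaler step, and the build_clf function are illustrative assumptions; the scaler runs as ordinary NumPy on the CPU, and only the Keras model's training and inference run on the device selected via tf.device:

import tensorflow as tf
from scikeras.wrappers import KerasClassifier
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from tensorflow import keras


def build_clf(meta) -> keras.Model:
    # SciKeras fills `meta` with dataset info such as n_features_in_ and n_classes_.
    model = keras.Sequential(
        [
            keras.Input(shape=(meta["n_features_in_"],)),
            keras.layers.Dense(64, activation="relu"),
            keras.layers.Dense(meta["n_classes_"], activation="softmax"),
        ]
    )
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    return model


X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=10, n_classes=3, random_state=0
)

pipe = Pipeline(
    [
        ("scale", StandardScaler()),  # plain NumPy transform, stays on the CPU
        ("clf", KerasClassifier(build_clf, epochs=5, verbose=False)),
    ]
)

# Assumes a GPU is present (as checked earlier in this thread); only the Keras
# model's fit/predict are placed on it.
with tf.device('/device:GPU:0'):
    pipe.fit(X, y)
    print(pipe.score(X, y))

Because the transformer outputs come back as NumPy arrays on the host, each boundary between a CPU step and the GPU model implies a host-to-device copy, which is the bandwidth cost mentioned above.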
