bayer-science-for-a-better-life/cerebras-modelzoo-bayer

Cerebras Model Zoo

Introduction

This repository contains examples of common deep learning models that can be trained on Cerebras hardware. These models demonstrate the best practices for coding a model targeted at the Cerebras hardware so that you can take full advantage of this new powerful compute engine.

In order to get started with running your models on a CS system, please refer to the Developer Documentation along with this readme.

NOTE: If you are interested in trying out Cerebras Model Zoo on Cerebras Hardware (CS-2 Systems), we offer the following options:

  • Academics - Please fill out our Partner Hardware Access Request form here and we will contact you about gaining access to a system from one of our partners.
  • Commercial - Please fill out our Get Demo form here so that our team can provide you with a demo and discuss access to our system.
  • For all others - Please contact us at [email protected].

For a list of all supported models, see Models in this repository below.

Supported frameworks

We support models developed in PyTorch and TensorFlow. For more information on framework-specific workflows, please refer to the Developer Documentation.

Basic workflow

When you target the Cerebras CS system for your neural network jobs, follow the quick start guides in the Developer Documentation to compile, validate, and train the models in this ModelZoo with the framework of your choice.

For advanced use cases, and for porting existing TensorFlow or PyTorch code to be Cerebras compatible, the high-level workflow is:

  1. Port your code to CS in one of the supported frameworks.
    • For PyTorch use cerebras.framework.torch.
    • For TensorFlow use CerebrasEstimator.
  2. Prepare the input data, making sure that the pre-processing steps (sharding, shuffling, prefetching, interleaving, repeating, batching, etc.) are applied in the proper order.
  3. Compile your code on a CPU early on to optimize it for your specific CS system.
  4. Run your compiled code on the CS system.
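
As an illustration of why the order of the pre-processing steps in step 2 matters, here is a minimal, framework-free sketch in plain Python. It is not Cerebras, TensorFlow, or PyTorch API: the function `input_pipeline` and all of its parameters are hypothetical names chosen for this example, and prefetching and interleaving are left out because they affect throughput rather than the content of the batches.

```python
import itertools
import random


def input_pipeline(samples, num_shards, shard_index, batch_size, epochs, seed=0):
    """Yield fixed-size batches using the order: shard -> shuffle -> repeat -> batch.

    Hypothetical helper for illustration only; not part of any Cerebras API.
    """
    # 1. Shard first, so each worker only ever touches its own slice of the data.
    shard = list(samples[shard_index::num_shards])
    rng = random.Random(seed)

    def repeated():
        # 2./3. Reshuffle at the start of every epoch (repeat), before batching,
        # so that batch contents differ from one epoch to the next.
        for _ in range(epochs):
            order = list(shard)
            rng.shuffle(order)
            yield from order

    stream = repeated()
    while True:
        # 4. Batch last, on the already sharded and shuffled stream.
        batch = list(itertools.islice(stream, batch_size))
        if len(batch) < batch_size:
            # Drop a trailing partial batch so every batch has the same shape.
            return
        yield batch


samples = list(range(20))
batches = list(
    input_pipeline(samples, num_shards=2, shard_index=0, batch_size=4, epochs=2)
)
```

With these inputs, shard 0 holds the ten even-numbered samples, two epochs produce twenty elements, and the pipeline yields five full batches of four. Batching before shuffling would instead freeze the batch contents and only reorder whole batches.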

Execution modes

On the Cerebras Wafer Scale Engine (WSE) you can run neural networks of different model sizes. Cerebras Software supports different execution modes to run such a variety of models efficiently.

The execution mode refers to how the Cerebras runtime loads your neural network model onto the Cerebras Wafer Scale Engine (WSE). Two execution modes are supported:

  • Layer pipelined: In this mode all the layers of the network are loaded together onto the Cerebras WSE. This mode is selected for small to medium-sized neural network models (with fewer than a billion parameters).

  • Weight streaming: In this mode one layer of the neural network model is loaded at a time. This layer-by-layer mode is used to run extremely large models (with billions to trillions of parameters).
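
The size guidance in the two bullets above can be written down as a small rule of thumb. This is only a sketch: `suggest_execution_mode` is a hypothetical helper, not a Cerebras API, and the one-billion-parameter cut-off is taken from the rough guidance above rather than from any hard limit in the software.

```python
ONE_BILLION = 1_000_000_000  # approximate boundary quoted in the text above


def suggest_execution_mode(num_parameters: int) -> str:
    """Return the execution mode the size guidance above points to.

    Hypothetical helper for illustration; the real choice is made through
    the Cerebras tooling and depends on which modes a model supports.
    """
    if num_parameters < 0:
        raise ValueError("num_parameters must be non-negative")
    if num_parameters < ONE_BILLION:
        # Small to medium models: all layers fit on the wafer at once.
        return "layer pipelined"
    # Very large models: layers are loaded one at a time.
    return "weight streaming"


# Approximate, commonly cited parameter counts, used here only as examples.
small_model_mode = suggest_execution_mode(110_000_000)    # a BERT-base sized model
large_model_mode = suggest_execution_mode(6_000_000_000)  # a GPT-J sized model
```

Under this rule a BERT-base sized model maps to layer pipelined mode and a GPT-J sized model maps to weight streaming mode, which is consistent with the modes listed for those models in the table further down.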

You can find more information in the Cerebras Execution Modes section of the developer documentation.

Optimizations for Cerebras hardware

We provide several features that speed up training by leveraging properties of the Cerebras hardware. The key features are described in the sections that follow.

For general optimization techniques, please refer to the Performance Best Practices page.

PyTorch Variable Tensor Shape (VTS)

Variable Tensor Shape (VTS) is a feature that allows computations on the CS system running in pipeline mode to process tensors that vary in shape from one element of a batch to the next. This accommodates input data with heterogeneous sequence lengths, allowing users to strip away the large amount of padding that would otherwise be added to shorter sequences. The result is fewer wasted computations and a shorter training time.
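
To get a feel for how much computation padding can waste, the following sketch estimates the fraction of token positions in a padded batch that hold padding rather than data. It does not use or model the VTS implementation: `padding_waste` is a hypothetical helper and the sequence lengths are made-up example values.

```python
def padding_waste(sequence_lengths, max_length=None):
    """Return the fraction of positions in a padded batch that are padding.

    Every sequence is assumed to be padded to `max_length`, or to the longest
    sequence in the batch when `max_length` is not given.
    """
    if not sequence_lengths:
        raise ValueError("sequence_lengths must not be empty")
    longest = max_length if max_length is not None else max(sequence_lengths)
    if longest < max(sequence_lengths):
        raise ValueError("max_length is shorter than the longest sequence")
    padded_positions = longest * len(sequence_lengths)
    real_positions = sum(sequence_lengths)
    return 1 - real_positions / padded_positions


lengths = [128, 32, 16, 80]                              # made-up example batch
waste_batch_max = padding_waste(lengths)                 # pad to the longest sequence (128)
waste_fixed_max = padding_waste(lengths, max_length=512) # pad to a fixed length of 512
```

For this batch, padding to the longest sequence leaves half of the 512 positions as padding, and padding to a fixed length of 512 leaves 87.5% of the 2048 positions as padding. Processing each sequence at its true length is what avoids this overhead.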

To learn more about VTS, visit the developer doc page on the same topic here.

TensorFlow Variable Sequence Length (VSL)

Conceptually the same as VTS, VSL is a legacy name that we support for the TensorFlow models. VSL is limited in its generality and is currently being replaced by VTS.

To learn more about VSL, visit the developer doc page on the same topic here.

Multi-replica mode

Multi-replica Data Parallel Training is a feature that the Cerebras compiler uses to create several copies (replicas) of the same model to run data parallel training. This is similar to how multiple GPUs are used to accelerate training of a single model.

In the background, the compiler ensures that these replicas are initialized with the same weights and that, during training, the weights across all replicas are synchronized after every batch.

A single trained model is available at the conclusion of multi-replica data parallel training. This multi-replica data parallel feature can be used only for training the model.
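
The behaviour described above (identical initial weights, synchronization after every batch, one trained model at the end) can be illustrated with a toy data-parallel loop in plain Python. This is a conceptual sketch, not how the Cerebras compiler implements the feature: it assumes synchronization is done by averaging per-replica gradients, and the one-parameter model `y = w * x`, the function names, and the data are all made up for the example.

```python
def gradient(w, batch):
    """Gradient of the mean squared error of y = w * x over a batch of (x, y)."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def train_multi_replica(batches, num_replicas, lr=0.01, w0=0.0):
    """Toy data-parallel training: one weight copy per replica."""
    weights = [w0] * num_replicas  # all replicas start from the same weights
    for batch in batches:
        # Each replica receives an equally sized slice of the batch.
        shards = [batch[r::num_replicas] for r in range(num_replicas)]
        grads = [gradient(w, shard) for w, shard in zip(weights, shards)]
        # Synchronize after every batch: average the gradients (an assumption
        # of this sketch) and apply the same update on every replica.
        mean_grad = sum(grads) / num_replicas
        weights = [w - lr * mean_grad for w in weights]
    return weights


def train_single(batches, lr=0.01, w0=0.0):
    """Reference: the same training on a single copy of the model."""
    w = w0
    for batch in batches:
        w -= lr * gradient(w, batch)
    return w


data = [(float(x), 3.0 * x) for x in range(1, 9)]  # samples of y = 3x
batches = [data[:4], data[4:]] * 50

replica_weights = train_multi_replica(batches, num_replicas=2)
single_weight = train_single(batches)
```

Because the shards have equal size, the averaged gradient equals the full-batch gradient, so every replica ends with the same weight and that weight matches single-copy training up to floating-point rounding. This is why a single trained model is available at the end.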

To learn more about multi-replica, please visit the developer doc page on the same topic here.

Models in this repository

Model                                         Layer Pipeline mode             Weight Streaming mode
BERT                                          TensorFlow code, PyTorch code   -
BERT (fine-tuning) Classifier                 TensorFlow code, PyTorch code   -
BERT (fine-tuning) Named Entity Recognition   TensorFlow code, PyTorch code   -
BERT (fine-tuning) Summarization              TensorFlow code, PyTorch code   -
BERT (fine-tuning) Question Answering         TensorFlow code, PyTorch code   -
GPT-2                                         TensorFlow code, PyTorch code   TensorFlow code, PyTorch code
GPT-3                                         -                               TensorFlow code
GPT-J                                         -                               TensorFlow code
GPT-J (fine-tuning) Summarization             -                               TensorFlow code
Linformer                                     TensorFlow code                 -
RoBERTa                                       TensorFlow code, PyTorch code   -
T5                                            TensorFlow code, PyTorch code   -
Transformer                                   TensorFlow code, PyTorch code   -
MNIST (fully connected)                       TensorFlow code, PyTorch code   -
2D UNet (experimental)                        TensorFlow code                 -

License

Apache License 2.0

About

Fork of the Cerebras Model Zoo that includes Bayer-specific changes

