RFC: Kernel and Op Implementation and Registration API #133

Merged 6 commits on Jun 4, 2020
295 changes: 295 additions & 0 deletions rfcs/20190814-kernel-and-op-registration.md
@@ -0,0 +1,295 @@
# Kernel and Op Implementation and Registration API

| Status | Accepted |
| :-------------- |:---------------------------------------------------- |
| **Author(s)** | James Ring ([email protected]). |
| **Sponsor** | Günhan Gülsoy ([email protected]) |
| **Updated** | 2020-06-02 |

## Objective

TensorFlow (TF) currently provides a C++ API for implementing kernels and ops.
The Voltron project aims to create a modular/plugin-based TF implementation with
API and ABI surfaces. Plugins will be able to create and register custom kernel
and op implementations.

In order to provide a stable ABI, the Voltron team has chosen to provide C APIs
to plugin authors. This document introduces the C API for op and kernel
registration. For authors who wish to continue using C++ to interface with
TensorFlow, an ABI-stable C++ header-only API is provided.

## Motivation

Presently, there is no ABI-stable API for extending TensorFlow with new kernels
and ops. There is no guarantee that a plugin written with one compiler will work
with a version of TensorFlow built with another, even on the same operating
system and architecture. This makes it difficult to distribute plugins without
also distributing the source code and requiring end-users to build the plugin
alongside TensorFlow.

An ABI-stable API for extending TensorFlow will simplify the distribution of
plugins and allow plugin authors to distribute binary artifacts without
necessarily publishing plugin source code.

## User Benefit

Plugin authors will be able to publish plugins that users can use more easily.
In turn, the TensorFlow community will benefit from an increase in the number
and variety of available plugins.

## Design Overview

In general, the kernel and op registration C APIs aim to permit the
implementation of any kernel or op that is currently possible with the C++ API.
Where possible, existing C++ function implementations are reused from within a C
wrapper. The purpose of the wrapper is simply to provide ABI stability.

Since plugins will be dynamically loaded (e.g. via `dlopen` on POSIX), the API
avoids relying on static initialization.

The intention is that existing kernels should be able to be ported to the new
APIs with a minimum of reimplementation effort. This precludes a from-scratch
re-imagining of TensorFlow APIs.

The following diagram describes the components built with the proposed C and C++
APIs.

                           +----------------+ <--+
                           |                |    |
                           |     Plugin     |    |
                           |                |    |
                           +----------------+    |
                           |                |    |
                           | C++ header API |    |  Plugin
                           |                |    |  my_plugin.so
                      +--> +----------------+    |
                      |    |                |    |
                      |    | C API headers  |    |
                      |    |                |    |
                      |    +----------------+ <--+
                      |    |                |
                      |    |   C API impl   |
                Core  |    |                |
          TensorFlow  |    +----------------+
    libtensorflow.so  |    |                |
                      |    | Core C++ APIs  |
                      |    |                |
                      +--> +----------------+

In this example, there are two shared objects: `my_plugin.so` and
`libtensorflow.so`. `my_plugin.so` is implemented in terms of the C++
header-only API, which is in turn implemented in terms of the C API headers. The
C API implementation is provided by TensorFlow at runtime when it loads the
plugin's shared object.

This design addresses the changes to the existing C API that are required to
support op and kernel plugins. It also introduces the C++ header-only API,
which currently does not exist.

## Ops

This section introduces changes to the C API that are required to support ops.
An alpha version of this API is already checked in at `tensorflow/c/ops.h`.

### Registration

In the C++ API, ops are registered at static initialization time using the
`REGISTER_OP` macro. For example:

```c++
REGISTER_OP("Bitcast")
    .Input("input: T")
    .Output("output: type")
    .Attr("T: {bfloat16, ...}")
    .Attr("type: {bfloat16, ...}")
    .SetShapeFn([](InferenceContext* ctx) { ... })
    .Doc("A bitcast operator");
```

The equivalent C API will be a series of functions that operate on
`TF_OpDefinitionBuilder *`, a pointer to an opaque struct (i.e. a struct whose
content is not made known to the user). The functions include, but are not
limited to:

* `TF_OpDefinitionBuilder* TF_NewOpDefinitionBuilder(const char* op_name)`:
constructs and returns a new op registration builder for an op with the given
name

* `void TF_OpDefinitionBuilderAddAttr(TF_OpDefinitionBuilder* builder, const
char* attr)`: adds the given attribute to the builder (equivalent to `Attr`
above)

* `void TF_OpDefinitionBuilderAddInput(TF_OpDefinitionBuilder* builder, const
char* input)`: adds the given input to the builder (equivalent to `Input`
above)

Additional functions are provided for setting other properties of the operation
(e.g. `TF_OpDefinitionBuilderSetIsCommutative`).

Registration is then actually performed using the `TF_RegisterOpDefinition`
function. This function populates a `TF_Status` indicating whether registration
was successful and frees the resources associated with the op definition
builder.

The C equivalent of the bitcast op registration example above is shown below:

```c++

#include "tensorflow/c/ops.h"

void InferBitcastShape(TF_ShapeInferenceContext* ctx,  // see the section below on
                       TF_Status* status);             // shape inference

void InitPlugin() {
  TF_OpDefinitionBuilder* b = TF_NewOpDefinitionBuilder("Bitcast");
  TF_OpDefinitionBuilderAddInput(b, "input: T");
  TF_OpDefinitionBuilderAddOutput(b, "output: type");
  TF_OpDefinitionBuilderAddAttr(b, "T: {bfloat16, ...}");
  TF_OpDefinitionBuilderAddAttr(b, "type: {bfloat16, ...}");
  TF_OpDefinitionBuilderSetShapeInferenceFunction(b, &InferBitcastShape);

  TF_Status* status = TF_NewStatus();
  TF_RegisterOpDefinition(b, status);
  if (TF_GetCode(status) != TF_OK) { /* handle errors */ }
  TF_DeleteStatus(status);  // release the status, as in the kernel example
}

```
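To make the opaque-builder idea concrete, here is a minimal, self-contained sketch of how a C-style surface can hide a C++ builder behind a pointer. Every `Demo_*` name is hypothetical and is not part of `tensorflow/c/ops.h`. The registry is a stand-in for TensorFlow's op registry, and the real implementation may well differ.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an opaque builder. A plugin-facing C header would
// expose only `typedef struct Demo_OpBuilder Demo_OpBuilder;`, so the C++
// members below never become part of the ABI.
struct Demo_OpBuilder {
  std::string name;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
  std::vector<std::string> attrs;
};

// Hypothetical registry keyed by op name. It is created on first use rather
// than by a static initializer in the plugin.
static std::map<std::string, Demo_OpBuilder>& DemoRegistry() {
  static std::map<std::string, Demo_OpBuilder> registry;
  return registry;
}

extern "C" {

Demo_OpBuilder* Demo_NewOpBuilder(const char* op_name) {
  assert(op_name != nullptr);
  return new Demo_OpBuilder{op_name, {}, {}, {}};
}

void Demo_OpBuilderAddInput(Demo_OpBuilder* b, const char* input) {
  b->inputs.push_back(input);
}

void Demo_OpBuilderAddOutput(Demo_OpBuilder* b, const char* output) {
  b->outputs.push_back(output);
}

void Demo_OpBuilderAddAttr(Demo_OpBuilder* b, const char* attr) {
  b->attrs.push_back(attr);
}

// Registers the definition and frees the builder in both the success and the
// failure case. Returns 0 on success, 1 if the name is already registered.
int Demo_RegisterOp(Demo_OpBuilder* b) {
  const bool inserted = DemoRegistry().emplace(b->name, *b).second;
  delete b;
  return inserted ? 0 : 1;
}

// Returns the number of inputs of a registered op, or -1 if it is unknown.
int Demo_NumInputs(const char* op_name) {
  const auto it = DemoRegistry().find(op_name);
  if (it == DemoRegistry().end()) return -1;
  return static_cast<int>(it->second.inputs.size());
}

}  // extern "C"
```

Because callers only ever hold a `Demo_OpBuilder*`, the layout of the struct can change between releases without breaking plugins that were compiled against an older header.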

### Shape Inference

A significant feature of certain ops is their ability to infer their output
shapes. TensorFlow will invoke the registered shape inference function (if one
is provided) when it needs to know the op's output shape. The registration
function declaration is shown below:


```c++
void TF_OpDefinitionBuilderSetShapeInferenceFunction(
    TF_OpDefinitionBuilder* builder,
    void (*shape_inference_func)(TF_ShapeInferenceContext* ctx,
                                 TF_Status* status));
```

A series of functions prefixed with `TF_ShapeInferenceContext` is provided for
the following purposes:

* Examining operator input shapes (`TF_ShapeInferenceContextGetInput`)

* Creating and deleting shape and dimension handles (`TF_{New,Delete}ShapeHandle`, `TF_{New,Delete}DimensionHandle`)

* Manipulating shape and dimension handles (`TF_ShapeInferenceContextWithRank`, `TF_ShapeInferenceContextDim`)

In general, C analogues to the C++ methods in `tensorflow::shape_inference`
(see `tensorflow/core/framework/shape_inference.h`) will be provided.
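As an illustration of what a shape inference function computes, the sketch below applies the shape rule documented for `tf.bitcast` to fully known shapes. It is a standalone approximation and does not call the `TF_ShapeInferenceContext` functions. The `DemoShapeCode` enum and `DemoInferBitcastShape` are hypothetical names, and unknown dimensions are deliberately ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result codes; the real API reports errors through TF_Status.
enum DemoShapeCode { DEMO_SHAPE_OK = 0, DEMO_SHAPE_INVALID = 1 };

// Computes the output shape of a bitcast from the input shape and the element
// sizes (in bytes) of the input and output types.
//  - Equal sizes: the shape is unchanged.
//  - Larger input type: a trailing dimension of input_size / output_size is
//    appended.
//  - Smaller input type: the last dimension must equal
//    output_size / input_size and is removed.
// *output_shape is written only on success.
DemoShapeCode DemoInferBitcastShape(const std::vector<int64_t>& input_shape,
                                    std::size_t input_size,
                                    std::size_t output_size,
                                    std::vector<int64_t>* output_shape) {
  assert(output_shape != nullptr);
  if (input_size == 0 || output_size == 0) return DEMO_SHAPE_INVALID;

  std::vector<int64_t> result = input_shape;
  if (input_size > output_size) {
    if (input_size % output_size != 0) return DEMO_SHAPE_INVALID;
    result.push_back(static_cast<int64_t>(input_size / output_size));
  } else if (input_size < output_size) {
    if (output_size % input_size != 0) return DEMO_SHAPE_INVALID;
    const int64_t ratio = static_cast<int64_t>(output_size / input_size);
    if (result.empty() || result.back() != ratio) return DEMO_SHAPE_INVALID;
    result.pop_back();
  }
  *output_shape = result;
  return DEMO_SHAPE_OK;
}
```

In a real plugin the same logic would read the input shape from the context, build the output through shape and dimension handles, and report failures by populating the `TF_Status` argument.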

## Kernels

This section introduces changes to the C API that are required to support
kernels. An alpha version of this API is already checked in at
`tensorflow/c/kernels.h`.

### Registration

Kernel registration with the C++ API is accomplished with the
`REGISTER_KERNEL_BUILDER` macro. This macro expands to code that relies on
static initialization to register the provided kernel with the global kernel
registry. See below for an example of registering a kernel with the C++ API:

```c++

#include "tensorflow/core/framework/op_kernel.h"

class BitcastOp : public OpKernel {
 public:
  explicit BitcastOp(OpKernelConstruction* context) : OpKernel(context) { … }
  void Compute(OpKernelContext* context) override { … }
};

REGISTER_KERNEL_BUILDER(Name("Bitcast").Device(DEVICE_CPU), BitcastOp);
```

The equivalent C API provides a series of functions that operate on
`TF_KernelBuilder`, an opaque struct obtained with the `TF_NewKernelBuilder` call.
The kernel builder is registered with TensorFlow using the
`TF_RegisterKernelBuilder` function. See below for an example of registering
the bitcast kernel using the C API:

```c++
#include <stdlib.h>  // calloc, free

#include "tensorflow/c/kernels.h"

typedef struct bitcast_kernel { … } bitcast_kernel;

// Bitcast_Create, Bitcast_Compute and Bitcast_Delete actually implement the
// kernel. See the section below for discussion on kernel implementation.
static void* Bitcast_Create(TF_OpKernelConstruction* context) {
  // Review comment (Contributor): In TF today op kernel construction can fail
  // (and it often does, as it validates attributes etc). This should have a
  // status somewhere.
  // Reply (Contributor Author): OpKernelConstruction itself contains a status
  // which can be set with TF_OpKernelConstruction_Failure. I'll update the doc.
  bitcast_kernel* k = (bitcast_kernel*) calloc(1, sizeof(bitcast_kernel));
  /* initialize the fields of k as needed */
  return (void*) k;
}

static void Bitcast_Compute(void* k, TF_OpKernelContext* context) {
  // Review comment (Contributor): What is the return value here? Outputs are
  // not supposed to be allocated and then returned (they are allocated via the
  // context) so I don't know why this is not void.
  // Reply (Contributor Author, @sjamesr, Aug 15, 2019): This is a typo, well
  // spotted :)
  bitcast_kernel* kernel = (bitcast_kernel*) k;  // this is the pointer returned by
                                                 // Bitcast_Create
  /* compute the result */
  TF_SetOutput(context, ...);
}

static void Bitcast_Delete(void *k) { free(k); }

void InitPlugin() {
  TF_KernelBuilder* builder = TF_NewKernelBuilder(
      /*op_name*/"Bitcast", DEVICE_CPU,
      &Bitcast_Create, &Bitcast_Compute, &Bitcast_Delete);
  TF_Status* status = TF_NewStatus();
  TF_RegisterKernelBuilder(/*kernel_name*/"Bitcast", builder, status);
  if (TF_GetCode(status) != TF_OK) { /* handle errors */ }
  TF_DeleteStatus(status);
}
```

The registration function prototypes are provided below. Kernel authors must
provide a compute function. Creation and deletion functions are optional, but
if a creation function is provided that causes memory allocation, a deletion
function that frees the memory should also be provided, otherwise a leak will
occur.

```c++
TF_KernelBuilder* TF_NewKernelBuilder(
    const char* op_name, const char* device_name,
    void* (*create_func)(TF_OpKernelConstruction*),
    void (*compute_func)(void*, TF_OpKernelContext*),
    void (*delete_func)(void*));

// Review comment (Contributor): Why two functions instead of one function?
// Reply (Contributor Author): The kernel builder may be customized with other
// calls, such as TF_KernelBuilder_TypeConstraint and
// TF_KernelBuilder_HostMemory, I'll add these to the doc.
void TF_RegisterKernelBuilder(const char* name, TF_KernelBuilder* builder,
                              TF_Status* status);
```
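The contract between the three callbacks can be sketched from the host's point of view. The code below is a self-contained approximation, not TensorFlow's implementation: `DemoContext`, `DemoKernelFns` and `DemoRunKernel` are hypothetical stand-ins, and the create callback takes no argument for brevity. It shows a kernel's state being created once, passed to every compute call as `void*`, and released by the delete callback.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for TF_OpKernelContext: one integer in, one out.
struct DemoContext {
  int input;
  int output;
};

// Hypothetical callback table mirroring the create/compute/delete triple.
struct DemoKernelFns {
  void* (*create)();                              // optional, may be null
  void (*compute)(void* kernel, DemoContext* c);  // required
  void (*destroy)(void* kernel);                  // optional, may be null
};

// Host side: create the kernel state once, run compute for each context,
// then release the state.
void DemoRunKernel(const DemoKernelFns& fns, DemoContext* contexts, int count) {
  assert(fns.compute != nullptr);
  void* state = fns.create ? fns.create() : nullptr;
  for (int i = 0; i < count; ++i) {
    fns.compute(state, &contexts[i]);
  }
  if (fns.destroy) fns.destroy(state);
}

// Plugin side, stateful kernel: multiplies its input by a stored factor.
struct DemoScaleKernel {
  int factor;
};

static void* DemoScale_Create() {
  DemoScaleKernel* k =
      static_cast<DemoScaleKernel*>(std::calloc(1, sizeof(DemoScaleKernel)));
  if (k != nullptr) k->factor = 3;
  return k;
}

static void DemoScale_Compute(void* k, DemoContext* ctx) {
  const DemoScaleKernel* kernel = static_cast<DemoScaleKernel*>(k);
  if (kernel == nullptr) return;  // allocation failed in DemoScale_Create
  ctx->output = ctx->input * kernel->factor;
}

// Frees what DemoScale_Create allocated; omitting it would leak the state.
static void DemoScale_Delete(void* k) { std::free(k); }

// Plugin side, stateless kernel: needs neither create nor delete.
static void DemoNegate_Compute(void* /*unused*/, DemoContext* ctx) {
  ctx->output = -ctx->input;
}
```

A stateless kernel such as `DemoNegate_Compute` can be registered with null create and delete callbacks, which matches the rule above that only the compute function is mandatory.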

### Implementation

The main classes for C++ kernel implementations are `OpKernelConstruction`
(provided by TensorFlow to the kernel constructor) and `OpKernelContext`
(provided to the kernel's `Compute` method). The analogues in the C API are
`TF_OpKernelConstruction` and `TF_OpKernelContext`. The aim of the C API is to
provide functions for working with these structs that match, as closely as
possible, the C++ API.

### Inputs and Outputs

Kernels must be able to retrieve their inputs and provide outputs. In the C++
API, the `tensorflow::OpKernelContext` `GetInput` and `SetOutput` family of
functions provide this functionality. The equivalent C calls will be
`TF_GetInput` and `TF_SetOutput`. These functions operate on `TF_Tensor`, which
is already part of the existing TensorFlow C API.

String tensors will be supported in an ABI-stable way. This will require
changes to their binary representation described in the [tstring design
document](https://github.com/tensorflow/community/blob/master/rfcs/20190411-string-unification.md).

## C++ Header-Only API

As described above, the main motivation for providing a C API is ABI stability.
However, some programmers may find the C API less convenient than the
non-ABI-stable C++ API. To address this concern, we plan to provide a
header-only C++ API that is implemented in terms of the ABI-stable C API. This
API will contain classes such as `Tensor`, `OpKernelContext`, and
`OpKernelConstruction`, whose names will be familiar to existing C++ API users.
Ideally, this API will be as close as possible to the existing non-ABI-stable
TensorFlow C++ API, so that kernels and ops currently implemented in C++ may be
ported to the ABI-stable C++ API with as little implementation churn as possible.
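
The layering can be sketched with a small, self-contained example. The `Demo_*` C functions and the `demo::Status` class are hypothetical and only illustrate the technique: the C++ class is entirely inline, holds nothing but an opaque C handle, and passes only plain C types across the library boundary.

```cpp
#include <cassert>
#include <memory>
#include <string>

// --- Hypothetical C API: the only symbols that cross the library boundary ---
extern "C" {
typedef struct Demo_Status Demo_Status;  // opaque to plugin code
Demo_Status* Demo_NewStatus(void);
void Demo_DeleteStatus(Demo_Status* s);
void Demo_SetStatus(Demo_Status* s, int code, const char* msg);
int Demo_GetCode(const Demo_Status* s);
const char* Demo_Message(const Demo_Status* s);
}

// --- Header-only C++ wrapper: compiled into the plugin by the plugin's own
// compiler, so it places no C++ ABI requirement on the core library ---
namespace demo {
class Status {
 public:
  Status() : impl_(Demo_NewStatus(), &Demo_DeleteStatus) {}

  bool ok() const { return Demo_GetCode(impl_.get()) == 0; }
  std::string message() const { return Demo_Message(impl_.get()); }
  void Update(int code, const std::string& msg) {
    // Only a const char* is handed to the C API, never a std::string.
    Demo_SetStatus(impl_.get(), code, msg.c_str());
  }
  Demo_Status* get() { return impl_.get(); }

 private:
  // RAII: the C handle is released through the C API's own delete function.
  std::unique_ptr<Demo_Status, void (*)(Demo_Status*)> impl_;
};
}  // namespace demo

// --- C API implementation: in practice this would live in the core library ---
struct Demo_Status {
  int code;
  std::string message;
};

extern "C" {
Demo_Status* Demo_NewStatus(void) { return new Demo_Status{0, ""}; }
void Demo_DeleteStatus(Demo_Status* s) { delete s; }
void Demo_SetStatus(Demo_Status* s, int code, const char* msg) {
  assert(s != nullptr && msg != nullptr);
  s->code = code;
  s->message = msg;
}
int Demo_GetCode(const Demo_Status* s) { return s->code; }
const char* Demo_Message(const Demo_Status* s) { return s->message.c_str(); }
}
```

Wrapper classes for tensors and kernel contexts could follow the same shape, giving C++ authors familiar class names while the binary interface stays the C one.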