diff --git a/rfcs/20200612-stream-executor-c-api.md b/rfcs/20200612-stream-executor-c-api.md new file mode 100644 index 000000000..97bec78b7 --- /dev/null +++ b/rfcs/20200612-stream-executor-c-api.md @@ -0,0 +1,906 @@ +# StreamExecutor C API + +| Status | Proposed | +| :------------ | :------------------------------------------------------ | +| **RFC #** | [257](https://github.com/tensorflow/community/pull/257) | +| **Authors** | Anna Revinskaya (annarev@google.com), Penporn Koanantakool (penporn@google.com), Yi Situ (yisitu@google.com), Russell Power (power@google.com) | +| **Sponsor** | Gunhan Gulsoy (gunan@google.com) | +| **Updated** | 2020-09-08 | + +# Objective + +Provide basic device management C API to allow new devices to modularly connect +to the current TensorFlow runtime. + +## Goals + +* C API wrapper of a subset of methods in StreamExecutorInterface. +* Best-effort API and ABI stability after an initial experimental phase. + +## Non-goals + +* Compatibility with the + [new TensorFlow runtime stack](https://blog.tensorflow.org/2020/04/tfrt-new-tensorflow-runtime.html). +* APIs that will expose all device-specific capabilities. + +# Motivation + +Current device support in TensorFlow adds code directly into the +[main TensorFlow repository](http://github.com/tensorflow/tensorflow). This +approach is +[not scalable](https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md#adding-support-for-new-hardware-is-very-difficult-and-not-scalable) +because it adds complexity to the build dependency and tool chains, takes longer +time to build, and requires the TensorFlow team’s review. To handle the surge in +new hardware accelerators and programming paradigms, TensorFlow must allow +device addition in a modular manner: contributors code outside of the TensorFlow +repository and distribute a binary module which would connect to TensorFlow at +runtime through a stable application binary interface (ABI). + +The new TensorFlow stack, based on +[TFRT](https://blog.tensorflow.org/2020/04/tfrt-new-tensorflow-runtime.html) and +[MLIR](https://www.tensorflow.org/mlir), is designed with this in mind. However, +it is still in an active development phase and is not ready for third-party +device integration yet. (For device support expecting to land +in 2021 or later, we highly recommend waiting to integrate with the new stack, +since it is fundamentally different from the current stack and cannot guarantee +code reuse.) + +In the meantime, we plan to provide limited device integration support for the +current TensorFlow stack through +[Modular TensorFlow](https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md). +We anticipate three basic functionalities within a device plug-in module: + +* Device registration: Addressed in a different RFC, [Adding Pluggable Device for TensorFlow](https://github.com/tensorflow/community/pull/262). +* Device management: The focus of this RFC. +* Kernel and op registration and implementation: + [RFC Accepted](https://github.com/tensorflow/community/blob/master/rfcs/20190814-kernel-and-op-registration.md). [C API implemented](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/c/). + +[StreamExecutor](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/stream_executor/stream_executor_pimpl.h;l=73) is TensorFlow's main device manager, responsible for work execution and memory +management. 
It provides a set of methods (such as +[Memcpy](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/stream_executor/stream_executor_internal.h;l=240)) +that can be customized for a particular device. + +We propose a C API wrapper of a subset of methods in +[StreamExecutorInterface](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/stream_executor/stream_executor_internal.h;l=166?q=StreamExecutorinterface) +as an ABI-stable way to register a custom StreamExecutor platform. + +# User Benefits + +A decoupled way to add a new device to TensorFlow. + +* Simpler process: Does not have to add a new build toolchain to TensorFlow +* Faster time-to-solution: Does not need code review from the TensorFlow team. +* Lower maintenance efforts: Only C-API-related changes could break the + integration. Unrelated TensorFlow changes would not break the code. + * The C APIs may be changed during the initial experimental phase based + on developer experience and feedback. When the APIs become more mature, + we will try to keep them stable (in a best-effort manner) until the new + TensorFlow stack is available. + +# Design Proposal + +[StreamExecutorInterface](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/stream_executor/stream_executor_internal.h;l=166?q=StreamExecutorinterface) +has a large number of methods, some of which are only sporadically used. +Therefore, we plan to wrap only a subset of key `StreamExecutorInterface` +functionality. We decided on this subset based on the [PluggableDevice](https://github.com/tensorflow/community/pull/262) +usecase as well as potential future devices such as TPUs. + +## Versioning Strategy and Stability +StreamExecutor C API follows Semantic Versioning 2.0.0 ([semver](http://semver.org/)). +Each release version has a format `MAJOR.MINOR.PATCH`, as outlined in +[TensorFlow version compatibility](https://www.tensorflow.org/guide/versions#semantic_versioning_20). +We also use struct sizes to track compatibility. More details on functionality +extension and deprecation can be found in [StreamExecutor C API Versioning Strategy](20200612-stream-executor-c-api/C_API_versioning_strategy.md). + +The C API will have an initial bake-in period, where we won’t have any +compatibility guarantees. However, we will make the best effort to perform any +updates in a backwards compatible way. For example, we plan to keep track of +struct sizes. During this period, the API will be kept at `MAJOR` version 0. + +The C API will be placed in [tensorflow/c/experimental](https://cs.opensource.google/tensorflow/tensorflow/+/refs/tags/v2.3.0:tensorflow/c/experimental/). +We will consider moving the API out of the experimental directory once it is +more stable. + +## Implementation Conventions + +* Struct prefix indicates whether struct fields should be filled by the plug-in or core TensorFlow implementation: + * `SE_`: Set/filled by core, unless marked otherwise. + * `SP_`: Set/filled by plug-in, unless marked otherwise. + * This prefix rule only applies to structures. Enumerations and methods are all prefixed with `SE_`. +* Structs begin with two fields: + * `size_t struct_size`: Stores the unpadded size of the struct. + * `void* ext`: A reserved field that may be populated by a plugin in `SP_*` structs or potential future extension points in `SE_` structs. Must be set to zero by default if it unused. +* We use `struct_size` for version checking by both core and plug-in. 
  * It is exempt from the `SE/SP` rule above and must be set both by core and plug-in.
  * It can be checked programmatically to determine which struct fields are available in the structure.
  * For example, the `create_device` function receives `SP_Device*` as input with `struct_size` populated by core. The plug-in is responsible for setting `struct_size` as well, along with all other fields.
* When a member is added to a struct, the struct size definition must be updated to use the new last member of the struct.

## Usage Overview

The table below summarizes all structures defined and the functionality they involve.

| Action | Function call(s) | Populated by Core TensorFlow | Populated by plug-in |
| :----- | :-------------- | :--------------------------- | :------------------- |
| Register platform | `SE_InitPlugin` | `SE_PlatformRegistrationParams` | `SP_Platform`, `SP_PlatformFns` |
| Create device | `SP_PlatformFns::create_device` | `SE_CreateDeviceParams` | `SP_Device` |
| Create stream executor | `SP_PlatformFns::create_stream_executor` | `SE_CreateStreamExecutorParams` | `SP_StreamExecutor` |
| Create timer functions | `SP_PlatformFns::create_timer_fns` | None | `SP_TimerFns` |
| Get allocator stats | `SP_StreamExecutor::get_allocator_stats` | None | `SP_AllocatorStats` |
| Memory management | `SP_StreamExecutor::*allocate*`, `SP_StreamExecutor::*memcpy*` | None | `SP_DeviceMemoryBase` |

### Registration
Core TensorFlow will register a new StreamExecutor platform as well as a new TensorFlow device with [DeviceFactory](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/core/common_runtime/device_factory.h;l=30?q=DeviceFactory).
1. Core TensorFlow links to the plug-in's dynamic library and loads the function `SE_InitPlugin`.
2. Core TensorFlow populates `SE_PlatformRegistrationParams` and passes it in a call to `SE_InitPlugin`.
    * In `SE_InitPlugin`, the plug-in populates `SE_PlatformRegistrationParams::SP_Platform` and `SE_PlatformRegistrationParams::SP_PlatformFns`.
3. Core TensorFlow can now create a device, a stream executor, and a timer through functions in `SP_PlatformFns`.
    * Core TensorFlow populates `SE_CreateDeviceParams` and passes it as a parameter to `SP_PlatformFns::create_device()`.
    * Plug-in populates `SE_CreateDeviceParams::SP_Device`.
    * Core TensorFlow populates `SE_CreateStreamExecutorParams` and passes it to `SP_PlatformFns::create_stream_executor()`.
    * Plug-in populates `SE_CreateStreamExecutorParams::SP_StreamExecutor`.
    * Core TensorFlow sets `struct_size` in `SP_TimerFns` and passes it in a call to `SP_PlatformFns::create_timer_fns`.
    * Plug-in populates `SP_TimerFns`.
4. Core TensorFlow registers a new `PluggableDeviceFactory`.

`PluggableDevice` is covered in a separate RFC: [Adding Pluggable Device For TensorFlow](https://github.com/tensorflow/community/pull/262).

### Definitions from Plug-in
Plug-in needs to provide:
* Methods: `SE_InitPlugin` and other methods declared in `SP_*` structs.
* Structures: `SP_Stream_st`, `SP_Event_st`, and `SP_Timer_st`.

## Detailed API
```c++
#define SE_MAJOR 0
#define SE_MINOR 0
#define SE_PATCH 1

// TF_Bool is the C API typedef for unsigned char, while TF_BOOL is
// the datatype for boolean tensors.
#ifndef TF_Bool
#define TF_Bool unsigned char
#endif  // TF_Bool

// Macro used to calculate struct size for maintaining ABI stability across
// different struct implementations.
+#ifndef TF_OFFSET_OF_END +#define TF_OFFSET_OF_END(TYPE, MEMBER) \ + (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER)) +#endif // TF_OFFSET_OF_END + +#ifdef __cplusplus +extern "C" { +#endif + +typedef struct SP_Stream_st* SP_Stream; +typedef struct SP_Event_st* SP_Event; +typedef struct SP_Timer_st* SP_Timer; +// Takes `callback_arg` passed to `host_callback` as the first argument. +typedef void (*SE_StatusCallbackFn)(void* const, TF_Status* const); + +typedef struct SP_TimerFns { + size_t struct_size; + void* ext; // reserved for future use + uint64_t (*nanoseconds)(SP_Timer timer); +} SP_TimerFns; + +#define SP_TIMER_FNS_STRUCT_SIZE TF_OFFSET_OF_END(SP_TimerFns, nanoseconds) + +typedef struct SP_AllocatorStats { + size_t struct_size; + int64_t num_allocs; + int64_t bytes_in_use; + int64_t peak_bytes_in_use; + int64_t largest_alloc_size; + + int8_t has_bytes_limit; + int64_t bytes_limit; + + int64_t bytes_reserved; + int64_t peak_bytes_reserved; + + int8_t has_bytes_reservable_limit; + int64_t bytes_reservable_limit; + + int64_t largest_free_block_bytes; +} SP_AllocatorStats; + +#define SP_ALLOCATORSTATS_STRUCT_SIZE \ + TF_OFFSET_OF_END(SP_AllocatorStats, largest_free_block_bytes) + +// Potential states for an SP_Event. If `poll_for_status` returns anything aside +// from kPending or kComplete, an error has occurred; kUnknown is a bad state. +typedef enum SE_EventStatus { + SE_EVENT_UNKNOWN, + SE_EVENT_ERROR, + SE_EVENT_PENDING, + SE_EVENT_COMPLETE, +} SE_EventStatus; + +// Memory allocation information. +// This matches DeviceMemoryBase defined here: +// https://cs.opensource.google/tensorflow/tensorflow/+/refs/tags/v2.3.0:tensorflow/stream_executor/device_memory.h;l=57 +typedef struct SP_DeviceMemoryBase { + size_t struct_size; + void* ext; // free-form data set by plugin + // Platform-dependent value representing allocated memory. + void* opaque; + uint64_t size; // Size in bytes of this allocation. + uint64_t payload; // Value for plugin's use +} SP_DeviceMemoryBase; + +#define SP_DEVICE_MEMORY_BASE_STRUCT_SIZE \ + TF_OFFSET_OF_END(SP_DeviceMemoryBase, payload) + +typedef struct SP_Device { + size_t struct_size; + void* ext; // free-form data set by plugin + int32_t ordinal; // device index + + // Device vendor can store handle to their device representation + // here. + void* device_handle; +} SP_Device; + +#define SP_DEVICE_STRUCT_SIZE TF_OFFSET_OF_END(SP_Device, device_handle) + +typedef struct SE_CreateDeviceParams { + size_t struct_size; + void* ext; // reserved for future use + int32_t ordinal; // device index + + SP_Device* device; // Input/output, struct_size set by TF for plugin to read. + // Subsequently plugin fills the entire struct. +} SE_CreateDeviceParams; + +#define SE_CREATE_DEVICE_PARAMS_STRUCT_SIZE \ + TF_OFFSET_OF_END(SE_CreateDeviceParams, device) + +typedef struct SP_StreamExecutor { + size_t struct_size; + void* ext; // reserved for future use + + /*** ALLOCATION CALLBACKS ***/ + // Synchronously allocates `size` bytes on the underlying platform and returns + // `SP_DeviceMemoryBase` representing that allocation. In the case of failure, + // NULL is returned. + // `memory_space` is reserved for a potential future usage and should be set + // to 0. + void (*allocate)(const SP_Device* device, uint64_t size, int64_t memory_space, + SP_DeviceMemoryBase* mem); + + // Deallocate the device memory previously allocated via this interface. + // Deallocation of a NULL representative value is permitted. 
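  // (Host and unified memory have their own deallocation callbacks below;
  // this callback handles device memory obtained from `allocate`.)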
+ void (*deallocate)(const SP_Device* device, SP_DeviceMemoryBase* memory); + + // Allocates a region of host memory and registers it with the platform API. + // Memory allocated in this manner is required for use in asynchronous memcpy + // operations, such as `memcpy_dtoh`. + void* (*host_memory_allocate)(const SP_Device* device, uint64_t size); + + // Deallocates a region of host memory allocated by `host_memory_allocate`. + void (*host_memory_deallocate)(const SP_Device* device, void* mem); + + // Allocates unified memory space of the given size, if supported. Unified + // memory support should be added by setting `supports_unified_memory` field + // in `SP_Platform`. + void* (*unified_memory_allocate)(const SP_Device* device, uint64_t size); + + // Deallocates unified memory space previously allocated with + // `unified_memory_allocate`. Unified + // memory support should be added by setting `supports_unified_memory` field + // in `SP_Platform`. + void (*unified_memory_deallocate)(const SP_Device* device, void* location); + + // Fills SP_AllocatorStats with allocator statistics, if it is available. + // If it is not available, return false. + TF_Bool (*get_allocator_stats)(const SP_Device* device, + SP_AllocatorStats* stats); + // Fills the underlying device memory usage information, if it is + // available. If it is not available (false is returned), free/total need not + // be initialized. + TF_Bool (*device_memory_usage)(const SP_Device* device, int64_t* free, + int64_t* total); + + /*** STREAM CALLBACKS ***/ + // Creates SP_Stream. This call should also allocate stream + // resources on the underlying platform and initializes its + // internals. + void (*create_stream)(const SP_Device* device, SP_Stream* stream, + TF_Status* status); + + // Destroys SP_Stream and deallocates any underlying resources. + void (*destroy_stream)(const SP_Device* device, SP_Stream stream); + + // Causes `dependent` to not begin execution until `other` has finished its + // last-enqueued work. + void (*create_stream_dependency)(const SP_Device* device, SP_Stream dependent, + SP_Stream other, TF_Status* status); + + // Without blocking the device, retrieve the current stream status. + void (*get_stream_status)(const SP_Device* device, SP_Stream stream, + TF_Status* status); + + /*** EVENT CALLBACKS ***/ + // Create SP_Event. Performs platform-specific allocation and initialization + // of an event. + void (*create_event)(const SP_Device* device, SP_Event* event, + TF_Status* status); + + // Destroy SE_Event and perform any platform-specific deallocation and + // cleanup of an event. + void (*destroy_event)(const SP_Device* device, SP_Event event); + + // Requests the current status of the event from the underlying platform. + SE_EventStatus (*get_event_status)(const SP_Device* device, SP_Event event); + // Inserts the specified event at the end of the specified stream. + void (*record_event)(const SP_Device* device, SP_Stream stream, + SP_Event event, TF_Status* status); + + // Wait for the specified event at the end of the specified stream. + void (*wait_for_event)(const SP_Device* const device, SP_Stream stream, + SP_Event event, TF_Status* const status); + + /*** TIMER CALLBACKS ***/ + // Creates SP_Timer. Allocates timer resources on the underlying platform + // and initializes its internals, setting `timer` output variable. Sets + // values in `timer_fns` struct. 
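  // (The timer callback functions themselves, such as `nanoseconds`, are
  // provided separately through `SP_PlatformFns::create_timer_fns`.)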
+ void (*create_timer)(const SP_Device* device, SP_Timer* timer, + TF_Status* status); + + // Destroy timer and deallocates timer resources on the underlying platform. + void (*destroy_timer)(const SP_Device* device, SP_Timer timer); + + // Records a start event for an interval timer. + void (*start_timer)(const SP_Device* device, SP_Stream stream, SP_Timer timer, + TF_Status* status); + + // Records a stop event for an interval timer. + void (*stop_timer)(const SP_Device* device, SP_Stream stream, SP_Timer timer, + TF_Status* status); + + /*** MEMCPY CALLBACKS ***/ + // Enqueues a memcpy operation onto stream, with a host destination location + // `host_dst` and a device memory source, with target size `size`. + void (*memcpy_dtoh)(const SP_Device* device, SP_Stream stream, void* host_dst, + const SP_DeviceMemoryBase* device_src, uint64_t size, + TF_Status* status); + + // Enqueues a memcpy operation onto stream, with a device destination + // location and a host memory source, with target size `size`. + void (*memcpy_htod)(const SP_Device* device, SP_Stream stream, + SP_DeviceMemoryBase* device_dst, const void* host_src, + uint64_t size, TF_Status* status); + + // Enqueues a memcpy operation onto stream, with a device destination + // location and a device memory source, with target size `size`. + void (*memcpy_dtod)(const SP_Device* device, SP_Stream stream, + SP_DeviceMemoryBase* device_dst, + const SP_DeviceMemoryBase* device_src, uint64_t size, + TF_Status* status); + + // Blocks the caller while a data segment of the given size is + // copied from the device source to the host destination. + void (*sync_memcpy_dtoh)(const SP_Device* device, void* host_dst, + const SP_DeviceMemoryBase* device_src, uint64_t size, + TF_Status* status); + + // Blocks the caller while a data segment of the given size is + // copied from the host source to the device destination. + void (*sync_memcpy_htod)(const SP_Device* device, + SP_DeviceMemoryBase* device_dst, + const void* host_src, uint64_t size, + TF_Status* status); + + // Blocks the caller while a data segment of the given size is copied from the + // device source to the device destination. + void (*sync_memcpy_dtod)(const SP_Device* device, + SP_DeviceMemoryBase* device_dst, + const SP_DeviceMemoryBase* device_src, uint64_t size, + TF_Status* status); + + // Causes the host code to synchronously wait for the event to complete. + void (*block_host_for_event)(const SP_Device* device, SP_Event event, + TF_Status* status); + + // [Optional] + // Causes the host code to synchronously wait for operations entrained onto + // stream to complete. Effectively a join on the asynchronous device + // operations enqueued on the stream before this program point. + // If not set, then corresponding functionality will be implemented + // by registering an event on the `stream` and waiting for it using + // `block_host_for_event`. + void (*block_host_until_done)(const SP_Device* device, SP_Stream stream, + TF_Status* status); + + // Synchronizes all activity occurring in the StreamExecutor's context (most + // likely a whole device). + void (*synchronize_all_activity)(const SP_Device* device, TF_Status* status); + + // Enqueues on a stream a user-specified function to be run on the host. + // `callback_arg` must be passed as the first argument to `callback_fn`. 
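  // `callback_fn` has the `SE_StatusCallbackFn` signature declared above: it
  // receives `callback_arg` followed by a `TF_Status*` for reporting errors.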
+ TF_Bool (*host_callback)(SP_Device* device, SP_Stream stream, + SE_StatusCallbackFn callback_fn, void* callback_arg); +} SP_StreamExecutor; + +#define SP_STREAMEXECUTOR_STRUCT_SIZE \ + TF_OFFSET_OF_END(SP_StreamExecutor, host_callback) + +typedef struct SE_CreateStreamExecutorParams { + size_t struct_size; + void* ext; // reserved for future use + + SP_StreamExecutor* stream_executor; // output, to be filled by plugin +} SE_CreateStreamExecutorParams; + +#define SE_CREATE_STREAM_EXECUTOR_PARAMS_STRUCT_SIZE \ + TF_OFFSET_OF_END(SE_CreateStreamExecutorParams, stream_executor) + +typedef struct SP_Allocator { + size_t struct_size; + void* ext; // free-form field set by plugin. + + // Whether this platform supports unified memory. + // Unified memory is a single memory address space accessible from any device. + TF_Bool supports_unified_memory; +} SP_Allocator; + +#define SP_ALLOCATOR_STRUCT_SIZE \ + TF_OFFSET_OF_END(SP_Allocator, supports_unified_memory) + +typedef struct SP_AllocatorFns { + size_t struct_size; + void* ext; // reserved for future use. + + // Synchronously allocates `size` bytes on the underlying platform and returns + // `SP_DeviceMemoryBase` representing that allocation. In the case of failure, + // nullptr is returned. + // `memory_space` is reserved for a potential future usage and should be set + // to 0. + void (*allocate)(const SP_Device* device, const SP_Allocator* allocator, + uint64_t size, int64_t memory_space, + SP_DeviceMemoryBase* mem); + + // Deallocate the device memory previously allocated via this interface. + // Deallocation of a nullptr-representative value is permitted. + void (*deallocate)(const SP_Device* device, const SP_Allocator* allocator, + SP_DeviceMemoryBase* memory); + + // Allocates a region of host memory and registers it with the platform API. + // Memory allocated in this manner is required for use in asynchronous memcpy + // operations, such as `memcpy_dtoh`. + void* (*host_memory_allocate)(const SP_Device* device, + const SP_Allocator* allocator, uint64_t size); + + // Deallocates a region of host memory allocated by `host_memory_allocate`. + void (*host_memory_deallocate)(const SP_Device* device, + const SP_Allocator* allocator, void* mem); + + // Allocates unified memory space of the given size, if supported. Unified + + // memory support should be added by setting `supports_unified_memory` field + // in `SP_Platform`. + void* (*unified_memory_allocate)(const SP_Device* device, + const SP_Allocator* allocator, + uint64_t bytes); + + // Deallocates unified memory space previously allocated with + // `unified_memory_allocate`. Unified + // memory support should be added by setting `supports_unified_memory` field + // in `SP_Platform`. + void (*unified_memory_deallocate)(const SP_Device* device, + const SP_Allocator* allocator, + void* location); + + // Fills SP_AllocatorStats with allocator statistics, if it is available. + // If it is not available, return false. + TF_Bool (*get_allocator_stats)(const SP_Device* device, + const SP_Allocator* allocator, + SP_AllocatorStats* stats); + + // Fills the underlying device memory usage information, if it is + // available. If it is not available (false is returned), free/total need not + // be initialized. 
+ TF_Bool (*device_memory_usage)(const SP_Device* device, + const SP_Allocator* allocator, int64_t* free, + int64_t* total); +} SP_AllocatorFns; + +#define SP_ALLOCATOR_FNS_STRUCT_SIZE \ + TF_OFFSET_OF_END(SP_AllocatorFns, device_memory_usage) + +typedef struct SP_CustomAllocator { + size_t struct_size; + void* ext; // free-form data set by plugin +} SP_CustomAllocator; + +#define SP_CUSTOM_ALLOCATOR_STRUCT_SIZE \ + TF_OFFSET_OF_END(SP_CustomAllocator, ext) + +typedef struct SP_CustomAllocatorFns { + size_t struct_size; + void* ext; // reserved for future use + + // Synchronously allocates `size` bytes on the underlying platform and returns + // a pointer to that allocation. In the case of failure, + // nullptr is returned. + void* (*allocate_raw)(const SP_Device* device, + const SP_CustomAllocator* allocator, size_t size, + size_t alignment); + + // Deallocate the device memory previously allocated via `allocate_raw`. + // Deallocation of a nullptr-representative value is permitted. + void (*deallocate_raw)(const SP_Device* device, + const SP_CustomAllocator* allocator, void* ptr); + + // Allocates a region of host memory. + void* (*host_allocate_raw)(const SP_Device* device, + const SP_CustomAllocator* allocator, + uint64_t size); + + // Deallocates a region of host memory allocated by `host_allocate_raw`. + void (*host_deallocate_raw)(const SP_Device* device, + const SP_CustomAllocator* allocator, void* mem); + + // Fills SP_AllocatorStats with allocator statistics, if it is available. + // If it is not available, return false. + TF_Bool (*get_allocator_stats)(const SP_Device* device, + const SP_CustomAllocator* allocator, + SP_AllocatorStats* stats); + + // Fills the underlying device memory usage information, if it is + // available. If it is not available (false is returned), free/total need not + // be initialized. + TF_Bool (*device_memory_usage)(const SP_Device* device, + const SP_CustomAllocator* allocator, + int64_t* free, int64_t* total); +} SP_CustomAllocatorFns; + +#define SP_CUSTOM_ALLOCATOR_FNS_STRUCT_SIZE \ + TF_OFFSET_OF_END(SP_CustomAllocatorFns, device_memory_usage) + +typedef struct SE_CreateAllocatorParams { + size_t struct_size; + void* ext; // reserved for future use + + SP_Allocator* allocator; + SP_AllocatorFns* allocator_fns; +} SE_CreateAllocatorParams; + +#define SE_CREATE_ALLOCATOR_PARAMS_STRUCT_SIZE \ + TF_OFFSET_OF_END(SE_CreateAllocatorParams, allocator_fns) + +typedef struct SE_CreateCustomAllocatorParams { + size_t struct_size; + void* ext; // reserved for future use + + SP_CustomAllocator* custom_allocator; + SP_CustomAllocatorFns* custom_allocator_fns; +} SE_CreateCustomAllocatorParams; + +#define SE_CREATE_CUSTOM_ALLOCATOR_PARAMS_STRUCT_SIZE \ + TF_OFFSET_OF_END(SE_CreateCustomAllocatorParams, custom_allocator_fns) + +typedef struct SP_Platform { + size_t struct_size; + + void* ext; // free-form data set by plugin + + // Platform name. Must be null-terminated. + const char* name; + + // Device type name, for example GPU. Must be null-terminated. + const char* type; + + // Number of visible devices. + size_t visible_device_count; +} SP_Platform; + +#define SP_PLATFORM_STRUCT_SIZE \ + TF_OFFSET_OF_END(SP_Platform, visible_device_count) + +typedef struct SP_PlatformFns { + size_t struct_size; + + void* ext; // reserved for future use + + // Callbacks for creating/destroying SP_Device. 
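  // `create_device` fills `params->device` for the device index given in
  // `params->ordinal` (see `SE_CreateDeviceParams` above).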
+ void (*create_device)(const SP_Platform* platform, + SE_CreateDeviceParams* params, TF_Status* status); + + // Clean up fields inside SP_Device that were allocated + // by the plugin. `device` itself should not be deleted here. + void (*destroy_device)(const SP_Platform* platform, SP_Device* device); + + // Callbacks for creating/destroying SP_StreamExecutor. + void (*create_stream_executor)(const SP_Platform* platform, + SE_CreateStreamExecutorParams* params, + TF_Status* status); + // Clean up fields inside SP_StreamExecutor that were allocated + // by the plugin. `stream_executor` itself should not be deleted here. + void (*destroy_stream_executor)(const SP_Platform* platform, + SP_StreamExecutor* stream_executor); + + // Callbacks for creating/destroying SP_TimerFns. + void (*create_timer_fns)(const SP_Platform* platform, SP_TimerFns* timer, + TF_Status* status); + + void (*destroy_timer_fns)(const SP_Platform* platform, + SP_TimerFns* timer_fns); + + // Set only one of `create_allocator` or `create_custom_allocator` functions + // below. + + // Callback for creating an allocator that uses default TensorFlow allocation + // strategy (BFC: best-fit with coalescing). For more details, see + // https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/core/common_runtime/bfc_allocator.h. + // If `create_allocator` is set, then `create_custom_allocator` should *not* + // be set. + void (*create_allocator)(const SP_Platform* platform, + SE_CreateAllocatorParams* params, TF_Status* status); + void (*destroy_allocator)(const SP_Platform* platform, + SP_Allocator* allocator, + SP_AllocatorFns* allocator_fns); + + // Callback for creating a custom allocator. Allows using a custom allocation + // strategy. + // If `create_custom_allocator` is set, then `create_allocator` should *not* + // be set. + // Note: deallocator functions must be set in params. + void (*create_custom_allocator)(const SP_Platform* platform, + SE_CreateCustomAllocatorParams* params, + TF_Status* status); + void (*destroy_custom_allocator)(const SP_Platform* platform, + SP_CustomAllocator* allocator, + SP_CustomAllocatorFns* allocator_fns); +} SP_PlatformFns; + +#define SP_PLATFORM_FNS_STRUCT_SIZE \ + TF_OFFSET_OF_END(SP_PlatformFns, destroy_timer_fns) + + +typedef struct SE_PlatformRegistrationParams { + size_t struct_size; + void* ext; // reserved for future use + + // StreamExecutor C API version. + int32_t major_version; + int32_t minor_version; + int32_t patch_version; + + SP_Platform* platform; // output, set by plugin + SP_PlatformFns* platform_fns; // output, set by plugin + // Clean up fields inside SP_Platform that were allocated + // by the plugin. `platform` itself should not be deleted here. + void (*destroy_platform)(SP_Platform* platform); // out, set by plugin + void (*destroy_platform_fns)( + SP_PlatformFns* platform_fns); // out, set by plugin +} SE_PlatformRegistrationParams; + +#define SE_PLATFORM_REGISTRATION_PARAMS_STRUCT_SIZE \ + TF_OFFSET_OF_END(SE_PlatformRegistrationParams, destroy_platform_fns) + +void SE_InitPlugin(SE_PlatformRegistrationParams* params, TF_Status* status); + +#ifdef __cplusplus +} // extern "C" +#endif +``` + +### PlatformId + +StreamExecutor [Platform](https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/stream_executor/platform.h;l=114) has an id parameter. This parameter will be hidden from the C API and set +internally by TensorFlow instead. 
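For illustration, core TensorFlow could mint this id internally when it wraps a
registered plug-in platform, for example with the existing `PLATFORM_DEFINE_ID`
helper in `stream_executor/platform.h`. The snippet below is a sketch of one
possible core-side mechanism, not part of the C API; `kPluggablePlatformId` and
`GetPluggablePlatformId` are illustrative names.

```cpp
#include "tensorflow/stream_executor/platform.h"

// Core-TensorFlow-side sketch only: one internal Platform::Id per registered
// plug-in platform. Plug-ins never see or set this value; the internal
// CPlatform wrapper would return it from Platform::id().
PLATFORM_DEFINE_ID(kPluggablePlatformId);

stream_executor::Platform::Id GetPluggablePlatformId() {
  return kPluggablePlatformId;
}
```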
## Usage Example
Below is a code example for [PluggableDevice](https://github.com/tensorflow/community/pull/262)
registration, as outlined in the [Usage Overview](#usage-overview) section.

### Core TensorFlow
```cpp
typedef void (*SEInitPluginFn)(SE_PlatformRegistrationParams*, TF_Status*);
...

// On Windows, use `GetProcAddress` instead of `dlsym`.
void* initialize_sym = dlsym(plugin_dso_handle, "SE_InitPlugin");
if (!initialize_sym) {
  // Output error and skip this plug-in.
}
SEInitPluginFn initialize_fn =
    reinterpret_cast<SEInitPluginFn>(initialize_sym);

SE_PlatformRegistrationParams params;
TF_Status status;

initialize_fn(&params, &status);

// Register new platform
std::unique_ptr<stream_executor::internal::CPlatform> platform(
    new stream_executor::internal::CPlatform(params));
SE_CHECK_OK(
    stream_executor::MultiPlatformManager::RegisterPlatform(
        std::move(platform)));

// Register PluggableDevice
std::string platform_name_str(params.platform->name);
std::string type_str(params.platform->type);
DeviceFactory::Register(type_str, new PluggableDeviceFactory(platform_name_str),
                        priority);
...
```

### Plug-in
Define functions that create and destroy `SP_Device`, `SP_StreamExecutor`, and
`SP_TimerFns`:

```cpp
void create_device(const SP_Platform* platform, SE_CreateDeviceParams* params,
                   TF_Status* status) {
  params->device->device_handle = get_my_device_handle();
  ...
}
void create_stream_executor(const SP_Platform* platform,
                            SE_CreateStreamExecutorParams* params,
                            TF_Status* status) {
  params->stream_executor->memcpy_htod = my_device_memcpy_from_host_function;
  ...
}
void create_timer_fns(const SP_Platform* platform, SP_TimerFns* timer_fns,
                      TF_Status* status) {
  timer_fns->nanoseconds = nanoseconds;
  ...
}
void create_allocator(const SP_Platform* platform,
                      SE_CreateAllocatorParams* params, TF_Status* status) {
  ...
}
void destroy_device(const SP_Platform* platform, SP_Device* device) {
  // Destroy device handle here.
}
void destroy_stream_executor(const SP_Platform* platform,
                             SP_StreamExecutor* se) {
  // Perform any clean up needed for stream executor.
}
void destroy_timer_fns(const SP_Platform* platform, SP_TimerFns* timer_fns) {
  // Destroy timer functions here.
}
void destroy_allocator(const SP_Platform* platform, SP_Allocator* allocator,
                       SP_AllocatorFns* allocator_fns) {
  // Clean up allocator here.
}
```

Define `SE_InitPlugin` that TensorFlow will call when registering the device
plug-in:

```cpp
void SE_InitPlugin(SE_PlatformRegistrationParams* params, TF_Status* status) {
  int32_t visible_device_count = 2;
  std::string name = "MyDevice";
  std::string type = "GPU";

  // Sets struct_size to a valid value, and zero initializes other attributes.
+ *params = { SE_PLATFORM_REGISTRATION_PARAMS_STRUCT_SIZE }; + params->platform->name = name.c_str(); + params->platform->type = type.c_str(); + params->platform->visible_device_count = visible_device_count; + params->platform_fns->create_device = create_device; + params->platform_fns->destroy_device = destroy_device; + params->platform_fns->create_stream_executor = create_stream_executor; + params->platform_fns->destroy_stream_executor = destroy_stream_executor; + params->platform_fns->create_timer_fns = create_timer_fns; + params->platform_fns->destroy_timer_fns = destroy_timer_fns; + params->platform_fns->create_allocator = create_allocator; + params->platform_fns->destroy_allocator = destroy_allocator; +} +``` + +## Stream / Timer / Event Representation + +API extension would require defining `SP_Stream_st`, `SP_Event_st`, and +`SP_Timer_st` structs. From the point of view of TensorFlow, we will treat their +pointers as opaque. + +Underneath, StreamExecutor will rely on customized implementations of +[StreamInterface](https://cs.opensource.google/tensorflow/tensorflow/+/refs/tags/v2.3.0:tensorflow/stream_executor/stream_executor_internal.h;l=114), +[TimerInterface](https://cs.opensource.google/tensorflow/tensorflow/+/refs/tags/v2.3.0:tensorflow/stream_executor/stream_executor_internal.h;l=145) +and +[EventInterface](https://cs.opensource.google/tensorflow/tensorflow/+/refs/tags/v2.3.0:tensorflow/stream_executor/stream_executor_internal.h;l=76). +For example, Stream customization might look as follows: + +```cpp +class CStream : public StreamInterface { + public: + explicit CStream(SP_Device* device, + SP_StreamExecutor* stream_executor) : + device_(device), stream_executor_(stream_executor), + stream_handle_(nullptr) { + } + ~CStream() override { + Destroy(); + } + + void Init() { + stream_handle_ = stream_executor_->create_stream(device_); + } + + void Destroy() { + if (stream_handle_ != nullptr) { + stream_executor_->delete_stream(device_, stream_handle_); + stream_handle_ = nullptr; + } + } + + SP_Stream Handle() { + return stream_handle_; + } + + private: + SP_Device* device_; // not owned + SP_StreamExecutor* stream_executor_; // not owned + SP_Stream stream_handle_; +}; +``` + +## Alternatives Considered + +* **Forking:** Contributors could always fork the TensorFlow repository, + directly make changes there to add a device, and release custom TensorFlow + packages. However, keeping forked copy in sync with the main repository can + be challenging and tedious, especially if some breakages cannot be fixed and + the code diverges. +* **Designing a new C API instead of StreamExecutor:** We are transitioning to + the new TensorFlow stack soon. Since the current stack’s code might not be + compatible with the new stack, we decided to stick with the existing + StreamExecutorInterface to minimize throw-away efforts. + +## Performance Implications + +The C API should not affect TensorFlow’s performance. Using the C API to connect +a device modularly would help save build time (compared to adding code directly +to the repository.) + +## Dependencies + +* This proposal doesn’t add any new dependencies to TensorFlow. +* This proposal doesn’t affect any projects dependent on TensorFlow. + +## Engineering Impact + +* The C API would increase the binary size and the build time, but not + significantly so. We don’t expect it to affect startup time / test times. +* The TensorFlow DevInfra team will maintain this code. 
StreamExecutor C API + will be packaged along with other C APIs that TensorFlow currently has. + +## Platforms and Environments + +* **Platforms:** The C API should work on all platforms supported by + TensorFlow, apart from embedded/mobile platforms. It does not impact + automatic code generation or mobile stripping tooling. We don’t expect it to + interact with transformation tools. +* **Execution environments:** The C API should work on any standard execution + environments. + +## Best Practices + +* Going forward, Modular TensorFlow will be the only way to integrate new + third-party devices to the current TensorFlow stack. +* For device integrations that can be done in 2021 or later, we strongly + encourage waiting to integrate with the new TensorFlow stack instead. + +## Compatibility + +How will this proposal interact with other parts of the TensorFlow Ecosystem? + +* **TFLite:** We don’t plan to make this work for TFLite. +* **Distribution strategies:** The C API should not impede them. +* **tf.function:** The C API would not interact with tf.function. +* **GPU/TPU:** Certain GPUs and TPUs are already supported in TensorFlow and + wouldn’t need this C API. Other GPU/devices can use this C API if the + functionality coverage is sufficient for them. +* **SavedModel:** The C API will not be serialized to a SavedModel. + +## Questions and Discussion Topics + +* Any comments on the API design? Any missing functionality? +* Please let us know if you plan to use this C API for device integration. diff --git a/rfcs/20200612-stream-executor-c-api/C_API_versioning_strategy.md b/rfcs/20200612-stream-executor-c-api/C_API_versioning_strategy.md new file mode 100644 index 000000000..de922bdca --- /dev/null +++ b/rfcs/20200612-stream-executor-c-api/C_API_versioning_strategy.md @@ -0,0 +1,384 @@ +# StreamExecutor C API Versioning Strategy +| Status | Proposed | +| :------------ | :------------------------------------------------------ | +| **RFC #** | Extension of #[257](https://github.com/tensorflow/community/pull/257) | +| **Authors** | Yi Situ (yisitu@google.com), Penporn Koanantakool (penporn@google.com), Anna Revinskaya (annarev@google.com) | +| **Sponsor** | Gunhan Gulsoy (gunan@google.com) | +| **Updated** | 2020-09-08 | + +In reply to a question on [PR #262](https://github.com/tensorflow/community/pull/262#issuecomment-653690654). + +StreamExecutor C API (SE C API) follows Semantic Versioning 2.0.0 +([semver](http://semver.org/)). Each release version has a format +`MAJOR.MINOR.PATCH`, as outlined in [TensorFlow version compatibility](https://www.tensorflow.org/guide/versions#semantic_versioning_20). +We also use `struct_size` to track compatibility. + +## Updating Guidelines +This section outlines when to update version numbers specific to SE C API +(`SE_MAJOR`, `SE_MINOR`, and `SE_PATCH`). + +### SE_MAJOR +* Potentially backwards incompatible changes. +* If a change is backwards incompatible, it requires an RFC because it will + break all current plug-ins. This should be rare. +* An `SE_MAJOR` update should be planned in a way that bundles as many pending + backwards incompatible changes together as possible to avoid breaking plug-ins + multiple times. +* There will be an announcement giving a grace period before the update happens. + +### SE_MINOR +* Backwards compatible changes. + * Adding a new variable, struct, method, enumeration, etc. + * Trivial deprecation of a variable, etc. by setting it to a no-op values, + e.g., 0 or `NULL`. 
### SE_PATCH
* Backwards compatible bug fixes.

## Conventions
* Once a member is added to a struct, it cannot be removed, reordered, renamed,
  or repurposed (i.e., assigned different functionality).
* "Renaming" a member is equivalent to adding a new member with a new name and
  eventually deprecating the member with the old name.
* Fields that cannot be 0 or `NULL` can be deprecated in a backwards compatible
  manner by zero-initialization.
  * If the field is set by core TensorFlow, plug-ins must perform input validation
    on these fields for 0 and `NULL` before accessing them.
  * Plug-ins know the fields are deprecated when they find 0 or `NULL` in
    these fields.
  * If the field is set by the plug-in, TF can check if the field is non-zero (or not
    `NULL`) and print a warning if so.
  * Such fields must be explicitly marked by comments, to ensure all plug-ins
    have consistent behavior (e.g., none of the plug-ins is using 0 or `NULL` as
    a special case). See `// 0 is no-op` and `// NULL is no-op` in the
    [By value inspection](#by-value-inspection) section for an example.

## Detecting Incompatibility

### By Comparing SE_MAJOR at Registration Time
At load time, both the plug-in and core TensorFlow should check for version
compatibility. If the versions are not compatible, the plug-in should output an
error and core TensorFlow should unload the plug-in. See the code example below.

Core TensorFlow passes its SE C API version numbers when calling the plug-in's
initialization routine (`SE_InitPlugin`):
```c++
typedef void (*SEInitPluginFn)(SE_PlatformRegistrationParams*, TF_Status*);
SE_PlatformRegistrationParams params{SE_PLATFORM_REGISTRATION_PARAMS_STRUCT_SIZE};
params.major_version = SE_MAJOR;
params.minor_version = SE_MINOR;
params.patch_version = SE_PATCH;
TF_Status status;

// Core TensorFlow sends its view of version numbers to plugin.
void* initialize_sym = dlsym(plugin, "SE_InitPlugin");
if (!initialize_sym) {
  // Output error and skip this plug-in.
}
SEInitPluginFn initialize_plugin_fn =
    reinterpret_cast<SEInitPluginFn>(initialize_sym);
initialize_plugin_fn(&params, &status);
if (!tensorflow::StatusFromTF_Status(&status).ok()) {
  // Output error and skip this plug-in.
}
```

The plug-in checks the `SE_MAJOR` version number and outputs an error if it
doesn't match:
```c++
void SE_InitPlugin(SE_PlatformRegistrationParams* params,
                   TF_Status* status) {
  if (params->struct_size == 0) {
    // *status = ...
    LOG(ERROR) << "Invalid argument.";
    return;
  }
  if (SE_MAJOR != params->major_version) {
    // *status = ...
    LOG(ERROR) << "Unsupported major version. Given: " << params->major_version
               << " Expected: " << SE_MAJOR;
    return;
  }
  ...
}
```

### By Value Inspection
Deprecation of an attribute can sometimes be done in a backwards compatible
manner by leaving the attribute zero initialized.

* The plug-in performs input validation on each field for a `NULL` or 0 value
  before consuming it, preventing it from entering a bad state.
* If deprecation by zero-initialization is not possible (e.g., because the default
  value of zero may be a valid input), then the change is API incompatible;
  TensorFlow has to bump the major version when the attribute is deprecated.

For example,

```c++
struct Example {
  int32_t cannot_be_zero; // 0 is no-op.
  void* cannot_be_null; // NULL is no-op.
  int32_t can_be_zero;
  void* can_be_null;
  int32_t optional_zero_default; // Optional. 0 by default.
  void* optional_null_default; // Optional. NULL by default.
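  // New members may only be appended after the existing ones; existing
  // members are never removed, reordered, or repurposed (see Conventions).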
+}; +``` +* `cannot_be_zero` and `cannot_be_null` here can be deprecated by + zero-initializing. +* `can_be_zero` and `can_be_null` need a MAJOR version bump for deprecation, + since 0 and `NULL` are valid values for them. +* `optional_zero_default` and `optional_null_default` are optional fields that + use 0 / `NULL` to indicate that the field is not provided. This needs an + `SE_MAJOR` version bump for deprecation as well, since 0 and `NULL` are valid + here. + +For other unintentional changes which are caused by bugs (e.g., data was +forgotten to be initialized by mistake), file a Github issue. + +### By Checking Struct Size +Backwards compatible changes within the same `SE_MINOR` version can only add new +members to a struct and cannot modify any existing member. Because of this, we +can check the byte offset of the variable we want to consume against the struct +size to see if the struct has the variable or not. + +# Usage Example + +Following are concrete examples of how TensorFlow remains compatible with +plug-ins when functionality is added to or removed from StreamExecutorInterface. + +## Extending Functionality +The following snippet shows `void* new_field1` and `int new_field2` being added +to a `Toy` struct. + +```diff +#define SE_MAJOR 1 +- #define SE_MINOR 0 ++ #define SE_MINOR 1 // Increment minor version. +#define SE_PATCH 0 + +typedef struct Toy { + size_t struct_size; + void* ext; // Free-form data set by plugin. + int32_t old_field; // Device index. ++ void* new_field1; // NULL is no-op. ++ int new_field2; // 0 is no-op. +} Toy; + +- // Evaluates to 20 +- #define TOY_STRUCT_SIZE TF_OFFSET_OF_END(SE_Device, old_field) ++ // Evaluates to 36 ++ #define TOY_STRUCT_SIZE TF_OFFSET_OF_END(SE_Device, new_field2) +``` + +To concisely cover compatibility of cases where structs are created by core +TensorFlow and by plug-ins, we will call the side that creates the struct +`producer`, and the side that takes the struct `consumer`. + +### Producer Has Older Header Files + +```cpp +// Producer implementation has v1.0.0 headers. +Toy* create_toy() { + Toy* toy = new Toy{TOY_STRUCT_SIZE}; + // Based on header v1.0.0, toy->struct_size is 20. + ... + old_field = set_old_field(); + return toy; +} +// Consumer implementation has v1.1.0 headers. +void take_toy(const Toy* toy) { + // Consumer checks for `struct_size` greater than 24 (offset of `new_field1`). + // In this case, `toy->struct_size` = 20 so this `if` is not entered. + if (toy->struct_size > offsetof(Toy, new_field1) && new_field1 != NULL) { + // Safe to access `new_field1`. + } + // Consumer checks for `struct_size` greater than 32 (offset of `new_field2`). + // In this case, `toy->struct_size` = 20 so this `if` is not entered. + if (toy->struct_size > offsetof(Toy, new_field2) && new_field2 != 0) { + // Safe to access `new_field2`. + } +} +``` + +### Producer Has Newer Header Files + +```cpp +// Producer implementation has v1.1.0 headers. +Toy* create_toy() { + Toy* toy = new Toy{TOY_STRUCT_SIZE}; + // Based on header v1.1.0, toy->struct_size is 36. + ... + old_field = set_old_field(); + new_field1 = set_new_field1(); + new_field2 = set_new_field2(); + return toy; +} +// Consumer implementation has v1.0.0 headers. +void take_toy(const Toy* toy) { + // `new_field1` and `new_field2` are safely ignored + // because consumer doesn't know about them. 
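  // (With v1.0.0 headers, TOY_STRUCT_SIZE ends at `old_field`, so this
  // function has no way to reference the newer members.)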
+} +``` + +If `producer` depends on `consumer` knowing about `new_field1` and `new_field2`, +adding `new_field1` and `new_field2` would be a backwards incompatible change +and `SE_MAJOR` should be bumped instead. + +## Deprecating Functionality + +When functionality is being deprecated, there will be comments next to the +member indicating so. The member is left in place to preserve the alignment and +offset of the existing structure members. General guidelines: +* Add comments saying which field will be deprecated. +* The minor update will still support `deprecating_feature` to allow time for + transition. This would be a good time to raise concerns on Github. +* After the transition time has passed, `deprecating_feature` can be removed in + a major update. + +Since members are not allowed to be removed or reordered, refactors (e.g., +renaming device_handle to dev_handle) or changing of member types (e.g., from +`int` to `float`) are considered as +[deprecation with extension](#Deprecation-with-extension). + +The following code snippet shows deprecation of `new_field1`. +```diff +#define SE_MAJOR 1 +- #define SE_MINOR 1 ++ #define SE_MINOR 2 // Increment minor version. +#define SE_PATCH 0 + +typedef struct Toy { + size_t struct_size; + void* ext; // Free-form data set by plugin. + int32_t old_field; // Device index. +- void* new_field1; // NULL is no-op. ++ void* new_field1; // Deprecated. // NULL is no-op. + int new_field2; // 0 is no-op. +} Toy; + +// Evaluates to 36 +#define TOY_STRUCT_SIZE TF_OFFSET_OF_END(SE_Device, new_field2) +``` + +To concisely cover compatibility of cases where structs are created by core +TensorFlow and by plug-ins, we will call the side that creates the struct +`producer`, and the side that takes the struct `consumer`. + +### Producer Has Older Header Files + +```diff +// Producer implementation has v1.1.0 headers. +Toy* create_toy() { + Toy* toy = new Toy{TOY_STRUCT_SIZE}; + ... + old_field = set_old_field(); + new_field1 = set_new_field1(); + new_field2 = set_new_field2(); + return toy; +} +// Consumer implementation has v1.2.0 headers. +void take_toy(const Toy* toy) { +- // Consumer removes the code using `new_field1`. +- if (toy->struct_size > offsetof(Toy, new_field1) && new_field1 != NULL) { +- // Safe to access `new_field1`. +- } + if (toy->struct_size > offsetof(Toy, new_field2) && new_field2 != 0) { + // Safe to access `new_field2`. + } +} +``` + +The producer, being older, initializes the recently deprecated `new_field1`. +Since consumer's `take_toy` does not access it anymore, `new_field1` will be +safely ignored (even though it was initialized). + +### Producer Has Newer Header Files + +```diff +// Producer implementation has v1.2.0 headers. +Toy* create_toy() { + Toy* toy = new Toy{TOY_STRUCT_SIZE}; ++ // `new_field1` is zero-initialized with the line above. + ... + old_field = set_old_field(); +- new_field1 = set_new_field1(); // Stops setting the deprecated `new_field1`. + new_field2 = set_new_field2(); + return toy; +} +// Consumer implementation has v1.1.0 headers. +void take_toy(const Toy* toy) { ++ // `new_field1` is `NULL` so it is safely ignored. ++ // Can also add code to raise an error here when `NULL` is detected. + if (toy->struct_size > offsetof(Toy, new_field1) && new_field1 != NULL) { + // Safe to access `new_field1`. + } + if (toy->struct_size > offsetof(Toy, new_field2) && new_field2 != 0) { + // Safe to access `new_field2`. + } +} +``` + +This way, plug-ins can safely remove implementation of deprecated functionality. 
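All of the presence checks above follow the same pattern, so a consumer may
prefer to factor it into a small helper. The macro below is a hypothetical
convenience, not part of the proposed API; it reuses `TF_OFFSET_OF_END` from
the main RFC and requires the full member to fit within the producer's
`struct_size` (`consume_toy` is an illustrative name):

```cpp
// Hypothetical helper: true when the producer's struct_size is large enough
// to fully contain MEMBER, i.e. the producer's header already declared it.
#define HAS_STRUCT_MEMBER(TYPE, ptr, MEMBER) \
  ((ptr)->struct_size >= TF_OFFSET_OF_END(TYPE, MEMBER))

// Consumer-side usage, mirroring the offsetof checks above.
void consume_toy(const Toy* toy) {
  if (HAS_STRUCT_MEMBER(Toy, toy, new_field2) && toy->new_field2 != 0) {
    // Safe to access `new_field2`.
  }
}
```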
+ +## Deprecation with Extension + +This is the more common form of deprecation where the struct is extended with a +new attribute that replaces an existing one. The analysis is the same as +[Extending functionality](#Extending-functionality) and +[Deprecating functionality](#Deprecating-functionality) combined. +General guidelines: +* Add comments saying which field will be deprecated and which one will replace + it. +* Increment the minor version. +* The minor update will support both `name` and `better_name` to allow time for + transition. This would be a good time to raise concerns on Github. +* After the transition time has passed, `name` can be removed in a major update. + +Below are some examples. + +```diff +#define SE_MAJOR 5 +- #define SE_MINOR 0 ++ #define SE_MINOR 1 // Increment minor version +#define SE_PATCH 0 + +// Case 1 - Renaming an attribute +typedef struct Device { + size_t struct_size; + void* ext; + int32_t ordinal; + +- const char* name; ++ const char* name; // Deprecating soon. Use `better_name`. + void* device_handle; + const char* better_name; // Replaces `name`. +} Device; + + +// Case 2 - Deprecation of an entire struct can be done without a replacement... ++ // `Device` struct will be deprecated soon. +typedef struct Device { +... +} Device; + +// ...or with a replacement ++ // Replaces `Device`. ++ typedef struct BetterDevice { ++ ... ++ } Device; + +// Case 3 - Renaming a function. +typedef struct ExportFunctions { +... ++ // create_device will be deprecated soon. + void (*create_device)(Device* device); + ++ // Replaces `create_device`. ++ void (*create_better_device)(BetterDevice* device); +} ExportFunctions; +``` + +# Limitations +* Maximum supported alignment is 8 bytes.
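As an illustration of this constraint, either side of the API boundary could
assert at compile time that the shared structs do not require more than 8-byte
alignment. The checks below are a sketch, not part of the proposed API:

```cpp
// Sketch: fail the build if a shared struct ever gains a member with
// alignment greater than 8 bytes (e.g., an over-aligned SIMD type).
static_assert(alignof(SP_DeviceMemoryBase) <= 8,
              "SP_DeviceMemoryBase exceeds the maximum supported alignment");
static_assert(alignof(SE_PlatformRegistrationParams) <= 8,
              "SE_PlatformRegistrationParams exceeds the maximum supported alignment");
```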