cudnn_frontend v0.5
Release Notes:
- [New API]: Execution Plan Caching
## Execution Plan Caching
Through its heuristics, cuDNN provides a way to query a list of good engine configs. Building on this query, the cudnn_frontend_find_plan function runs all the engineConfig(s) on the user's system and returns a sorted list of plans.
Running multiple plans through several iterations is time-consuming. The ExecutionPlanCache allows the user to build a cache, keyed by operation graph, from which an execution plan can be queried. It is the responsibility of the user to maintain different caches for different types of operation_graphs (e.g. a separate cache for convolutionForward than for Dgrad or Wgrad).
### API:
- `add_plan_to_cache(const cudnn_frontend::OperationGraph &op_graph, const cudnn_frontend::ExecutionPlan &plan)` : Creates a mapping between the operation graph and the execution plan.
- `bool get_plan_from_cache(const cudnn_frontend::OperationGraph &op_graph, const cudnn_frontend::ExecutionPlan *&plan)` : Sets the plan pointer to the cached execution plan and returns true if one is found.
- `cudnnFindPlanAndCache(cudnnHandle_t handle, cudnn_frontend::OperationGraph &opGraph, cudnn_frontend::VariantPack const &variantPack, cudnn_frontend::ExecutionPlanCache &cache, Predicate pred) -> cudnn_frontend::ExecutionPlan` : Chains the output of cudnn_frontend_find_plan and caches the result for future use.
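
The lookup-then-build pattern behind the calls listed above can be sketched without cuDNN at all. The following is a minimal, self-contained illustration, not the library's implementation: `FakeOperationGraph`, `FakePlan`, `SketchPlanCache` and `lookup_or_build` are hypothetical stand-ins, and the use of a string tag as the cache key is an assumption made for the sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for cudnn_frontend::OperationGraph and
// cudnn_frontend::ExecutionPlan. The real classes wrap cuDNN descriptors.
struct FakeOperationGraph {
    std::string tag;  // assumed: a string that identifies the graph
};

struct FakePlan {
    std::string engine_name;
};

// Hypothetical cache that mirrors the two member functions listed above.
class SketchPlanCache {
   public:
    // Creates (or overwrites) the mapping from the graph to the plan.
    void add_plan_to_cache(const FakeOperationGraph &op_graph, const FakePlan &plan) {
        cache_[op_graph.tag] = plan;
    }

    // On a hit, points `plan` at the cached entry and returns true.
    // On a miss, leaves `plan` untouched and returns false.
    bool get_plan_from_cache(const FakeOperationGraph &op_graph, const FakePlan *&plan) const {
        auto it = cache_.find(op_graph.tag);
        if (it == cache_.end()) return false;
        plan = &it->second;
        return true;
    }

   private:
    std::map<std::string, FakePlan> cache_;
};

// Runs the expensive search only on a cache miss. `searches` counts how often
// the stand-in for cudnn_frontend_find_plan would have been invoked.
std::string lookup_or_build(SketchPlanCache &cache, const FakeOperationGraph &graph, int &searches) {
    const FakePlan *plan = nullptr;
    if (!cache.get_plan_from_cache(graph, plan)) {
        ++searches;  // placeholder for timing every engine config
        cache.add_plan_to_cache(graph, FakePlan{"engine_for_" + graph.tag});
        bool found = cache.get_plan_from_cache(graph, plan);
        assert(found && plan != nullptr);
        (void)found;  // silences unused-variable warnings when NDEBUG is set
    }
    return plan->engine_name;
}
```

Calling `lookup_or_build` twice with the same graph triggers the search only once; the second call is served from the cache. Keeping one `SketchPlanCache` instance per operation type matches the advice above about separate caches for forward, Dgrad and Wgrad graphs.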
- [New Feature]: Allows logging in the cudnn frontend.
## Logging
cuDNN Frontend API logging records the execution flow through the cuDNN frontend API. This functionality is disabled by default and can be enabled through the methods described in this section.
### Method 1: Using Environment Variables:
| Environment variables | CUDNN_FRONTEND_LOG_INFO=0 | CUDNN_FRONTEND_LOG_INFO=1 |
| --------------------------------------------------| ------------------------- | ----------- |
| CUDNN_FRONTEND_LOG_FILE not set | No Logging | No Logging |
| CUDNN_FRONTEND_LOG_FILE set to stdout or stderr | No Logging | Logging to cout or cerr |
| CUDNN_FRONTEND_LOG_FILE set to filename.txt | No Logging | Logging to the filename |
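
The table can be read as a small decision function. The sketch below restates it in C++ for illustration only; `LogTarget`, `resolve_log_target` and `check_table_rows` are hypothetical names, not part of cudnn_frontend. Treating an unset `CUDNN_FRONTEND_LOG_INFO`, or any value other than `1`, as "No Logging" is an assumption based on logging being disabled by default.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical enumeration of the outcomes listed in the table.
enum class LogTarget { None, Stdout, Stderr, File };

// Maps raw environment values (nullptr means "not set") to a logging target.
LogTarget resolve_log_target(const char *log_info, const char *log_file) {
    // Assumption: only the value "1" enables logging.
    if (log_info == nullptr || std::string(log_info) != "1") return LogTarget::None;
    // Without CUDNN_FRONTEND_LOG_FILE there is nowhere to log to.
    if (log_file == nullptr) return LogTarget::None;

    const std::string file(log_file);
    if (file == "stdout") return LogTarget::Stdout;
    if (file == "stderr") return LogTarget::Stderr;
    return LogTarget::File;  // any other value is treated as a file name
}

// Reads the two variables from the process environment.
LogTarget log_target_from_environment() {
    return resolve_log_target(std::getenv("CUDNN_FRONTEND_LOG_INFO"),
                              std::getenv("CUDNN_FRONTEND_LOG_FILE"));
}

// Walks through the rows of the table as a usage example.
void check_table_rows() {
    assert(resolve_log_target("1", nullptr) == LogTarget::None);
    assert(resolve_log_target("0", "stdout") == LogTarget::None);
    assert(resolve_log_target("1", "stdout") == LogTarget::Stdout);
    assert(resolve_log_target("1", "stderr") == LogTarget::Stderr);
    assert(resolve_log_target("1", "filename.txt") == LogTarget::File);
}
```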
### Method 2: Using API calls:
Calling `cudnn_frontend::isLoggingEnabled() = true|false` has the same effect as setting the environment variable.
Calling `cudnn_frontend::getStream() = stream_name` can be used to assign the output stream directly.
- [New API]: cudnnReorderFilterAndBiasInt8x32: Reorders the filter and bias tensors so that the tensor cores can be used during Int8x32 convolutions.
- [New Feature]: Add support for isByValue attribute setting in tensor.
- [Samples]: Clean up Makefile and move to cmake based setup. Allows samples to be compiled on Windows machines.
- [Samples]: Updated samples to query the heuristics for fusion cases.
- [Samples]: Added a new ConvScaleBiasAct_int8 sample to address https://github.com/NVIDIA/cudnn-frontend/issues/8
- [Samples]: Added a sample to demonstrate how execution Plan caching works.
- [Samples]: Added a new sample to show how Multi-Headed Attention can be implemented with runtime fusion. Will work with 8.3.1.
- [Cleanup]: ExecutionPlan cache has a copy constructor and pre-emptively caches the workspace, numerical and behavior notes.
- [Cleanup]: Update cudnn_frontend_PointWiseDesc.h to include limits to fix gcc 11 compilation error (https://github.com/NVIDIA/cudnn-frontend/pull/10).
- [Cleanup]: Verify out of bounds iterator in Errata.h (https://github.com/NVIDIA/cudnn-frontend/pull/11)
- [Cleanup]: Added default move assign and move constructor to all classes.
- [Cleanup]: CheckcudaError and checkCudnnError now assert correctly instead of silently failing.
- [Cleanup]: Updated errata filter to no longer block engine ID 0 when running Int8x32.
- [Cleanup]: Default value of knobs in engine config is no longer 0.