
cudnn_frontend v0.5

@Anerudhan Anerudhan released this 12 Nov 06:30
· 43 commits to main since this release

Release Notes:

- [New API]: Execution Plan Caching
   ## Execution Plan Caching
      cuDNN heuristics provide a way to query a list of good engine configs. Building on this query, the cudnn_frontend_find_plan function runs all the engineConfig(s) on the user's system and returns a sorted list of plans.
      Running multiple plans through several iterations is time-consuming. The ExecutionPlanCache lets the user build a cache, keyed by operation graph, from which an execution plan can be queried. It is the responsibility of the user to maintain different caches for different types of operation_graphs (e.g. one cache for convolutionForward and separate caches for Dgrad or Wgrad).
  
      ### API:
       - add_plan_to_cache(const cudnn_frontend::OperationGraph &op_graph, const cudnn_frontend::ExecutionPlan &plan) : Creates a mapping between the operation graph and the executionPlan.
       - bool get_plan_from_cache(const cudnn_frontend::OperationGraph &op_graph, const cudnn_frontend::ExecutionPlan *&plan) : If a plan is found for the operation graph, sets the plan pointer to it and returns true.
       - cudnnFindPlanAndCache(cudnnHandle_t handle, cudnn_frontend::OperationGraph &opGraph, cudnn_frontend::VariantPack const &variantPack, cudnn_frontend::ExecutionPlanCache &cache, Predicate pred) -> cudnn_frontend::ExecutionPlan : Chains the output of cudnn_frontend_find_plan and caches the result for future use.
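
The caching flow described above can be sketched without cuDNN. This is only an illustration of the idea, not the library's implementation: `ToyGraph`, `ToyPlan`, `ToyPlanCache`, and `get_or_find_plan` are hypothetical stand-ins, the string `tag` used as the cache key is an assumption, and the `search` callback stands in for the time-consuming benchmarking step.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for an operation graph and an execution plan.
// They carry only the fields this sketch needs.
struct ToyGraph {
    std::string tag;  // assumed unique description of the graph, used as key
};

struct ToyPlan {
    std::string engine_name;
    double time_ms;
};

// One cache instance per kind of operation graph (forward, Dgrad, Wgrad, ...).
class ToyPlanCache {
public:
    // Creates (or overwrites) the mapping from graph to plan.
    void add_plan_to_cache(const ToyGraph &graph, const ToyPlan &plan) {
        plans_[graph.tag] = plan;
    }

    // On a hit, points `plan` at the cached entry and returns true.
    // On a miss, leaves `plan` untouched and returns false.
    bool get_plan_from_cache(const ToyGraph &graph, const ToyPlan *&plan) const {
        auto it = plans_.find(graph.tag);
        if (it == plans_.end()) return false;
        plan = &it->second;
        return true;
    }

private:
    std::unordered_map<std::string, ToyPlan> plans_;
};

using Search = std::function<ToyPlan(const ToyGraph &)>;

// Always runs the expensive search, then stores the result in the cache.
ToyPlan find_plan_and_cache(const ToyGraph &graph, ToyPlanCache &cache,
                            const Search &search) {
    assert(!graph.tag.empty() && "graph needs a non-empty key");
    ToyPlan best = search(graph);
    cache.add_plan_to_cache(graph, best);
    return best;
}

// Caller-side pattern: consult the cache first, search only on a miss.
ToyPlan get_or_find_plan(const ToyGraph &graph, ToyPlanCache &cache,
                         const Search &search) {
    const ToyPlan *cached = nullptr;
    if (cache.get_plan_from_cache(graph, cached)) return *cached;
    return find_plan_and_cache(graph, cache, search);
}
```

With this shape, calling `get_or_find_plan` twice for the same graph invokes the search callback only once; the second call is served from the cache.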

- [New Feature]:  Allows logging in the cudnn frontend.
   ## Logging
      Logging in the cuDNN frontend records the execution flow through the frontend API. It is disabled by default and can be enabled through either of the methods described in this section.

      ### Method 1: Using Environment Variables:
      | Environment variables                             | CUDNN_FRONTEND_LOG_INFO=0 | CUDNN_FRONTEND_LOG_INFO=1 |
      | --------------------------------------------------| ------------------------- | -----------               |
      | CUDNN_FRONTEND_LOG_FILE not set                   | No Logging                | No Logging                |
      | CUDNN_FRONTEND_LOG_FILE set to stdout or stderr   | No Logging                | Logging to cout or cerr   |
      | CUDNN_FRONTEND_LOG_FILE set to filename.txt       | No Logging                | Logging to filename.txt   |
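
The table can be read as a small decision function. The sketch below is not the frontend's code: `LogTarget`, `LogDecision`, and `resolve_log_target` are hypothetical names, and it assumes that an unset `CUDNN_FRONTEND_LOG_INFO` behaves like `0` and that only the exact value `1` enables logging, neither of which the table states.

```cpp
#include <cstdlib>
#include <string>

enum class LogTarget { None, Stdout, Stderr, File };

struct LogDecision {
    LogTarget target;
    std::string filename;  // only meaningful when target == LogTarget::File
};

// Pure function over the two variable values, so each table cell can be
// checked without modifying the process environment.
LogDecision resolve_log_target(const char *log_info, const char *log_file) {
    // Assumption: unset or any value other than "1" means logging is off.
    const bool enabled = log_info != nullptr && std::string(log_info) == "1";
    if (!enabled || log_file == nullptr) return {LogTarget::None, ""};

    const std::string file(log_file);
    if (file == "stdout") return {LogTarget::Stdout, ""};
    if (file == "stderr") return {LogTarget::Stderr, ""};
    return {LogTarget::File, file};
}

// Convenience wrapper that reads the two variables named in the table.
LogDecision resolve_log_target_from_env() {
    return resolve_log_target(std::getenv("CUDNN_FRONTEND_LOG_INFO"),
                              std::getenv("CUDNN_FRONTEND_LOG_FILE"));
}
```

Note that both variables are needed: setting `CUDNN_FRONTEND_LOG_INFO=1` without a `CUDNN_FRONTEND_LOG_FILE` still produces no output.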

      ### Method 2: Using API calls:
      Calling `cudnn_frontend::isLoggingEnabled() = true|false` has the same effect as setting the environment variable.
      Calling `cudnn_frontend::getStream() = stream_name` assigns the output stream directly.

- [New API]: cudnnReorderFilterAndBiasInt8x32: Reorders the filter and bias tensors so that the tensor cores can be used during Int8x32 convolutions.
- [New Feature]: Add support for isByValue attribute setting in tensor.

- [Samples]: Cleaned up the Makefile and moved to a CMake-based setup. This allows samples to be compiled on Windows machines.
- [Samples]: Updated samples to query the heuristics for fusion cases.
- [Samples]: Added a new ConvScaleBiasAct_int8 sample to address https://github.com/NVIDIA/cudnn-frontend/issues/8
- [Samples]: Added a sample to demonstrate how execution Plan caching works.
- [Samples]: Added a new sample to show how Multi-Headed Attention can be implemented with run time fusion. Will work with cuDNN 8.3.1.
- [Cleanup]: ExecutionPlan cache has a copy constructor and pre-emptively caches the workspace, numerical notes, and behavior notes.
- [Cleanup]: Update cudnn_frontend_PointWiseDesc.h to include limits to fix a gcc 11 compilation error (https://github.com/NVIDIA/cudnn-frontend/pull/10).
- [Cleanup]: Verify out of bounds iterator in Errata.h (https://github.com/NVIDIA/cudnn-frontend/pull/11)
- [Cleanup]: Added default move assign and move constructor to all classes.
- [Cleanup]: CheckcudaError and checkCudnnError now assert correctly instead of silently failing.
- [Cleanup]: Updated the errata filter to no longer block engine ID 0 when running Int8x32.
- [Cleanup]: Default value of knobs in engine config is not 0 anymore.