Commit
* - [New Feature] Serialization:

## Execution Plan Serialization and Deserialization (Experimental)

cuDNN v8.4 and above provides execution plan serialization and deserialization, which saves the execution plan as a string in JSON format. The execution plan can then be restored from that string at a later point, which saves compilation time compared to rebuilding the plan from scratch. Currently, this is an experimental feature that only supports the runtime fusion engine. No forward/backward or cross-device compatibility guarantee is offered at this time.

### API:
- std::string cudnn_frontend::ExecutionPlan_v8::getJsonRepresentation() : Serialize the execution plan into a string in JSON format.
- cudnn_frontend::ExecutionPlan_v8&& cudnn_frontend::ExecutionPlanBuilder_v8::loadFromJson(const std::string &json_plan) : Deserialize from a string containing the JSON representation of the execution plan.

- [New API] Added a new API
```
get_heuristics_list(std::array<std::string, SIZE> modes,
                    OperationGraph_v8 &opGraph,
                    std::function<bool(cudnnBackendDescriptor_t)> filter_fn,
                    EngineConfigList &filtered_configs,
                    bool evaluate_all = false)
```
This function takes a list of heuristics modes ("heuristics_instant", "heuristic_fallback", "heuristic_mode_b") and computes a list of engine configs that do not satisfy the blocking condition in filter_fn. The function can optionally be set to keep going even if one of the modes fails.
- [New Features] Added support for BN Finalize, i.e. generation of the mean and variance used to perform batch normalization.
- [New Features] Added support for the BN Stats fusion pattern. This pattern covers Scale, Bias, Relu, Conv and generation of SUM and SQSUM for batch normalization.
- [New Features] Added support for the CUDNN_POINTWISE_GEN_INDEX and CUDNN_POINTWISE_BINARY_SELECT pointwise operations added in cuDNN 8.4.0.
- [Cleanup] Fixed a bug where using CUDNN_HEUR_MODE_B from multiple threads could lead to a crash under certain conditions.
- [Cleanup] Added CUDNN_PATH in CMakeLists.txt, allowing the user to build with a different cuDNN installation path.
- [Cleanup] Made the Engine_v8 constructor default, which prevents overwriting of the status during knob creation.
- [Cleanup] Take UIDs of the variant pack as a const pointer.
- [Cleanup] Fixed a crash that occurred when logging was enabled and no plan returned by heuristics was finalizable.
- [Samples] Added a new sample to showcase the CUDNN_POINTWISE_GEN_INDEX and CUDNN_POINTWISE_BINARY_SELECT pointwise operations.
- [Samples] Modified the MHA sample to show improved numerical stability. Investigation is still ongoing to further improve the MHA sample.
- [Samples] Added samples for the fused operation graph for BN Stats generation and stats finalization.

* Added missing return statements for operation.

Co-authored-by: Anerudhan Gopal <[email protected]>
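The filtering and fallback behaviour described for get_heuristics_list can be illustrated without a cuDNN installation. The sketch below is not the library's implementation: `FakeConfig`, `ModeQuery` and `collect_configs` are hypothetical stand-ins, and the stop-on-first-failure default is an assumption read off the description above. A config is kept only when `filter_fn` returns false, because true means "blocked".

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>  // lets callers throw std::runtime_error from a query
#include <string>
#include <vector>

// Hypothetical stand-in for an engine config; the real API passes
// cudnnBackendDescriptor_t to the filter instead.
struct FakeConfig {
    std::string name;
    bool uses_tensor_cores;
};

// Hypothetical stand-in for one heuristics mode: returns candidate configs,
// or throws if the mode is unavailable.
using ModeQuery = std::function<std::vector<FakeConfig>()>;

// Queries each mode in order and appends every config that filter_fn does
// NOT block. Returns one success flag per mode that was attempted.
// Assumption: with evaluate_all == false the loop stops at the first
// failing mode; with evaluate_all == true it continues past failures.
std::vector<bool> collect_configs(
    const std::vector<ModeQuery>& modes,
    const std::function<bool(const FakeConfig&)>& filter_fn,
    std::vector<FakeConfig>& filtered,
    bool evaluate_all = false) {
    assert(filter_fn);  // an empty filter would throw on first call
    std::vector<bool> succeeded;
    for (const auto& query : modes) {
        try {
            for (const auto& cfg : query()) {
                if (!filter_fn(cfg)) {
                    filtered.push_back(cfg);
                }
            }
            succeeded.push_back(true);
        } catch (const std::exception&) {
            succeeded.push_back(false);
            if (!evaluate_all) {
                break;
            }
        }
    }
    return succeeded;
}
```

Returning one flag per attempted mode lets the caller tell "no config passed the filter" apart from "the mode itself failed".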
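The BN Stats pattern produces per-channel SUM and SQSUM, and BN Finalize turns them into the mean and variance. A host-side sketch of that arithmetic, assuming the standard identity Var[x] = E[x^2] - (E[x])^2, is given below. `ChannelStats` and `finalize_stats` are hypothetical names rather than cudnn_frontend APIs, and anything else the fused kernels may do (epsilon handling, running statistics) is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for per-channel results; not a cudnn_frontend type.
struct ChannelStats {
    std::vector<double> mean;
    std::vector<double> variance;
};

// Converts per-channel SUM and SQSUM accumulators into the mean and the
// biased variance. `count` is the number of elements reduced per channel
// (for example N*H*W for an NCHW activation tensor).
ChannelStats finalize_stats(const std::vector<double>& sum,
                            const std::vector<double>& sqsum,
                            std::size_t count) {
    assert(count > 0);
    assert(sum.size() == sqsum.size());
    ChannelStats out;
    out.mean.reserve(sum.size());
    out.variance.reserve(sum.size());
    const double n = static_cast<double>(count);
    for (std::size_t c = 0; c < sum.size(); ++c) {
        const double mean = sum[c] / n;
        // Var[x] = E[x^2] - (E[x])^2
        const double variance = sqsum[c] / n - mean * mean;
        out.mean.push_back(mean);
        out.variance.push_back(variance);
    }
    return out;
}
```

For a single channel holding the values 1, 2, 3, 4, the accumulators are SUM = 10 and SQSUM = 30, which gives a mean of 2.5 and a variance of 1.25.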