v1.4.0 release

@Anerudhan Anerudhan released this 07 May 16:54
· 12 commits to main since this release
b740542

[New] Added a benchmark folder containing a sample Dockerfile for comparing the cuDNN implementation of sdpa with the PyTorch implementation.

[Enhancement] Once an engine is de-selected by name, it is no longer built as part of check support.

[Enhancement] The cuDNN backend search order for wheels is as follows: (a) dlopen libcudnn.so.MAJOR_VERSION in site-packages; (b) if that fails, dlopen the unversioned libcudnn.so. This gives the PyPI cuDNN package nvidia-cudnn-cu* priority over the default search path.
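
The search order above can be illustrated with a minimal Python sketch. This is not the library's actual loader: the helper names cudnn_candidates and load_first are hypothetical, and the nvidia/cudnn/lib directory under site-packages is an assumption about how the nvidia-cudnn-cu* wheel lays out its files.

```python
import ctypes
import os
import site


def cudnn_candidates(major_version, site_dirs=None):
    """Return candidate library paths in the order described above.

    Hypothetical helper; the nvidia/cudnn/lib layout is an assumption.
    """
    if site_dirs is None:
        site_dirs = site.getsitepackages()
    versioned = f"libcudnn.so.{major_version}"
    # (a) versioned library shipped inside site-packages
    candidates = [
        os.path.join(d, "nvidia", "cudnn", "lib", versioned)
        for d in site_dirs
    ]
    # (b) unversioned name, resolved through the default search path
    candidates.append("libcudnn.so")
    return candidates


def load_first(candidates, loader=ctypes.CDLL):
    """Try each candidate in order and return (path, handle) for the first
    one that loads, or (None, None) if none can be opened."""
    for candidate in candidates:
        try:
            return candidate, loader(candidate)
        except OSError:
            continue
    return None, None
```

Calling load_first(cudnn_candidates(9)) would try the wheel-provided library first and use the system-wide libcudnn.so only if that attempt fails. The major version 9 here is only an example value.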

[Enhancement] The sdpa fprop operation now allows an embedding dimension of up to 256 (previously limited to 128).

[Bug fix] Updated the scale and bias shapes in the batch norm sample.