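cudnnMultiHeadAttention

This is a draft implementation of scaled dot-product attention, softmax(QK^T/sqrt(d_k))V, the operation each head of multi-head attention applies to its projected queries, keys, and values. The reference paper is "Attention Is All You Need" (https://arxiv.org/abs/1706.03762).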
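A small CPU reference can be used to check the formula (and, later, GPU output) on tiny inputs. The sketch below is a plain-Python, single-head version written for illustration. The helper names `matmul`, `transpose`, `softmax_rows`, and `scaled_dot_product_attention` are hypothetical and are not part of cuDNN or this repository. It assumes row-major lists of floats and deliberately leaves out the per-head input/output projections of multi-head attention, as well as masking and dropout.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive product of an (n x k) matrix and a (k x m) matrix."""
    assert len(a[0]) == len(b), "inner dimensions must match"
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(row) for row in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single head.

    Shapes (hypothetical convention for this sketch):
      q: (n_q x d_k), k: (n_k x d_k), v: (n_k x d_v) -> result: (n_q x d_v)
    """
    d_k = len(q[0])
    scores = matmul(q, transpose(k))          # (n_q x n_k) raw dot products
    scale = 1.0 / math.sqrt(d_k)              # the 1/sqrt(d_k) factor
    scaled = [[s * scale for s in row] for row in scores]
    weights = softmax_rows(scaled)            # each row sums to 1
    return matmul(weights, v)                 # weighted sum of value rows
```

When every score in a row is equal, the softmax weights are uniform, so attending over V = [[1, 2], [3, 4]] with two identical keys returns the mean row [[2.0, 3.0]]. A GPU result for the same inputs could be compared against this reference with an elementwise tolerance rather than exact equality.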