Optimize cuComputePartGradGammaBeta kernel for MI100 #10475

Merged 3 commits on Feb 9, 2022


hubertlu-tw
Contributor

Description:
Tuned "part_size" on MI100 for the LayerNorm implementation (specifically for the cuComputePartGradGammaBeta and cuComputeGradGammaBeta kernels).

Motivation and Context

@hubertlu-tw
Contributor Author

@weixingzhang Please review. Thanks.

@weixingzhang
Contributor

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux CPU x64 NoContribops CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, MacOS NoContribops CI Pipeline, Windows CPU CI Pipeline

@weixingzhang
Contributor

/azp run Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, centos7_cpu, centos7_cpu (linux_centos_ci Debug), centos7_cpu (linux_centos_ci Release), orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-amd-gpu-ci-pipeline, Linux Nuphar CI Pipeline, orttraining-distributed

@azure-pipelines

Azure Pipelines successfully started running 7 pipeline(s).


@weixingzhang
Contributor

/azp run orttraining-ortmodule, orttraining-ortmodule-distributed, onnxruntime-python-checks-ci-pipeline, onnxruntime-binary-size-checks-ci-pipeline, ONNX Runtime Web CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

Contributor

@jeffdaily jeffdaily left a comment


Though the HIP_PLATFORM symbol is indeed defined, the precedent throughout most of the sources is to use the USE_ROCM symbol. There are other places outside of this PR where the platform symbol was incorrectly used instead of USE_ROCM.

That said, if-not-else is harder to read than if-else. I would also suggest reordering your if/else statements to be:

#ifdef USE_ROCM
// Optimization for ROCm MI100
#else
// no comment needed, just the original code here
#endif
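
For illustration, here is a minimal sketch of what that ordering could look like around a "part_size" constant. The constant name kPartSize, the helper RowsPerPart, and both numeric values are assumptions made for this example; the actual values chosen in the PR are not quoted in this conversation.

```cpp
// Hypothetical tuning constant: the numbers below are placeholders,
// not the values from the PR.
#ifdef USE_ROCM
// Optimization for ROCm MI100
constexpr int kPartSize = 64;  // assumed value, for illustration only
#else
constexpr int kPartSize = 16;  // assumed value, standing in for the original code
#endif

// Hypothetical helper: number of rows each of the kPartSize partial
// reductions would cover when n1 rows are split into parts
// (ceiling division, so no rows are dropped).
constexpr int RowsPerPart(int n1) {
  return (n1 + kPartSize - 1) / kPartSize;
}
```

Putting the USE_ROCM branch first keeps the platform-specific change next to its comment, and the #else branch carries the original code unchanged.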

@weixingzhang
Contributor

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux CPU x64 NoContribops CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, MacOS NoContribops CI Pipeline, Windows CPU CI Pipeline

@weixingzhang
Contributor

/azp run Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, centos7_cpu, centos7_cpu (linux_centos_ci Debug), centos7_cpu (linux_centos_ci Release), orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-amd-gpu-ci-pipeline, Linux Nuphar CI Pipeline, orttraining-distributed

@weixingzhang
Contributor

/azp run orttraining-ortmodule, orttraining-ortmodule-distributed, onnxruntime-python-checks-ci-pipeline, onnxruntime-binary-size-checks-ci-pipeline, ONNX Runtime Web CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 7 pipeline(s).


@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@weixingzhang weixingzhang merged commit c9fbd0b into microsoft:master Feb 9, 2022
3 participants