cutlass update #242

Merged: 11 commits, Jun 25, 2024
util/job_launching/apps/define-all-apps.yml (92 changes: 50 additions, 42 deletions)
@@ -511,51 +511,59 @@ cutlass_5_trace:
exec_dir: "$GPUAPPS_ROOT/bin/$CUDA_VERSION/release/"
data_dirs: "$GPUAPPS_ROOT/data_dirs/"
execs:
- cutlass_perf_test:
- args: --seed=2020 --dist=0 --m=2560 --n=16 --k=2560 --kernels=wmma_gemm_nn --iterations=5 --providers=cutlass
accel-sim-mem: 3G
- args: --seed=2020 --dist=0 --m=2560 --n=32 --k=2560 --kernels=wmma_gemm_nn --iterations=5 --providers=cutlass
accel-sim-mem: 3G
- args: --seed=2020 --dist=0 --m=2560 --n=64 --k=2560 --kernels=wmma_gemm_nn --iterations=5 --providers=cutlass
accel-sim-mem: 3G
- args: --seed=2020 --dist=0 --m=2560 --n=128 --k=2560 --kernels=wmma_gemm_nn --iterations=5 --providers=cutlass
accel-sim-mem: 3G
- args: --seed=2020 --dist=0 --m=2560 --n=7000 --k=2560 --kernels=wmma_gemm_nn --iterations=5 --providers=cutlass
accel-sim-mem: 3G
- args: --seed=2020 --dist=0 --m=4096 --n=16 --k=4096 --kernels=wmma_gemm_nn --iterations=5 --providers=cutlass
accel-sim-mem: 5G
- args: --seed=2020 --dist=0 --m=4096 --n=32 --k=4096 --kernels=wmma_gemm_nn --iterations=5 --providers=cutlass
accel-sim-mem: 5G
- args: --seed=2020 --dist=0 --m=4096 --n=64 --k=4096 --kernels=wmma_gemm_nn --iterations=5 --providers=cutlass
accel-sim-mem: 5G
- args: --seed=2020 --dist=0 --m=4096 --n=128 --k=4096 --kernels=wmma_gemm_nn --iterations=5 --providers=cutlass
accel-sim-mem: 5G
- args: --seed=2020 --dist=0 --m=4096 --n=7000 --k=4096 --kernels=wmma_gemm_nn --iterations=5 --providers=cutlass
accel-sim-mem: 5G
- args: --seed=2020 --dist=0 --m=2560 --n=16 --k=2560 --kernels=sgemm_nn --iterations=5 --providers=cutlass
- cutlass_profiler:
#single precision gemm kernels
- args: --seed=2020 --dist=0 --m=2560 --n=16 --k=2560 --kernels=sgemm --iterations=5 --providers=cutlass
accel-sim-mem: 13G
- args: --seed=2020 --dist=0 --m=2560 --n=32 --k=2560 --kernels=sgemm_nn --iterations=5 --providers=cutlass
- args: --seed=2020 --dist=0 --m=2560 --n=32 --k=2560 --kernels=sgemm --iterations=5 --providers=cutlass
accel-sim-mem: 13G
- args: --seed=2020 --dist=0 --m=2560 --n=64 --k=2560 --kernels=sgemm_nn --iterations=5 --providers=cutlass
# - args: --seed=2020 --dist=0 --m=2560 --n=64 --k=2560 --kernels=sgemm --iterations=5 --providers=cutlass
# accel-sim-mem: 13G
# - args: --seed=2020 --dist=0 --m=2560 --n=128 --k=2560 --kernels=sgemm --iterations=5 --providers=cutlass
# accel-sim-mem: 13G
# - args: --seed=2020 --dist=0 --m=2560 --n=512 --k=2560 --kernels=sgemm --iterations=5 --providers=cutlass
# accel-sim-mem: 13G
# - args: --seed=2020 --dist=0 --m=2560 --n=1024 --k=2560 --kernels=sgemm --iterations=5 --providers=cutlass
# accel-sim-mem: 13G
# - args: --seed=2020 --dist=0 --m=2560 --n=2560 --k=2560 --kernels=sgemm --iterations=5 --providers=cutlass
# accel-sim-mem: 13G
# - args: --seed=2020 --dist=0 --m=4096 --n=16 --k=4096 --kernels=sgemm --iterations=5 --providers=cutlass
# accel-sim-mem: 16G
# - args: --seed=2020 --dist=0 --m=4096 --n=32 --k=4096 --kernels=sgemm --iterations=5 --providers=cutlass
# accel-sim-mem: 16G
# - args: --seed=2020 --dist=0 --m=4096 --n=64 --k=4096 --kernels=sgemm --iterations=5 --providers=cutlass
# accel-sim-mem: 16G
# - args: --seed=2020 --dist=0 --m=4096 --n=128 --k=4096 --kernels=sgemm --iterations=5 --providers=cutlass
# accel-sim-mem: 16G
# - args: --seed=2020 --dist=0 --m=4096 --n=4096 --k=4096 --kernels=sgemm --iterations=5 --providers=cutlass
# accel-sim-mem: 20G
#gemm kernels on tensor cores
- args: --seed=2020 --dist=0 --operation=gemm --m=2560 --n=16 --k=2560 --op_class=tensorop --iterations=5 --provider=cutlass
accel-sim-mem: 13G
- args: --seed=2020 --dist=0 --m=2560 --n=128 --k=2560 --kernels=sgemm_nn --iterations=5 --providers=cutlass
accel-sim-mem: 13G
- args: --seed=2020 --dist=0 --m=2560 --n=512 --k=2560 --kernels=sgemm_nn --iterations=5 --providers=cutlass
accel-sim-mem: 13G
- args: --seed=2020 --dist=0 --m=2560 --n=1024 --k=2560 --kernels=sgemm_nn --iterations=5 --providers=cutlass
accel-sim-mem: 13G
- args: --seed=2020 --dist=0 --m=2560 --n=2560 --k=2560 --kernels=sgemm_nn --iterations=5 --providers=cutlass
accel-sim-mem: 13G
- args: --seed=2020 --dist=0 --m=4096 --n=16 --k=4096 --kernels=sgemm_nn --iterations=5 --providers=cutlass
accel-sim-mem: 16G
- args: --seed=2020 --dist=0 --m=4096 --n=32 --k=4096 --kernels=sgemm_nn --iterations=5 --providers=cutlass
accel-sim-mem: 16G
- args: --seed=2020 --dist=0 --m=4096 --n=64 --k=4096 --kernels=sgemm_nn --iterations=5 --providers=cutlass
accel-sim-mem: 16G
- args: --seed=2020 --dist=0 --m=4096 --n=128 --k=4096 --kernels=sgemm_nn --iterations=5 --providers=cutlass
accel-sim-mem: 16G
- args: --seed=2020 --dist=0 --m=4096 --n=4096 --k=4096 --kernels=sgemm_nn --iterations=5 --providers=cutlass
accel-sim-mem: 20G
# - args: --seed=2020 --dist=0 --operation=gemm --m=2560 --n=32 --k=2560 --op_class=tensorop --iterations=5 --provider=cutlass
# accel-sim-mem: 13G
# - args: --seed=2020 --dist=0 --operation=gemm --m=2560 --n=64 --k=2560 --op_class=tensorop --iterations=5 --provider=cutlass
# accel-sim-mem: 13G
# - args: --seed=2020 --dist=0 --operation=gemm --m=2560 --n=128 --k=2560 --op_class=tensorop --iterations=5 --provider=cutlass
# accel-sim-mem: 13G
# - args: --seed=2020 --dist=0 --operation=gemm --m=2560 --n=512 --k=2560 --op_class=tensorop --iterations=5 --provider=cutlass
# accel-sim-mem: 13G
# - args: --seed=2020 --dist=0 --operation=gemm --m=2560 --n=1024 --k=2560 --op_class=tensorop --iterations=5 --provider=cutlass
# accel-sim-mem: 13G
# - args: --seed=2020 --dist=0 --operation=gemm --m=2560 --n=2056 --k=2560 --op_class=tensorop --iterations=5 --provider=cutlass
# accel-sim-mem: 13G
# - args: --seed=2020 --dist=0 --operation=gemm --m=4096 --n=16 --k=4096 --op_class=tensorop --iterations=5 --provider=cutlass
# accel-sim-mem: 13G
# - args: --seed=2020 --dist=0 --operation=gemm --m=4096 --n=32 --k=4096 --op_class=tensorop --iterations=5 --provider=cutlass
# accel-sim-mem: 13G
# - args: --seed=2020 --dist=0 --operation=gemm --m=4096 --n=64 --k=4096 --op_class=tensorop --iterations=5 --provider=cutlass
# accel-sim-mem: 13G
# - args: --seed=2020 --dist=0 --operation=gemm --m=4096 --n=128 --k=4096 --op_class=tensorop --iterations=5 --provider=cutlass
# accel-sim-mem: 13G
# - args: --seed=2020 --dist=0 --operation=gemm --m=4096 --n=512 --k=4096 --op_class=tensorop --iterations=5 --provider=cutlass
# accel-sim-mem: 13G
# - args: --seed=2020 --dist=0 --operation=gemm --m=4096 --n=4096 --k=4096 --op_class=tensorop --iterations=5 --provider=cutlass
# accel-sim-mem: 13G

## Not sure how much memory the following apps take - just letting them go with the default
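For readers unfamiliar with the app-definition format, here is a minimal sketch, assuming each `execs` item is a one-key mapping from an executable name to a list of `args`/`accel-sim-mem` pairs, as the entries above suggest. The dict literal mirrors two of the `cutlass_profiler` entries verbatim; `expand_jobs` and `DEFAULT_MEM` are hypothetical names, not the job-launching scripts' actual code. Lines prefixed with `#` are YAML comments, so a YAML loader never sees them and they produce no jobs.

```python
# Hypothetical sketch: flatten an app definition shaped like cutlass_5_trace
# into (executable, args, memory) jobs. The real launcher's parsing code is
# not part of this PR and may differ.

DEFAULT_MEM = None  # assumption: a missing accel-sim-mem means "use the default"

cutlass_5_trace = {
    "exec_dir": "$GPUAPPS_ROOT/bin/$CUDA_VERSION/release/",
    "data_dirs": "$GPUAPPS_ROOT/data_dirs/",
    "execs": [
        {"cutlass_profiler": [
            {"args": "--seed=2020 --dist=0 --m=2560 --n=16 --k=2560 "
                     "--kernels=sgemm --iterations=5 --providers=cutlass",
             "accel-sim-mem": "13G"},
            {"args": "--seed=2020 --dist=0 --operation=gemm --m=2560 --n=16 "
                     "--k=2560 --op_class=tensorop --iterations=5 --provider=cutlass",
             "accel-sim-mem": "13G"},
        ]},
    ],
}


def expand_jobs(app):
    """Return a list of (exe, args, mem) tuples, one per args entry."""
    jobs = []
    for exe_entry in app["execs"]:
        # Each list item is a single-key mapping: {executable_name: [argpairs]}
        for exe, argpairs in exe_entry.items():
            for argpair in argpairs:
                mem = argpair.get("accel-sim-mem", DEFAULT_MEM)
                jobs.append((exe, argpair["args"], mem))
    return jobs


for exe, args, mem in expand_jobs(cutlass_5_trace):
    print(exe, mem, args)
```

Entries without `accel-sim-mem` fall back to `None` in this sketch, in the spirit of the file's own note about letting some apps run with the default memory.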

util/job_launching/apps/define-power.yml (8 changes: 4 additions, 4 deletions)
@@ -206,13 +206,13 @@ cutlass_5_trace_validation:
data_dirs: "$ACCELSIM_ROOT/../util/accelwattch/accelwattch_benchmarks/data_dirs/"
execs:
- cutlass_perf_test_k1:
- args: --seed=2020 --dist=0 --m=2560 --n=16 --k=2560 --kernels=wmma_gemm_nn --iterations=5 --providers=cutlass
- args: --seed=2020 --dist=0 --m=2560 --n=16 --k=2560 --operation=gemm --op_class=tensorop --iterations=5 --providers=cutlass
accel-sim-mem: 5G
- cutlass_perf_test_k2:
- args: --seed=2020 --dist=0 --m=4096 --n=128 --k=4096 --kernels=wmma_gemm_nn --iterations=5 --providers=cutlass
- args: --seed=2020 --dist=0 --m=4096 --n=128 --k=4096 --operation=gemm --op_class=tensorop --iterations=5 --providers=cutlass
accel-sim-mem: 5G
- cutlass_perf_test_k3:
- args: --seed=2020 --dist=0 --m=2560 --n=512 --k=2560 --kernels=wmma_gemm_nn --iterations=5 --providers=cutlass
- args: --seed=2020 --dist=0 --m=2560 --n=512 --k=2560 --operation=gemm --op_class=tensorop --iterations=5 --providers=cutlass
accel-sim-mem: 5G

Deepbench_validation:
@@ -454,4 +454,4 @@ power_ubench:
- SHRD_TEX_SFU:
- args: 100
- TENSOR:
- args: 10
- args: 10
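In define-power.yml, each of the three `cutlass_perf_test_k*` entries keeps its seed, distribution, problem size, iteration count and provider, and swaps `--kernels=wmma_gemm_nn` for `--operation=gemm --op_class=tensorop`. A minimal sketch of that token rewrite, using a hypothetical `to_profiler_args` helper; it only reproduces the string change and says nothing about whether the profiler ends up running the same kernel the old WMMA test did.

```python
# Hypothetical helper, not part of the PR: it reproduces the argument rewrite
# visible in the cutlass_perf_test_k* entries of define-power.yml.


def to_profiler_args(old_args: str) -> str:
    """Replace --kernels=wmma_gemm_nn with the GEMM / tensor-op selectors."""
    out = []
    for tok in old_args.split():
        if tok == "--kernels=wmma_gemm_nn":
            # The new entries put the operation and op class in the position
            # the old kernel filter occupied.
            out.extend(["--operation=gemm", "--op_class=tensorop"])
        else:
            out.append(tok)
    return " ".join(out)


OLD_K1 = ("--seed=2020 --dist=0 --m=2560 --n=16 --k=2560 "
          "--kernels=wmma_gemm_nn --iterations=5 --providers=cutlass")
print(to_profiler_args(OLD_K1))
```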
util/tracer_nvbit/run_hw_trace.py (2 changes: 1 addition, 1 deletion)
@@ -54,7 +54,7 @@
args = argpair["args"]
run_name = os.path.join( exe, common.get_argfoldername( args ) )
this_run_dir = os.path.abspath(os.path.expandvars(
os.path.join(this_directory, "..", "..", "hw_run","traces","device-" + options.device_num, cuda_version, run_name)))
os.path.join(scratch_dir, "hw_run","traces","device-" + options.device_num, cuda_version, run_name)))
this_trace_folder = os.path.join(this_run_dir, "traces")
if not os.path.exists(this_run_dir):
os.makedirs(this_run_dir)
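The run_hw_trace.py change moves the trace output root from `this_directory/../../hw_run` to `scratch_dir/hw_run`. A minimal sketch of both path computations, assuming `this_directory` is the script's own directory (so `../..` resolves to the repository root). Where `scratch_dir` is defined is not part of this hunk, and `arg_folder` is a hypothetical stand-in for `common.get_argfoldername(args)`, whose implementation isn't shown; the example paths and CUDA version are illustrative only.

```python
import os


def old_trace_run_dir(this_directory, device_num, cuda_version, exe, arg_folder):
    # Before: traces land two directories above the script's directory.
    run_name = os.path.join(exe, arg_folder)
    return os.path.abspath(os.path.expandvars(
        os.path.join(this_directory, "..", "..", "hw_run", "traces",
                     "device-" + device_num, cuda_version, run_name)))


def new_trace_run_dir(scratch_dir, device_num, cuda_version, exe, arg_folder):
    # After: traces land under scratch_dir. expandvars still runs on the
    # joined path, so scratch_dir may itself contain $VARIABLES.
    run_name = os.path.join(exe, arg_folder)
    return os.path.abspath(os.path.expandvars(
        os.path.join(scratch_dir, "hw_run", "traces",
                     "device-" + device_num, cuda_version, run_name)))


if __name__ == "__main__":
    print(old_trace_run_dir("/repo/util/tracer_nvbit", "0", "11.0", "cutlass_profiler", "args"))
    print(new_trace_run_dir("/scratch/me", "0", "11.0", "cutlass_profiler", "args"))
```

Because `abspath` normalizes the `..` components, the old layout always resolved inside the checkout; the new one lets the traces live wherever `scratch_dir` points.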