Fixes for atomic coalescing at L1, correlated with QV100 hardware #33

abhaumick · 2022-03-26T03:39:49Z

- Fixed atomic coalescing at L1
  - Modified `warp_inst_t::memory_coalescing_arch_atomic()`
  - Behavior correlated with QV100 hardware
- Modified `trace.h`
  - Added `DPRINTF_RAW()` to allow prints without `gpu_sim_cycle`
  - Intended for prints from classes that do not have the `gpu_sim_cycle` or `gpu_tot_sim_cycle` variables used by `DPRINTF`
- Added config option `gpgpu_shmem_atomic_warp_parts`
  - Added to the option parser (default value: 2)
  - Updated the QV100 config
- Added trace streams
  - `ATOMICS`
  - `ATOMICS_DETAIL`
- Resolves the mismatch reported in Meeting Minutes -- 3/20/20
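The reason for `DPRINTF_RAW()` can be illustrated with a small sketch. This is a hypothetical illustration, not the macros in `trace.h`: `SKETCH_DPRINTF`, `SKETCH_DPRINTF_RAW`, the sink `g_trace_out`, the switch `g_atomics_enabled`, and both structs are invented for the example. It assumes, as the description above implies, that the cycle-stamped macro names `gpu_sim_cycle` and `gpu_tot_sim_cycle` unqualified where it is expanded.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical output sink so the sketch is checkable; real trace macros print to stdout.
static std::string g_trace_out;
// Hypothetical switch standing in for an enabled trace stream such as ATOMICS.
static bool g_atomics_enabled = true;

// Cycle-stamped print: the expansion names gpu_sim_cycle and gpu_tot_sim_cycle,
// so it only compiles where those names are in scope.
#define SKETCH_DPRINTF(fmt, ...)                                      \
  do {                                                                \
    if (g_atomics_enabled) {                                          \
      char buf_[256];                                                 \
      std::snprintf(buf_, sizeof(buf_), "%llu: " fmt,                 \
                    gpu_sim_cycle + gpu_tot_sim_cycle, __VA_ARGS__);  \
      g_trace_out += buf_;                                            \
    }                                                                 \
  } while (0)

// Raw print: same gating, no cycle prefix, so any class can use it.
#define SKETCH_DPRINTF_RAW(fmt, ...)                                  \
  do {                                                                \
    if (g_atomics_enabled) {                                          \
      char buf_[256];                                                 \
      std::snprintf(buf_, sizeof(buf_), fmt, __VA_ARGS__);            \
      g_trace_out += buf_;                                            \
    }                                                                 \
  } while (0)

// A class that owns the cycle counters can use the cycle-stamped form.
struct SketchSim {
  unsigned long long gpu_sim_cycle = 0;
  unsigned long long gpu_tot_sim_cycle = 0;
  void report(int warp_id) { SKETCH_DPRINTF("warp %d issued atomic\n", warp_id); }
};

// A class without them (e.g. an instruction object doing coalescing) cannot,
// so it falls back to the raw form.
struct SketchWarpInst {
  void report(int n_requests) { SKETCH_DPRINTF_RAW("%d atomic requests\n", n_requests); }
};
```

Calling `SketchSim::report` with the counters at 100 and 20 records `"120: warp 3 issued atomic\n"`, while `SketchWarpInst::report(2)` records `"2 atomic requests\n"` without needing any cycle variable.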

Benchmark 2 (Atomic bandwidth to the same address)
atomic_add_bw_conflict:

cycle error = 1554% (Sim/HW cycles = 16.5x), i.e., the simulator reports 16.5x as many cycles as the hardware.
HW L2 atomics = 10240, Sim L2 atomics = 163840
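Assuming the usual definition of cycle error (the note above does not state it), the two figures are consistent, and the L2 atomic counts differ by exactly a factor of 16:

$$
\text{error} = \left(\frac{C_\text{sim}}{C_\text{HW}} - 1\right) \times 100\% = 1554\% \;\Rightarrow\; \frac{C_\text{sim}}{C_\text{HW}} = 16.54 \approx 16.5,
\qquad
\frac{\text{Sim L2 atomics}}{\text{HW L2 atomics}} = \frac{163840}{10240} = 16
$$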

In the microbenchmark, all threads issue atomics to the same memory region.
We launch 163840 threads, each executing the atomic once (so the total number of atomic instructions is 163840).

It seems that GPGPU-Sim serializes accesses to the same region, while the hardware coalesces them into groups of 16 threads.
We can modify the simulator to coalesce conflicting accesses in groups of 16 threads, which may alleviate the problem.
A relatively simple change to "memory_coalescing_arch_atomic" in abstract_hardware_model.cc should fix this.
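The counts above can be checked against a small model of the proposed rule. This is a sketch under stated assumptions, not the code in `memory_coalescing_arch_atomic()`: the function names are hypothetical, threads are grouped by exact address rather than by memory region, and every warp is assumed to be full (32 threads).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model: threads of one warp that target the same address are
// merged into requests of at most `group_size` threads each.
inline unsigned atomic_requests_for_warp(const std::vector<uint64_t>& thread_addrs,
                                         unsigned group_size) {
  std::map<uint64_t, unsigned> threads_per_addr;
  for (uint64_t a : thread_addrs) ++threads_per_addr[a];
  unsigned requests = 0;
  for (const auto& kv : threads_per_addr)
    requests += (kv.second + group_size - 1) / group_size;  // ceiling division
  return requests;
}

// Total L2 atomics when `num_threads` threads (assumed a multiple of 32) all
// hit one address, launched as full 32-thread warps.
inline unsigned long long total_atomics_same_address(unsigned long long num_threads,
                                                     unsigned group_size) {
  const unsigned warp_size = 32;
  std::vector<uint64_t> warp(warp_size, 0x1000);  // every lane hits the same address
  const unsigned long long warps = num_threads / warp_size;
  return warps * atomic_requests_for_warp(warp, group_size);
}
```

With `group_size = 1` (fully serialized) the model gives 163840 L2 atomics, the simulator's count; with `group_size = 16` it gives 5120 warps x 2 requests = 10240, the hardware count.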

cesar-avalos3

Correlation of the atomic microbenchmarks does not look significantly better than on the latest (as of this review) dev branch of GPGPU-Sim; atomic_add_bw_diverge is still off by a lot. The code makes sense, though.
Still waiting for feedback from the other reviewers and the original author.

Worse in SASS

mkhairy and others added 5 commits August 23, 2021 13:58

Merge pull request accel-sim#18 from JRPan/mydev

99b5997

Sub core & some minor bug fix

Added full coalescing to L1 atomics

184b9a7

- best case coalescing of atomic operations - full CAM based search - integrated with DPRINTF with ATOMICS Flag

Modified Atomics coalescing to match Volta V100

f508823

- replaced full CAM coalescing with common case coalescing - correlated with QV100 GPU

Merge branch 'dev' of https://github.com/accel-sim/gpgpu-sim_distribution into dev

2ffb314

Cleaned up debug messages, disabled tracing in config

e42de85

- added ATOMICS_DETAIL trace flag - made ATOMICS prints concise - disabled tracing and restored default trace flags in QV100 tested-cfgs

abhaumick requested review from mkhairy and tgrogers April 4, 2022 17:09

JRPan requested review from cesar-avalos3 and removed request for mkhairy May 15, 2023 17:54

cesar-avalos3 added 2 commits May 15, 2023 13:59

Merge branch 'dev' into dev

d287ce9

Merge branch 'dev' into dev

70f12d8

JRPan requested a review from mkhairy May 23, 2023 16:30

cesar-avalos3 previously approved these changes May 31, 2023
