drcachesim: optimize cache simulator #1738

Open

zhaoqin opened this issue Jul 15, 2015 · 5 comments

@zhaoqin
Contributor

zhaoqin commented Jul 15, 2015

Currently, the cache simulator runs at ~500x native execution time. The overall slowdown includes profiling overhead and communication overhead, but the cache simulator's own overhead dominates.

One simple optimization is to parallelize the cache simulator by splitting memory into sub-regions and running a separate cache simulator for each sub-region.

@zhaoqin
Contributor Author

zhaoqin commented Jul 15, 2015

Xref original issue #1703

@peterpengwei

Does multithreading sound like a good solution to alleviate the issue? My initial thought is to assign each cache its own independent pthread. The LLC thread would hold a pthread mutex that all the I & D cache threads use to arbitrate access to it. If this sounds good, I will start implementing it to see if it helps.
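To make the idea concrete, here is a minimal sketch of the per-cache-thread design (hypothetical code with invented names, not drcachesim code):

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical sketch of the proposed design; not drcachesim code.
struct llc_t {
    std::mutex lock; // single mutex arbitrating all I & D cache threads
    bool access(std::uint64_t /*addr*/) { return false; /* LLC lookup elided */ }
};

// One reference handled on an I or D cache's dedicated pthread: every
// L1 miss must take the shared LLC mutex, so misses from all cache
// threads serialize on this single lock.
void handle_reference(llc_t &llc, std::uint64_t addr)
{
    bool l1_hit = false; // per-thread L1 lookup elided
    if (!l1_hit) {
        std::lock_guard<std::mutex> guard(llc.lock);
        llc.access(addr);
    }
}
```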

@zhaoqin
Contributor Author

zhaoqin commented Jul 29, 2015

No, you should not parallelize the cache simulator by assigning each cache to an independent pthread.
The communication overhead among the threads would be significant and would dominate the slowdown.

@zhaoqin
Contributor Author

zhaoqin commented Jul 29, 2015

The right way is to split the address space into sub-regions, with each sub-region simulated by one thread.
For example, you can use 4 threads to simulate memory references in the address ranges [4N, 4N+cacheline), [4N+cacheline, 4N+2×cacheline), [4N+2×cacheline, 4N+3×cacheline), and [4N+3×cacheline, 4N+4×cacheline). With that split there is no communication among the four threads, so it should achieve the maximum parallelization.
The potential downside is that the memory references might concentrate on one or two of the sub-regions.
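For concreteness, the mapping from a reference address to its owning thread could look like this minimal sketch (hypothetical code with invented names, assuming a 64-byte line):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch, not drcachesim code: route each reference to
// one of NUM_SHARDS independent simulator threads based on the
// cache-line index of its address.  Consecutive lines round-robin
// across the shards, so no two threads ever share a line's state.
constexpr std::size_t LINE_SIZE = 64; // assumed cache-line size
constexpr std::size_t NUM_SHARDS = 4; // one simulator thread per shard

std::size_t shard_for(std::uint64_t addr)
{
    return (addr / LINE_SIZE) % NUM_SHARDS;
}
```

Each shard then runs a complete, private cache model over only the lines it owns, which is why no cross-thread synchronization is needed; the cost is the load imbalance mentioned above when hot lines cluster in one shard.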

@derekbruening
Contributor

derekbruening commented Mar 29, 2021

With larger cache hierarchies and higher associativity (such as simulating a full 2-socket Skylake system) I'm seeing significant time spent walking the ways looking for tags, particularly in invalidate() (this is with coherence turned on as well). I found that inserting a hashtable (if it's initialized to a large enough starting size) results in a 15% speedup for my setup. I'll post the PR.

derekbruening added a commit that referenced this issue Apr 7, 2021
Replaces drcachesim's loops over all ways with a hashtable lookup.
For larger cache hierarchies and caches with higher associativity this
increases performance by 15% in cpu-bound tests on offline traces,
when we use a large initial table size to avoid resizes which seem to
outweigh the gains.

The hashtable unfortunately results in a 15% slowdown on simple cache
hierarchies, due to the extra time in erase() and other maintenance
operations outweighing the smaller gains in lookup.  Thus, we make the
default to *not* use a hashtable and use the original linear walk,
providing a method to optionally enable the hashtable.  The cache
simulator enables the hashtables for any 3+-level cache hierarchy with
either coherence or many cores.

Adds coherence to some existing 3-level-hierarchy tests to ensure we
have tests that cover the hashtable path.

The TLB simulator will need to tweak these hashtables, but it looks
like it is already doing the wrong thing in invalidate() and other
simulator_t methods, filed as #4816.

Issue: #1738, #4816
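To illustrate the tradeoff described above (a hedged sketch with invented names, not the actual drcachesim classes): the hashtable turns the O(associativity) walk over the ways into an O(1) tag lookup, but must be maintained on every replacement:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of one cache set; names are invented.
struct cache_set_t {
    std::vector<std::uint64_t> way_tags;               // one tag per way
    std::unordered_map<std::uint64_t, int> tag_to_way; // optional index
    bool use_hashtable = false;

    // Returns the way holding 'tag', or -1 on a miss.
    int find_way(std::uint64_t tag) const
    {
        if (use_hashtable) {
            auto it = tag_to_way.find(tag);
            return it == tag_to_way.end() ? -1 : it->second;
        }
        // Original linear walk over all ways: cheap for small
        // associativity, costly for large, highly-associative caches.
        for (int w = 0; w < static_cast<int>(way_tags.size()); ++w) {
            if (way_tags[w] == tag)
                return w;
        }
        return -1;
    }

    // On replacement the map must be kept in sync (erase + insert);
    // this maintenance is what makes the hashtable a net loss on
    // simple hierarchies.
    void replace(int way, std::uint64_t new_tag)
    {
        if (use_hashtable) {
            tag_to_way.erase(way_tags[way]);
            tag_to_way[new_tag] = way;
        }
        way_tags[way] = new_tag;
    }
};
```

Reserving a large bucket count up front (std::unordered_map::reserve) corresponds to the "large initial table size" mentioned above; without it, rehashing can outweigh the lookup gains.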
derekbruening added a commit that referenced this issue Apr 8, 2021