i#1738 dr$sim perf: Improve tag lookup speed with a hashtable
Replaces drcachesim's loops over all ways with a hashtable lookup.
For larger cache hierarchies and caches with higher associativity this
increases performance by 15% in cpu-bound tests on offline traces,
provided we use a large initial table size: with a small initial size,
the cost of resizes seems to outweigh the gains.

The hashtable unfortunately results in a 15% slowdown on simple cache
hierarchies, due to the extra time in erase() and other maintenance
operations outweighing the smaller gains in lookup.  Thus, we make the
default to *not* use a hashtable and use the original linear walk,
providing a method to optionally enable the hashtable.  The cache
simulator enables the hashtables for any 3+-level cache hierarchy with
either coherence or many cores.

Adds coherence to some existing 3-level-hierarchy tests to ensure we
have tests that cover the hashtable path.

The TLB simulator will need to adapt these hashtables, since it looks
up by a tag,pid pair.  However, it already appears to do the wrong thing
in invalidate() and other inherited caching_device_t methods; filed as
#4816.

Issue: #1738, #4816
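
A minimal sketch of the two lookup strategies described above, to make the trade-off concrete. All names here (`toy_cache_t`, `block_t`, `install`, `find`, `kTagInvalid`) are hypothetical stand-ins and not the simulator's real classes; the sketch assumes the number of sets is a power of two and that the table is enabled before any tag is installed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the simulator's types; not the real classes.
using tag_t = std::uint64_t;
constexpr tag_t kTagInvalid = ~tag_t{ 0 };

struct block_t {
    tag_t tag = kTagInvalid;
};

struct toy_cache_t {
    int associativity;
    int num_sets;                // Assumed to be a power of two.
    std::vector<block_t> blocks; // num_sets * associativity entries, never resized.
    std::unordered_map<tag_t, std::pair<block_t *, int>> tag2block;
    bool use_table = false;      // Must be set before the first install().

    toy_cache_t(int assoc, int sets)
        : associativity(assoc)
        , num_sets(sets)
        , blocks(static_cast<std::size_t>(assoc) * static_cast<std::size_t>(sets))
    {
    }

    // Index of way 0 of the set that this tag maps to.
    int
    set_start(tag_t tag) const
    {
        return static_cast<int>(tag & static_cast<tag_t>(num_sets - 1)) * associativity;
    }

    // Installs a tag in the given way, keeping the table in sync with the array.
    void
    install(tag_t tag, int way)
    {
        block_t &block = blocks[set_start(tag) + way];
        if (use_table) {
            if (block.tag != kTagInvalid)
                tag2block.erase(block.tag);
            tag2block[tag] = std::make_pair(&block, way);
        }
        block.tag = tag;
    }

    // Returns {block, way} on a hit, or {nullptr, 0} on a miss.
    std::pair<block_t *, int>
    find(tag_t tag)
    {
        if (use_table) {
            auto it = tag2block.find(tag);
            if (it == tag2block.end())
                return { nullptr, 0 };
            return it->second;
        }
        // Original strategy: walk every way of the set.
        int start = set_start(tag);
        for (int way = 0; way < associativity; ++way) {
            if (blocks[start + way].tag == tag)
                return { &blocks[start + way], way };
        }
        return { nullptr, 0 };
    }
};
```

The table path replaces a walk over `associativity` blocks with one hash probe, but every tag change now also pays for an erase and an insert, which is consistent with the slowdown reported for simple hierarchies.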
derekbruening committed Apr 7, 2021
1 parent 61a0ed3 commit fbe28bc
Showing 9 changed files with 135 additions and 63 deletions.
14 changes: 5 additions & 9 deletions clients/drcachesim/simulator/cache.cpp
@@ -1,5 +1,5 @@
/* **********************************************************
* Copyright (c) 2015-2020 Google, Inc. All rights reserved.
* Copyright (c) 2015-2021 Google, Inc. All rights reserved.
* **********************************************************/

/*
@@ -70,14 +70,10 @@ cache_t::flush(const memref_t &memref)
compute_tag(memref.flush.addr + memref.flush.size - 1 /*no overflow*/);
last_tag_ = TAG_INVALID;
for (; tag <= final_tag; ++tag) {
int block_idx = compute_block_idx(tag);
for (int way = 0; way < associativity_; ++way) {
if (get_caching_device_block(block_idx, way).tag_ == tag) {
get_caching_device_block(block_idx, way).tag_ = TAG_INVALID;
// Xref cache_block_t constructor about why we set counter to 0.
get_caching_device_block(block_idx, way).counter_ = 0;
}
}
auto block_way = find_caching_device_block(tag);
if (block_way.first == nullptr)
continue;
invalidate_caching_device_block(block_way.first);
}
// We flush parent_'s code cache here.
// XXX: should L1 data cache be flushed when L1 instr cache is flushed?
8 changes: 7 additions & 1 deletion clients/drcachesim/simulator/cache_simulator.cpp
@@ -1,5 +1,5 @@
/* **********************************************************
* Copyright (c) 2015-2020 Google, Inc. All rights reserved.
* Copyright (c) 2015-2021 Google, Inc. All rights reserved.
* **********************************************************/

/*
@@ -363,6 +363,12 @@ cache_simulator_t::cache_simulator_t(std::istream *config_file)
success_ = false;
return;
}
// TODO add comment and name "32"
if (other_caches_.size() > 0 && (knobs_.model_coherence || knobs_.num_cores > 32)) {
for (auto &cache : all_caches_) {
cache.second->set_hashtable_use(true);
}
}
}

cache_simulator_t::~cache_simulator_t()
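
The enabling condition in the hunk above can be sketched as a named predicate, which is one way the TODO about naming "32" might be addressed. `toy_knobs_t`, `kManyCoresThreshold`, and `should_use_hashtable` are hypothetical names; the sketch also assumes that `other_caches_` counts the caches that are neither L1 nor last-level, which is how the commit message's "3+-level" wording would map onto the `size() > 0` check.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified view of the knobs that the heuristic consults.
struct toy_knobs_t {
    bool model_coherence = false;
    unsigned int num_cores = 1;
};

// Assumed name for the literal 32 in the diff. What counts as "many cores"
// is a tuning choice here, not a value derived from measurements.
constexpr unsigned int kManyCoresThreshold = 32;

// num_middle_caches: assumed count of caches that are neither L1 nor
// last-level, so a nonzero value means a hierarchy of three or more levels.
inline bool
should_use_hashtable(std::size_t num_middle_caches, const toy_knobs_t &knobs)
{
    return num_middle_caches > 0 &&
        (knobs.model_coherence || knobs.num_cores > kManyCoresThreshold);
}
```

Note that the comparison is strict, so a 32-core configuration without coherence keeps the linear walk.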
95 changes: 49 additions & 46 deletions clients/drcachesim/simulator/caching_device.cpp
@@ -1,5 +1,5 @@
/* **********************************************************
* Copyright (c) 2015-2020 Google, Inc. All rights reserved.
* Copyright (c) 2015-2021 Google, Inc. All rights reserved.
* **********************************************************/

/*
@@ -42,8 +42,11 @@ caching_device_t::caching_device_t()
: blocks_(NULL)
, stats_(NULL)
, prefetcher_(NULL)
// The tag being hashed is already right-shifted to the cache line and
// an identity hash is plenty good enough and nice and fast.
// We set the size and load factor only if being used, in set_hashtable_use().
, tag2block(0, [](addr_t key) { return static_cast<unsigned long>(key); })
{
/* Empty. */
}

caching_device_t::~caching_device_t()
@@ -99,6 +102,25 @@ caching_device_t::init(int associativity, int block_size, int num_blocks,
return true;
}

std::pair<caching_device_block_t *, int>
caching_device_t::find_caching_device_block(addr_t tag)
{
if (use_tag2block_table_) {
auto it = tag2block.find(tag);
if (it == tag2block.end())
return std::make_pair(nullptr, 0);
assert(it->second.first->tag_ == tag);
return it->second;
}
int block_idx = compute_block_idx(tag);
for (int way = 0; way < associativity_; ++way) {
caching_device_block_t &block = get_caching_device_block(block_idx, way);
if (block.tag_ == tag)
return std::make_pair(&block, way);
}
return std::make_pair(nullptr, 0);
}

void
caching_device_t::request(const memref_t &memref_in)
{
@@ -127,25 +149,18 @@ caching_device_t::request(const memref_t &memref_in)

memref = memref_in;
for (; tag <= final_tag; ++tag) {
int way;
int way = associativity_;
int block_idx = compute_block_idx(tag);
bool missed = false;

if (tag + 1 <= final_tag)
memref.data.size = ((tag + 1) << block_size_bits_) - memref.data.addr;

for (way = 0; way < associativity_; ++way) {
caching_device_block_t *cache_block =
&get_caching_device_block(block_idx, way);
if (cache_block->tag_ == tag) {
break;
}
}

if (way != associativity_) {
auto block_way = find_caching_device_block(tag);
if (block_way.first != nullptr) {
// Access is a hit.
caching_device_block_t *cache_block =
&get_caching_device_block(block_idx, way);
caching_device_block_t *cache_block = block_way.first;
way = block_way.second;
stats_->access(memref, true /*hit*/, cache_block);
if (parent_ != NULL)
parent_->stats_->child_access(memref, true, cache_block);
@@ -215,7 +230,7 @@ caching_device_t::request(const memref_t &memref_in)
}
}
}
cache_block->tag_ = tag;
update_tag(cache_block, way, tag);
}

access_update(block_idx, way);
@@ -271,26 +286,20 @@ caching_device_t::replace_which_way(int block_idx)
void
caching_device_t::invalidate(addr_t tag, invalidation_type_t invalidation_type)
{
int block_idx = compute_block_idx(tag);

for (int way = 0; way < associativity_; ++way) {
auto &cache_block = get_caching_device_block(block_idx, way);
if (cache_block.tag_ == tag) {
cache_block.tag_ = TAG_INVALID;
cache_block.counter_ = 0;
stats_->invalidate(invalidation_type);
// Invalidate last_tag_ if it was this tag.
if (last_tag_ == tag) {
last_tag_ = TAG_INVALID;
}
// Invalidate the block in the children's caches.
if (invalidation_type == INVALIDATION_INCLUSIVE && inclusive_ &&
!children_.empty()) {
for (auto &child : children_) {
child->invalidate(tag, invalidation_type);
}
auto block_way = find_caching_device_block(tag);
if (block_way.first != nullptr) {
invalidate_caching_device_block(block_way.first);
stats_->invalidate(invalidation_type);
// Invalidate last_tag_ if it was this tag.
if (last_tag_ == tag) {
last_tag_ = TAG_INVALID;
}
// Invalidate the block in the children's caches.
if (invalidation_type == INVALIDATION_INCLUSIVE && inclusive_ &&
!children_.empty()) {
for (auto &child : children_) {
child->invalidate(tag, invalidation_type);
}
break;
}
}
// If this is a coherence invalidation, we must invalidate children caches.
@@ -305,12 +314,9 @@ caching_device_t::invalidate(addr_t tag, invalidation_type_t invalidation_type)
bool
caching_device_t::contains_tag(addr_t tag)
{
int block_idx = compute_block_idx(tag);
for (int way = 0; way < associativity_; way++) {
if (get_caching_device_block(block_idx, way).tag_ == tag) {
return true;
}
}
auto block_way = find_caching_device_block(tag);
if (block_way.first != nullptr)
return true;
if (children_.empty()) {
return false;
}
@@ -328,12 +334,9 @@ void
caching_device_t::propagate_eviction(addr_t tag, const caching_device_t *requester)
{
// Check our own cache for this line.
int block_idx = compute_block_idx(tag);
for (int way = 0; way < associativity_; way++) {
if (get_caching_device_block(block_idx, way).tag_ == tag) {
return;
}
}
auto block_way = find_caching_device_block(tag);
if (block_way.first != nullptr)
return;

// Check if other children contain this line.
if (children_.size() != 1) {
53 changes: 52 additions & 1 deletion clients/drcachesim/simulator/caching_device.h
@@ -1,5 +1,5 @@
/* **********************************************************
* Copyright (c) 2015-2020 Google, Inc. All rights reserved.
* Copyright (c) 2015-2021 Google, Inc. All rights reserved.
* **********************************************************/

/*
@@ -36,6 +36,9 @@
#ifndef _CACHING_DEVICE_H_
#define _CACHING_DEVICE_H_ 1

#include <iostream> //NOCHECK
#include <functional>
#include <unordered_map>
#include <vector>

#include "caching_device_block.h"
@@ -99,6 +102,17 @@ class caching_device_t {
{
return double(loaded_blocks_) / num_blocks_;
}
virtual inline void
set_hashtable_use(bool use_hashtable)
{
use_tag2block_table_ = use_hashtable;
// Resizing from an initial small table causes noticeable overhead, so we
// start with a relatively large table.
tag2block.reserve(1 << 16);
// Even with the large initial size, for large caches we want to keep the
// load factor small.
tag2block.max_load_factor(0.5);
}

protected:
virtual void
@@ -121,6 +135,33 @@ class caching_device_t {
{
return *(blocks_[block_idx + way]);
}

inline void
invalidate_caching_device_block(caching_device_block_t *block)
{
if (use_tag2block_table_)
tag2block.erase(block->tag_);
block->tag_ = TAG_INVALID;
// Xref cache_block_t constructor about why we set counter to 0.
block->counter_ = 0;
}

inline void
update_tag(caching_device_block_t *block, int way, addr_t new_tag)
{
if (use_tag2block_table_) {
if (block->tag_ != TAG_INVALID)
tag2block.erase(block->tag_);
tag2block[new_tag] = std::make_pair(block, way);
}
block->tag_ = new_tag;
}

// Returns the block (and its way) whose tag equals `tag`.
// Returns <nullptr,0> if there is no such block.
std::pair<caching_device_block_t *, int>
find_caching_device_block(addr_t tag);

// a pure virtual function for subclasses to initialize their own block array
virtual void
init_blocks() = 0;
@@ -161,6 +202,16 @@ class caching_device_t {
addr_t last_tag_;
int last_way_;
int last_block_idx_;
// Optimization: keep a hashtable for quick lookup of {block,way}
// given a tag, if using a large cache hierarchy where serial
// walks over the associativity end up as bottlenecks.
// We can't easily remove the blocks_ array and replace with just
// the hashtable as replace_which_way(), etc. want quick access to
// every way for a given line index.
std::unordered_map<addr_t, std::pair<caching_device_block_t *, int>,
std::function<unsigned long(addr_t)>>
tag2block;
bool use_tag2block_table_ = false;
};

#endif /* _CACHING_DEVICE_H_ */
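
The table setup in this header (identity hash, large initial reservation, low load factor) can be sketched in isolation as follows. This is not the commit's code: `identity_hash_t`, `tag_table_t`, and `presize` are hypothetical names, the hash is a small functor rather than the `std::function` used in the diff, and the load factor is set before `reserve()` rather than after. Since `reserve(n)` is specified in terms of the current maximum load factor, this order should size the bucket array for the lower load factor up front; whether either change affects performance is an untested assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

using tag_t = std::uint64_t; // Hypothetical stand-in for addr_t.

// Identity hash: the tag is already right-shifted to cache-line granularity,
// so consecutive lines produce consecutive hash values.
struct identity_hash_t {
    std::size_t
    operator()(tag_t key) const noexcept
    {
        return static_cast<std::size_t>(key);
    }
};

// The mapped type is simplified to an int (the way) for this sketch.
using tag_table_t = std::unordered_map<tag_t, int, identity_hash_t>;

// Lowers the load factor first so that reserve() allocates enough buckets
// to hold expected_tags entries without a later rehash.
inline void
presize(tag_table_t &table, std::size_t expected_tags)
{
    table.max_load_factor(0.5f);
    table.reserve(expected_tags);
}
```

With this order, inserting up to `expected_tags` entries should not trigger a rehash, which is the overhead the commit message identifies as outweighing the lookup gains.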
5 changes: 4 additions & 1 deletion clients/drcachesim/simulator/tlb.cpp
@@ -1,5 +1,5 @@
/* **********************************************************
* Copyright (c) 2015-2020 Google, Inc. All rights reserved.
* Copyright (c) 2015-2021 Google, Inc. All rights reserved.
* **********************************************************/

/*
@@ -49,6 +49,9 @@ tlb_t::request(const memref_t &memref_in)
// Since pid is needed in a lot of places from the beginning to the end,
// it might also not be a good way to write a lot of helper functions
// to isolate them.
// TODO i#4816: This tag,pid pair lookup needs to be imposed on the parent
// methods invalidate(), contains_tag(), and propagate_eviction() by overriding
// them.

// Unfortunately we need to make a copy for our loop so we can pass
// the right data struct to the parent and stats collectors.
6 changes: 5 additions & 1 deletion clients/drcachesim/simulator/tlb.h
@@ -1,5 +1,5 @@
/* **********************************************************
* Copyright (c) 2015-2020 Google, Inc. All rights reserved.
* Copyright (c) 2015-2021 Google, Inc. All rights reserved.
* **********************************************************/

/*
@@ -45,6 +45,10 @@ class tlb_t : public caching_device_t {
void
request(const memref_t &memref) override;

// TODO i#4816: This tag,pid pair lookup needs to be imposed on the parent
// methods invalidate(), contains_tag(), and propagate_eviction() by overriding
// them.

protected:
void
init_blocks() override;
3 changes: 3 additions & 0 deletions clients/drcachesim/tests/cores-1-levels-3-no-missfile.conf
@@ -5,6 +5,9 @@
// Common params.
num_cores 1
line_size 64
// Turn on coherence as another test of that option, as well as
// to trigger hashtable optimizations in caches and test those.
coherence true

L1I { // L1 I$
type instruction
7 changes: 5 additions & 2 deletions clients/drcachesim/tests/simple-config-file.templatex
@@ -5,12 +5,14 @@ Core #0 \(1 thread\(s\)\)
L1I stats:
Hits: *[0-9,\.]*
Misses: *[0-9,\.]*
Invalidations: *[0-9,\.]*
Parent invalidations: *[0-9,\.]*
Write invalidations: *[0-9,\.]*
.* Miss rate: [0-9][,\.]..%
L1D stats:
Hits: *[0-9,\.]*
Misses: *[0-9,\.]*
Invalidations: *[0-9,\.]*
Parent invalidations: *[0-9,\.]*
Write invalidations: *[0-9,\.]*
.* Miss rate: [0-9][,\.]..%
L2 stats:
Hits: *[0-9,\.]*
@@ -26,3 +28,4 @@ LLC stats:
.* Local miss rate: *[0-9,\.]*%
Child hits: *[0-9,\.]*
Total miss rate: *[0-9,\.]*%
Coherence stats:.*
7 changes: 5 additions & 2 deletions clients/drcachesim/tests/threads-with-config-file.templatex
@@ -34,12 +34,14 @@ Core #0 \(.*\)
L1I stats:
Hits: *[0-9,\.]*
Misses: *[0-9,\.]*
Invalidations: *[0-9,\.]*
Parent invalidations: *[0-9,\.]*
Write invalidations: *[0-9,\.]*
.* Miss rate: *[0-9,\.]*%
L1D stats:
Hits: *[0-9,\.]*
Misses: *[0-9,\.]*
Invalidations: *[0-9,\.]*
Parent invalidations: *[0-9,\.]*
Write invalidations: *[0-9,\.]*
.* Miss rate: *[0-9,\.]*%
L2 stats:
Hits: *[0-9,\.]*
@@ -55,3 +57,4 @@ LLC stats:
.* Local miss rate: *[0-9,.]*%
Child hits: *[0-9,\.]*
Total miss rate: *[0-9,\.]*%
Coherence stats:.*
