
LRU cache utility #1090

Open · wants to merge 1 commit into master from lru_cache
Conversation

@chfast chfast (Member) commented Dec 23, 2024

Adds generic single-threaded implementation of
the least recently used (LRU) cache.

This is to be used to cache the code and code analysis.


codecov bot commented Dec 23, 2024

Codecov Report

Attention: Patch coverage is 72.96296% with 73 lines in your changes missing coverage. Please review.

Project coverage is 93.98%. Comparing base (81c9dda) to head (2938377).

Files with missing lines Patch % Lines
test/internal_benchmarks/lru_cache_bench.cpp 0.00% 70 Missing ⚠️
lib/evmone/lru_cache.hpp 93.93% 2 Missing ⚠️
test/unittests/lru_cache_test.cpp 99.40% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1090      +/-   ##
==========================================
- Coverage   94.31%   93.98%   -0.33%     
==========================================
  Files         159      162       +3     
  Lines       17221    17491     +270     
==========================================
+ Hits        16242    16439     +197     
- Misses        979     1052      +73     
Flag Coverage Δ
eof_execution_spec_tests 23.25% <0.00%> (-0.37%) ⬇️
ethereum_tests 26.22% <0.00%> (-0.41%) ⬇️
ethereum_tests_silkpre 18.62% <0.00%> (-0.33%) ⬇️
execution_spec_tests 20.63% <0.00%> (-0.33%) ⬇️
unittests 88.83% <73.50%> (-0.24%) ⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
test/unittests/lru_cache_test.cpp 99.40% <99.40%> (ø)
lib/evmone/lru_cache.hpp 93.93% <93.93%> (ø)
test/internal_benchmarks/lru_cache_bench.cpp 0.00% <0.00%> (ø)

@chfast chfast force-pushed the lru_cache branch 7 times, most recently from 013084f to 738e0d7 on December 23, 2024 at 14:10
@rodiazet rodiazet (Contributor) left a comment

I sadly can't complain about anything. Let's ask Copilot then. :)


using LRUList = std::list<LRUEntry>;
using LRUIterator = typename LRUList::iterator;
using Map = std::unordered_map<Key, LRUIterator>;
Member

Would it be possible/better to store values in the map entries? Then there'd be just one link LRUEntry => MapNode.
Now it's double-direction: LRUEntry::key => MapNode::key and MapNode::value -> LRUEntry

Member

Ah I see a comment below that there's no performance difference. But it seems more convoluted to me.

Member Author

Yes, this is possible. Then the LRU list has only the pointer/reference/iterator to the map, and the map is Key => (LRUIterator, Value).

I'm kind of on your side here, but because the performance difference was negligible I decided to keep the "classic" layout for now. There is also a small nuance: you cannot have std::list<const Key&>, so you need to wrap the reference or use a pointer.

This is to be revisited in case we are going to use an intrusive list and/or map iterators.

Map map_;

/// Marks an element as the most recently used by moving it to the back of the LRU list.
void move_to_back(LRUIterator it) noexcept { lru_list_.splice(lru_list_.end(), lru_list_, it); }
Member

Nit: the name kinda leaks the internal representation (the order of elements in the list). Maybe something like touch(), use(), access() would be better?

Member

Actually, I didn't notice it's a private method; then it's fine.

Member Author

I think I thought about it, e.g. naming it bump_usage(). In the end I decided not to, because we also use lru_list_ directly in other places. E.g. should we also replace lru_list_.emplace with something like new_use()?

The main motivation was to wrap splice(), which usually takes me some time to figure out.
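For anyone else who has to re-read splice() every time: the three-argument single-element overload relinks one node within the same list in O(1), copies nothing, and leaves iterators valid, which is what makes the constant-time "mark as most recently used" possible. A standalone demonstration, not PR code:

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Move the element at `it` to the back of the list in O(1).
// splice() relinks the node; no elements are copied and `it` stays valid.
std::list<int> splice_demo()
{
    std::list<int> l{1, 2, 3, 4};
    auto it = std::next(l.begin());   // points at 2
    l.splice(l.end(), l, it);         // list becomes 1 3 4 2
    assert(*it == 2);                 // iterator still valid after the splice
    return l;
}
```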

Comment on lines +119 to +133
auto node = map_.extract(lru_it->key);
swap(node.key(), key);
if (auto [it, inserted, node2] = map_.insert(std::move(node)); !inserted)
{
// Failed re-insertion means the element with the new key is already in the cache.
// Rollback the eviction by re-inserting the node with original key back.
swap(key, node2.key());
map_.insert(std::move(node2));

// Returned iterator points to the element matching the key
// which value must be updated.
lru_it = it->second;
}
lru_it->value = std::move(value); // Replace/update the value.
move_to_back(lru_it);
Member

Juggling of keys is quite confusing, I'd maybe add more comments like

Suggested change
auto node = map_.extract(lru_it->key);
swap(node.key(), key);
if (auto [it, inserted, node2] = map_.insert(std::move(node)); !inserted)
{
// Failed re-insertion means the element with the new key is already in the cache.
// Rollback the eviction by re-inserting the node with original key back.
swap(key, node2.key());
map_.insert(std::move(node2));
// Returned iterator points to the element matching the key
// which value must be updated.
lru_it = it->second;
}
lru_it->value = std::move(value); // Replace/update the value.
move_to_back(lru_it);
auto node = map_.extract(lru_it->key); // node.key() is LRU key
swap(node.key(), key); // node.key() is new key, key is LRU key
if (auto [it, inserted, node2] = map_.insert(std::move(node)); !inserted)
{
// node2.key() is new key (already in cache)
// Failed re-insertion means the element with the new key is already in the cache.
// Rollback the eviction by re-inserting the node with original key back.
swap(key, node2.key()); // key is new key (already in cache), node2.key() is LRU key
map_.insert(std::move(node2));
// Returned iterator points to the element matching the key
// which value must be updated.
lru_it = it->second;
}
lru_it->value = std::move(value); // Replace/update the value.
move_to_back(lru_it);

Member Author

Yeah, this "update the value for an existing key" feature is quite annoying to implement, and in many cases it is never used... I will try to improve the comments.
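The key juggling above leans on C++17 node handles, which are worth seeing in isolation: extract() detaches a map node without reallocation, node.key() is mutable, and a failed insert() hands the node back via the returned insert_return_type. This rekey_demo is a made-up illustration, not the PR's code:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

void rekey_demo()
{
    std::unordered_map<int, std::string> m{{1, "a"}, {2, "b"}};

    auto node = m.extract(1);  // detach the node holding key 1 (no allocation)
    node.key() = 2;            // node handles expose a mutable key
    auto [it, inserted, returned] = m.insert(std::move(node));
    assert(!inserted);         // key 2 already present: node comes back
    assert(it->first == 2);    // iterator points at the existing element

    returned.key() = 3;        // pick a fresh key and re-insert instead
    m.insert(std::move(returned));
    assert(m.at(3) == "a");
    assert(m.size() == 2);
}
```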

@chfast chfast (Member Author) left a comment

I probably should mention O(1) get/put complexity.

3 participants