Fix deadlock between Cache.put and invalidateAll #99480

thecoop · 2023-09-12T13:34:52Z

This fixes #99326

Unfortunately, testing this is exceptionally difficult: hitting the second lock in put requires the item to be there already, but invalidateAll removes all items from the cache, meaning the put doesn't hit the second lock. The test I've added doesn't trigger the deadlock condition on the old code after 2000 runs.

elasticsearchmachine · 2023-09-12T13:35:16Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

elasticsearchmachine · 2023-09-12T13:35:39Z

Hi @thecoop, I've created a changelog YAML for you.

thecoop · 2023-09-12T14:42:22Z

This is probably worth backporting, maybe even to 7.17

rjernst

LGTM. This does seem worth backporting.

ldematte

LGTM. I checked the other methods and indeed invalidateAll is the only one acquiring locks in the wrong order.

thecoop · 2023-09-13T08:28:56Z

The test added here doesn't actually find the original bug. I'm not sure how to even create a test for this case - in which case I might as well just remove it...

ldematte · 2023-09-13T08:49:55Z

I think it is extremely hard to find. I tried using jcstress and cooked up a quick test using the 8.9.2 artifacts, and after 680 iterations it did not find any issues. Now, jcstress is not particularly well suited to finding deadlocks, but if it does not find anything, the chances of finding it with a normal test are slim.
The only thing I can think of is faking the locks, but that would require changes to Cache that I think are not worth it.

thecoop · 2023-09-14T09:43:57Z

I've removed the test, given the problem was quite a basic one (incorrect lock acquisition order) and the fix is clear

thecoop · 2023-09-14T12:16:55Z

@elasticmachine update branch

thecoop · 2023-09-14T13:41:56Z

@elasticsearchmachine rerun elasticsearch-ci/part-1

rjernst

This still looks fine, but even if the situation is rare, ensuring the order of releasing locks seems a worthwhile thing to test. I don't feel that strongly about it (merge if you like), but I have the following suggestion for how to test:

The CacheSegment is what holds the lock objects which we would want to mock. I think a package private ctor for Cache could take the ctor for CacheSegment. Then move the construction of the read/write lock to an overridable method, remove the final (no reason for it to be final really anyways since it is private). Have a TestCacheSegment which subclasses and creates a delegate, so that you can hook into when locking/unlocking happens, and then assert on the order. Again, this is just an idea, I realize it is a bit of a change, but IMO not too much to ensure something that can result in a deadlock.
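
As a rough illustration of the delegate idea above, the following sketch wraps a `Lock` and records each acquire and release so that a test can assert on the order. `RecordingLock`, `demo` and the lock names are hypothetical and not part of Cache or CacheSegment. Wiring such a wrapper into CacheSegment would still need the constructor changes described above.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical test helper: delegates to a real Lock and records lock/unlock events.
final class RecordingLock implements Lock {
    private final String name;
    private final Lock delegate;
    private final List<String> events;

    RecordingLock(String name, Lock delegate, List<String> events) {
        this.name = name;
        this.delegate = delegate;
        this.events = events;
    }

    @Override
    public void lock() {
        delegate.lock();
        events.add("lock:" + name); // recorded only once the lock is actually held
    }

    @Override
    public void lockInterruptibly() throws InterruptedException {
        delegate.lockInterruptibly();
        events.add("lock:" + name);
    }

    @Override
    public boolean tryLock() {
        boolean acquired = delegate.tryLock();
        if (acquired) {
            events.add("lock:" + name);
        }
        return acquired;
    }

    @Override
    public boolean tryLock(long time, TimeUnit unit) throws InterruptedException {
        boolean acquired = delegate.tryLock(time, unit);
        if (acquired) {
            events.add("lock:" + name);
        }
        return acquired;
    }

    @Override
    public void unlock() {
        events.add("unlock:" + name); // recorded while the lock is still held
        delegate.unlock();
    }

    @Override
    public Condition newCondition() {
        return delegate.newCondition();
    }

    // Example: take two locks in a fixed order and return the recorded sequence.
    static List<String> demo() {
        List<String> events = new CopyOnWriteArrayList<>();
        Lock lru = new RecordingLock("lru", new ReentrantLock(), events);
        Lock segment = new RecordingLock("segment", new ReentrantLock(), events);
        lru.lock();
        try {
            segment.lock();
            try {
                // the code under test would run here
            } finally {
                segment.unlock();
            }
        } finally {
            lru.unlock();
        }
        return events;
    }
}
```

A test built on this could run put and invalidateAll separately against recording locks and check that both produce the same acquisition order, which avoids having to reproduce the deadlock itself.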

The invalidateAll method is taking out the lru lock and segment locks in a different order to the put method, when the put is replacing an existing value. This results in a deadlock between the two methods as they try to swap locks. This fixes it by making sure invalidateAll takes out locks in the same order as put. This is difficult to test because the put needs to be replacing an existing value, and invalidateAll clears the cache, resulting in subsequent puts not hitting the deadlock condition. A test that overrides some internal implementations to expose this particular deadlock will be coming later.
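
The ordering rule described above can be sketched with a toy class. `TinyCache` is hypothetical and much simpler than Cache. In particular, which lock is taken first here is an assumption made for illustration. The point is only that both methods acquire the two locks in the same order, so neither thread can hold one lock while waiting on the other in reverse.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical toy cache, not the Elasticsearch Cache class.
final class TinyCache {
    private final ReentrantLock lruLock = new ReentrantLock();
    private final ReentrantLock segmentLock = new ReentrantLock();
    private final Map<String, String> entries = new HashMap<>();

    // Adds or replaces a value: lruLock first, then segmentLock (assumed order).
    void put(String key, String value) {
        lruLock.lock();
        try {
            segmentLock.lock();
            try {
                entries.put(key, value);
            } finally {
                segmentLock.unlock();
            }
        } finally {
            lruLock.unlock();
        }
    }

    // Clears the cache using the SAME lock order as put.
    void invalidateAll() {
        lruLock.lock();
        try {
            segmentLock.lock();
            try {
                entries.clear();
            } finally {
                segmentLock.unlock();
            }
        } finally {
            lruLock.unlock();
        }
    }

    int size() {
        segmentLock.lock();
        try {
            return entries.size();
        } finally {
            segmentLock.unlock();
        }
    }

    // Runs put and invalidateAll concurrently; returns true if both threads finish.
    static boolean runConcurrently(int iterations) {
        TinyCache cache = new TinyCache();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                cache.put("k", "v" + i);
            }
        });
        Thread clearer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                cache.invalidateAll();
            }
        });
        writer.setDaemon(true);
        clearer.setDaemon(true);
        writer.start();
        clearer.start();
        try {
            writer.join(10_000);
            clearer.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !writer.isAlive() && !clearer.isAlive();
    }
}
```

If invalidateAll instead took segmentLock before lruLock, the two threads in `runConcurrently` could each hold one lock and wait forever for the other. As noted in the thread, actually hitting that interleaving in a test is unlikely.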

elasticsearchmachine · 2023-09-14T14:38:37Z

💚 Backport successful

Status	Branch	Result
✅	8.10
✅	7.17

Fix deadlock between put and invalidateAll

43b4739

thecoop added the >bug and :Core/Infra/Core labels Sep 12, 2023

elasticsearchmachine added the Team:Core/Infra and v8.11.0 labels Sep 12, 2023

Update docs/changelog/99480.yaml

694fd40

thecoop added 2 commits September 12, 2023 14:38

Update tests

2a36c9c

splotless, changelog

0bb4197

rjernst approved these changes Sep 12, 2023

ldematte approved these changes Sep 13, 2023

thecoop added v8.10.0 v7.17.14 v8.10.1 auto-backport-and-merge and removed v8.10.0 labels Sep 13, 2023

Remove useless test

a0c3843

thecoop requested a review from rjernst September 14, 2023 09:55

Merge branch 'main' into fix-invalidate-put-deadlock

c533d7c

rjernst approved these changes Sep 14, 2023

thecoop merged commit 7ad521f into elastic:main Sep 14, 2023

thecoop mentioned this pull request Sep 14, 2023

[8.10] Fix deadlock between Cache.put and invalidateAll (#99480) #99579

Merged

thecoop mentioned this pull request Sep 14, 2023

[7.17] Fix deadlock between Cache.put and invalidateAll (#99480) #99580

Merged

thecoop deleted the fix-invalidate-put-deadlock branch September 14, 2023 14:38
