VAULT-28677: Fix dangling entity-aliases in MemDB after invalidation #27750
vault/identity_store.go
}

// If the entity exists in MemDB it must differ from the entity in
// the storage bucket because of above test. Go through all of the
If the entity exists and it differs, are we assuming that the difference between them is always going to be the aliases? Why don't we simply delete the entity from memdb and add it again like the previous implementation did?
If the MemDB entity differs from its corresponding storage bucket entity, the difference may or may not be in the aliases. But this makes me think we could detect if the Aliases slices are the same but that would entail walking the entire set and that's what the current algorithm is doing. I could add a test case where there are no changes to the Aliases to really make sure it works.
> But this makes me think we could detect if the Aliases slices are the same but that would entail walking the entire set and that's what the current algorithm is doing.

Wouldn't it be easier to just detect if there are changes and replace one entity with the other? That is, delete from memdb and add again, without having to walk a slice and detect changes?
+1 to what Bianca is saying. Just chiming in so I can follow along 😄
I've pushed a change that follows this suggestion.
// function does not delete those aliases, it only creates missing
// ones.
if memDBEntity != nil {
	if err := i.deleteAliasesInEntityInTxn(txn, memDBEntity, memDBEntity.Aliases); err != nil {
@marcboudreau @elliesterner brought up a valid point while we were talking about entity merge prevention. Do you think not deleting the entity from memdb might cause an automatic merge to be triggered? If you could write a test for that, that would be awesome. We would like to prevent further merges from happening.
From looking at the code in (*IdentityStore).upsertEntityInTxn, there are 2 circumstances that lead to (*IdentityStore).mergeEntityAsPartOfUpsert being called:
- previousEntity is not nil and entity has an alias in its Aliases field that exists in MemDB and whose CanonicalID field is set to the value of previousEntity.ID
- entity has an alias in its Aliases field that exists in MemDB and whose CanonicalID field is set to a value that is different than entity.ID.

In the (*IdentityStore).invalidateEntityBucket function, when upsertEntityInTxn is called, the previousEntity argument is always nil, so that rules out circumstance 1.
And by pre-deleting the aliases, we ensure that circumstance 2 cannot happen either.
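The two circumstances above can be sketched as a small predicate. This is only an illustration of the reasoning in this comment, not Vault's actual code: the Alias, Entity, and aliasTable types, the aliasKey helper, and the mergeWouldTrigger function are all hypothetical, and keying the table by mount accessor and alias name is an assumption.

```go
package main

import "fmt"

// Alias and Entity are simplified, hypothetical stand-ins for Vault's
// identity types; only the fields discussed above are modeled.
type Alias struct {
	Name          string
	MountAccessor string
	CanonicalID   string // ID of the entity that owns the alias
}

type Entity struct {
	ID      string
	Aliases []*Alias
}

// aliasTable is a hypothetical stand-in for the MemDB alias table.
// Keying it by mount accessor and name is an assumption of this sketch.
type aliasTable map[string]*Alias

func aliasKey(a *Alias) string { return a.MountAccessor + "/" + a.Name }

// mergeWouldTrigger reports whether either circumstance described in
// the comment above holds for the given upsert arguments.
func mergeWouldTrigger(memDB aliasTable, entity, previousEntity *Entity) bool {
	for _, a := range entity.Aliases {
		stored, ok := memDB[aliasKey(a)]
		if !ok {
			continue // alias is not in MemDB, so there is nothing to merge with
		}
		// Circumstance 1: the stored alias belongs to previousEntity.
		if previousEntity != nil && stored.CanonicalID == previousEntity.ID {
			return true
		}
		// Circumstance 2: the stored alias belongs to a different entity.
		if stored.CanonicalID != entity.ID {
			return true
		}
	}
	return false
}

func main() {
	memDB := aliasTable{}
	stale := &Alias{Name: "alice", MountAccessor: "auth_userpass_1", CanonicalID: "entity-b"}
	memDB[aliasKey(stale)] = stale

	entity := &Entity{ID: "entity-a", Aliases: []*Alias{
		{Name: "alice", MountAccessor: "auth_userpass_1", CanonicalID: "entity-a"},
	}}

	// previousEntity is nil, as in invalidateEntityBucket.
	fmt.Println(mergeWouldTrigger(memDB, entity, nil)) // true: circumstance 2

	// Removing the stored alias first rules out circumstance 2.
	delete(memDB, aliasKey(stale))
	fmt.Println(mergeWouldTrigger(memDB, entity, nil)) // false
}
```

With previousEntity fixed to nil, only the second check can fire, which is why removing the stored aliases before the upsert is enough in this model.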
For the sake of clarity, pre-deleting the entity from MemDB and then calling (*IdentityStore).upsertEntityInTxn won't prevent an entity merge from happening, since the logic that decides that doesn't take into account whether the entity exists in MemDB or not. I think the only way to prevent an entity merge from happening would be to scan each of the aliases associated with the entity (instead of pre-deleting them), search for any alias in MemDB with a matching alias name and mount accessor, and delete those. That would make it impossible for circumstance 2 to happen.
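The scan proposed here might look roughly like the sketch below. It is a hypothetical model rather than Vault code: the types, the map-based aliasTable keyed by alias ID, and the deleteMatchingAliases function are all assumptions made for illustration.

```go
package main

import "fmt"

// Alias and Entity are simplified, hypothetical stand-ins for Vault's
// identity types.
type Alias struct {
	ID            string
	Name          string
	MountAccessor string
	CanonicalID   string
}

type Entity struct {
	ID      string
	Aliases []*Alias
}

// aliasTable is a hypothetical stand-in for the MemDB alias table,
// keyed by alias ID (an assumption of this sketch).
type aliasTable map[string]*Alias

// deleteMatchingAliases removes every stored alias whose name and mount
// accessor match one of the entity's aliases, and returns how many
// entries were removed.
func deleteMatchingAliases(memDB aliasTable, entity *Entity) int {
	removed := 0
	for _, a := range entity.Aliases {
		for id, stored := range memDB {
			if stored.Name == a.Name && stored.MountAccessor == a.MountAccessor {
				// Deleting map entries while ranging over the map is safe in Go.
				delete(memDB, id)
				removed++
			}
		}
	}
	return removed
}

func main() {
	memDB := aliasTable{
		"alias-1": {ID: "alias-1", Name: "alice", MountAccessor: "auth_userpass_1", CanonicalID: "entity-b"},
		"alias-2": {ID: "alias-2", Name: "bob", MountAccessor: "auth_userpass_1", CanonicalID: "entity-c"},
	}
	entity := &Entity{ID: "entity-a", Aliases: []*Alias{
		{ID: "alias-9", Name: "alice", MountAccessor: "auth_userpass_1", CanonicalID: "entity-a"},
	}}

	fmt.Println(deleteMatchingAliases(memDB, entity)) // 1
	fmt.Println(len(memDB))                           // 1: only bob's alias remains
}
```

The nested loop is a simplification that keeps the sketch short; an index on name and mount accessor would avoid scanning the whole table.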
looks great!!
// If this is a performance secondary, the entity created on
// this node would have been cached in a local cache based on
// the result of the CreateEntity RPC call to the primary
// cluster. Since this invalidation is signaling that the
// entity is now in the primary cluster's storage, the locally
// cached entry can be removed.
if i.localNode.ReplicationState().HasState(consts.ReplicationPerformanceSecondary) && i.localNode.HAState() == consts.Active {
	if err := i.localAliasPacker.DeleteItem(ctx, bucketEntity.ID+tmpSuffix); err != nil {
		i.logger.Error("failed to clear local alias entity cache", "error", err, "entity_id", bucketEntity.ID)
		return
This whole dance is fascinating. I'm a little curious how you discovered it here. It seems like its absence is a bug unrelated to the regression, right?
@biazmoreira you probably know all about this already: is this another place that breaks the mental model that all global writes go to the primary, because we updated memdb with a global thing outside of replication? I think we saw places like that with standbys, but it was new to me that we have perf secondaries updating their memdb outside of replication for replicated state.
For anyone who reads this later, I realised this wasn't new code - just moved down from further up (see lines 739 and on in the before part of this diff).
Description
This change corrects a regression that was introduced by #27184.
When an entity has been modified in a storage bucket such that one or more aliases have been removed, those removed aliases were not being deleted from the MemDB table containing them. This change corrects this by scanning all aliases associated with entities that have been determined to be modified in the storage bucket, and deleting from MemDB any aliases that are no longer associated with the entity in the storage bucket.
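The dangling-alias condition described above can be sketched as a set difference between the aliases MemDB holds for an entity and the aliases the storage bucket holds for it. The types and the danglingAliases function below are hypothetical and only model the idea. The review thread suggests the change as pushed may instead delete all of the MemDB entity's aliases before the upsert, which would remove the same dangling entries.

```go
package main

import "fmt"

// Alias and Entity are simplified, hypothetical stand-ins for Vault's
// identity types.
type Alias struct {
	ID   string
	Name string
}

type Entity struct {
	ID      string
	Aliases []*Alias
}

// danglingAliases returns the aliases attached to the MemDB copy of an
// entity that are no longer attached to the storage bucket copy.
func danglingAliases(memDBEntity, bucketEntity *Entity) []*Alias {
	keep := make(map[string]struct{}, len(bucketEntity.Aliases))
	for _, a := range bucketEntity.Aliases {
		keep[a.ID] = struct{}{}
	}

	var dangling []*Alias
	for _, a := range memDBEntity.Aliases {
		if _, ok := keep[a.ID]; !ok {
			dangling = append(dangling, a)
		}
	}
	return dangling
}

func main() {
	memDBEntity := &Entity{ID: "entity-a", Aliases: []*Alias{
		{ID: "alias-1", Name: "alice"},
		{ID: "alias-2", Name: "alice-ldap"},
	}}
	// The bucket copy has lost alias-2, so it is dangling in MemDB.
	bucketEntity := &Entity{ID: "entity-a", Aliases: []*Alias{
		{ID: "alias-1", Name: "alice"},
	}}

	for _, a := range danglingAliases(memDBEntity, bucketEntity) {
		fmt.Println(a.ID) // alias-2
	}
}
```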