Make resource autonames deterministic #8631
Codecov Report
@@ Coverage Diff @@
## master #8631 +/- ##
==========================================
+ Coverage 59.35% 59.38% +0.03%
==========================================
Files 637 637
Lines 97765 97844 +79
Branches 1385 1386 +1
==========================================
+ Hits 58025 58102 +77
+ Misses 36466 36464 -2
- Partials 3274 3278 +4
@Frassle there was some discussion around it; is this ready for review, or are some more changes coming?
I need to plumb sequence number through for the dynamic providers still, but I think otherwise this is ready to be looked at. Things to keep in mind if reviewing this:
@@ -585,8 +596,13 @@ func (sg *stepGenerator) generateStepsFromDiff(
//
// Note that if we're performing a targeted replace, we already have the correct inputs.
if prov != nil && !sg.isTargetedReplace(urn) {
	// Increment the sequence number (if it's known) before calling check so we get a new autoname
	if new.SequenceNumber != 0 {
This seems to be the main place where we're incrementing, right?
Why is it under the prov != nil && !sg.isTargetedReplace(urn) condition? It looks like otherwise we don't even call Check. In other words, if we are doing a targeted replace, will the sequence number auto-increment?
So I think the code higher in the file (~350) should handle this case now. If we're recreating or doing a targeted replace we need to increment.
  var failures []plugin.CheckFailure
- inputs, failures, err = prov.Check(urn, nil, goal.Properties, allowUnknowns)
+ inputs, failures, err = prov.Check(urn, nil, goal.Properties, allowUnknowns, new.SequenceNumber)
Consider the code below. The planner may be cascading along the dependency graph to delete and re-create some other resources. I am wondering whether the sequence number for these resources would auto-increment, and if so, how. It sounds like it should, since it's an indirect replacement.
NewDeleteReplacementStep(sg.deployment, old, true),
NewReplaceStep(sg.deployment, old, new, diff.ReplaceKeys, diff.ChangedKeys, diff.DetailedDiff, false),
NewCreateReplacementStep(
Yes it should increment and I think that will get caught by the step generator seeing that 'recreating' is true for those resources.
Yes it would seem so from:
// Mark the condemned resource as deleted. We won't know until later in the deployment whether
// or not we're going to be replacing this resource.
sg.deletes[dependentResource.URN] = true
and this definition of recreating
// We may be re-creating this resource if it got deleted earlier in the execution of this deployment.
_, recreating := sg.deletes[urn]
However I do not fully grok the state machine here wrt sg.deletes and ordering.
It would seem that we'd need some sort of circularity here that might be absent, that is:
- we handle seq-num in generateSteps
- generateSteps decides to do some extra replacements based on deleteBeforeReplace and dep graph
- in that case those replacements need seq-num handling so these dep resources need to undergo generateSteps
But it would seem that generateSteps does not recur, in fact it would seem it is called once per resource in the order as RegisterResource calls arrive.
Let me run a quick experiment to clarify further.
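To make the ordering question above concrete, here is a toy model of the bookkeeping, not the real step generator. It assumes (this is not confirmed in the thread) that a dependent resource is always registered after the resource it depends on, so its entry in the deletes map exists by the time its own steps are generated. The type toyStepGenerator and its fields are hypothetical names used only for illustration.

```go
package main

// toyStepGenerator is a hypothetical, heavily simplified stand-in for the
// engine's step generator. It only tracks what matters for this discussion.
type toyStepGenerator struct {
	deletes    map[string]bool     // URNs condemned earlier in this deployment
	seq        map[string]int      // stored sequence number per URN (0 = unknown)
	dependents map[string][]string // URN -> URNs that depend on it
}

// generateSteps is called once per resource, in registration order.
// It returns the sequence number that would be passed to Check.
func (sg *toyStepGenerator) generateSteps(urn string, replace bool) int {
	// Same shape as the engine snippet quoted above.
	_, recreating := sg.deletes[urn]

	// Only bump a known (non-zero) sequence number, mirroring the
	// `new.SequenceNumber != 0` guard in the diff.
	if (replace || recreating) && sg.seq[urn] != 0 {
		sg.seq[urn]++
	}

	// A delete-before-replace condemns the dependents; they are only
	// re-created when their own registration arrives later.
	if replace {
		for _, dep := range sg.dependents[urn] {
			sg.deletes[dep] = true
		}
	}
	return sg.seq[urn]
}
```

In this model no recursion is needed as long as "b depends on a" implies that a is processed first. If b were processed before a, b would keep its old sequence number, which is exactly the gap the comment above is worried about.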
It's looking decent; I'm fumbling a bit verifying where 0 is correct and where 1 is correct.
One thing that strikes me as really counter-intuitive is that Check is where auto-naming happens. We'd want to document that thoroughly at the definition of Check and in the dev docs. Check really seems to return inputs in CheckResult, and that's kind of confusing.
Another exercise that helped me (we can link PRs here) is trying to build pulumi-azure-native and pulumi-aws against this PR. For this to work, resource.NewUniqueHex needs to be replaced with resource.NewUniqueHexV2 and the sequence number passed through. I could sort of see how it happens in Check in pulumi-azure-native, but it's not obvious. It seems to subject a lot more props to auto-naming than "name". I couldn't see it at all in the TF-bridge-based pulumi-aws. Having these PRs linked here can help review and verify the call sequence assumptions.
Another random thought: for imports with the import ResourceOption, the implementation should probably stick to sequence number 0 until the user removes the import option and the resource is managed by Pulumi, at which point an update will assign 1.
Another thing I would like to understand better is precisely where replacement is decided; that would be the right place to increment the sequence number. Which component is responsible for deciding whether a replacement is to happen? It sounds like it should be the engine, and it does so in the cascading-replace code noted above. But a provider's Diff can return a result indicating that a certain property change forces a replacement?
Thinking out loud about what we could have.. Replacements are only ever finished off with a provider.Create, right? A provider.Update is not a replacement. There's some good verbiage in https://pulumi-developer-docs.readthedocs.io/en/latest/providers/implementers-guide.html?highlight=Check#check about how replacement works.
TODO for myself: check the flow in the code tomorrow, expecting to find that every Create initiated in order to replace a resource receives the results of Check (re-checked inputs) with an incremented sequence number; see if we can use a phantom type just for this tracking to make it super-obvious.
0 is the default value and means "fall back to the old non-deterministic logic". So new resources managed entirely by up-to-date engines will be set to 1 and then increment from there. Old CLIs, old providers, and existing resources will be 0. We should work out a path to let people easily opt existing resources into the new scheme (because even a replace won't do it automatically).
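A minimal sketch of that fallback, assuming the suffix is derived by hashing the URN together with the sequence number as the PR description states. The function uniqueHexSketch is hypothetical: it is not the real resource.NewUniqueHexV2, and its signature, the choice of SHA-256, and the 64-character cap are all assumptions made for illustration.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"errors"
)

// uniqueHexSketch returns prefix plus randLen hex characters.
// Hypothetical helper; NOT the real resource.NewUniqueHexV2.
func uniqueHexSketch(urn string, sequenceNumber int, prefix string, randLen int) (string, error) {
	if randLen < 0 || randLen > 2*sha256.Size {
		return "", errors.New("randLen must be between 0 and 64")
	}
	buf := make([]byte, (randLen+1)/2)

	if sequenceNumber == 0 {
		// Unknown sequence number: keep the old non-deterministic behaviour.
		if _, err := rand.Read(buf); err != nil {
			return "", err
		}
	} else {
		// Known sequence number: the same (URN, sequence number) pair
		// always produces the same suffix.
		h := sha256.New()
		h.Write([]byte(urn))
		var seq [8]byte
		binary.BigEndian.PutUint64(seq[:], uint64(sequenceNumber))
		h.Write(seq[:])
		copy(buf, h.Sum(nil))
	}
	return prefix + hex.EncodeToString(buf)[:randLen], nil
}
```

Calling it twice with the same URN and sequence number 1 yields the same name, while bumping the sequence number to 2 on a replace yields a different one; with 0 each call is random, as before.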
Yeh I did wonder if autonames should be in engine but then other stuff might also need randomness and the engine doesn't really understand "names" just "ids" which are different.
I'll have a look at doing that.
Currently imports treat this as a brand new resource, which will be safe to deterministically autoname if it gets replaced later, and so set the sequence number to 1. This is the same for reads and the import command. I think this is safe because imports will call Check and Diff against the result of Read, so that will check that the names match. Got to admit I feel like the import/external/read resources stuff is pretty confusing, and I think I need to double check a lot of this.
Replacements are decided by the step generator, one of the bits of info towards that decision is the provider diff saying it should get replaced. But it looks like changing an external resource to an internal managed one also triggers a replace.
Yes, replaces are done by provider Create and so should always call Check first.
While I think that moving towards a deterministic
I believe that (2) is probably going to be a problem no matter what we do. I think that all we can do there is think of possible escape hatches that can be applied by the user and think of possibilities for handling such properties in providers (e.g. by considering properties unknown). I think we might be able to address (1) within the context of update constraints by taking a different approach, however. We already need to control for the non-determinism of autonames in the sense that we don't want autonames to be regenerated each time a resource is
(1) yes, (2) not sure why non-determinism is in play? Naively, the sequence counter increments on every replace per resource; non-determinism doesn't cause reorderings of replaces, right? A resource can only be replaced once at a time (as if locking on the resource). Perhaps you could elaborate on that.
So instead of Here is a transcript of Check calls from a program forcing a bucket replacement. This helped me a lot to follow the discussion.
What seems a bit tough is that the engine does not know which props like "bucket" are subject to auto-naming (the provider does), and feeding olds instead of
Just to point out any idea that requires any extra data from the provider would also require updates. So if we want to avoid this we have to solve this entirely via engine changes.
I'd hope that Check in all cases CAN be made deterministic, the simple fix for things that aren't deterministic is to make them
I've already tried this and as Anton points out you hit problems with properties changing from nil to non-nil. Taking his bucket example and imagining you run it with a plan then during preview you wouldn't pass
We could do this but:
Yep, that's right. Which is maybe impossible.
I thought we weren't validating the post-Check inputs? Just to be clear about the flow I'm thinking of here:
If I understand the state of the world correctly, we'll validate the resource in 2a. against its pre-
(for reference, here's an extremely hacky branch that saves post-check inputs in the situations I described: https://github.com/pulumi/pulumi/compare/pgavlin/savedInputs. It seems to work as I would expect, though I'm likely missing something as I've only tried some basic scenarios: https://asciinema.org/a/P9owoTmZITmhbuAffgryn63KO)
No, we are validating post-Check inputs, because that's what's stored in the state file that we can diff against.
Looked through this again, looking good (safe).
For my information the new common/apitype/core.go ResourceV3 - where does it interact with the state backend (such as Pulumi Service stack state backend)? Couldn't find it at a glance.
NewUniqueHexV2 is unused, let’s get rid of it. This was introduced in #8631, used in some providers, but then reverted https://github.com/search?q=org%3Apulumi+NewUniqueHexV2&type=pullrequests
Description
This adds a new property to resource state: a sequence number that starts at 1 when the resource is created and is incremented on each replace operation. This sequence number is passed to the provider Check function and can be used, together with the resource URN, to build a hash for the random suffix of an autoname (see NewUniqueHexV2).
For existing resources the sequence number will be zero (the default int value) and NewUniqueHexV2 will fall back to the old logic of a non-deterministic suffix.
The same fallback applies if an old CLI is used, because it won't read or maintain the sequence numbers from the state file, implicitly resetting every resource's sequence number to zero (that is, unknown).
An old CLI will also not pass sequenceNumber to Check via gRPC, so even with a new provider that supports sequence numbers, gRPC will marshal the number as zero and NewUniqueHexV2 will fall back to non-determinism.
The reason for treating zero as unknown rather than the first valid sequence number is for the following scenario:
Given the above if sequence number is zero we just keep the existing non-deterministic behavior to avoid name clashes. However once we've done a replace with non-deterministic behavior we can safely reset back to using sequence numbers (we've just created a new resource with a random name so we know that deterministic name 1 won't clash with it). To mark this state we store a -1 for sequence number in the state file. -1 is never passed to providers as part of Check, it's either translated to 0 (in the case we're calling Check on the existing resource) or 1 when we go to create a replacement resource.
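The translation rules in the paragraph above can be sketched as two small functions. This is an illustration under stated assumptions, not the engine's code: the function names are hypothetical, and the assumption that a replace starting from -1 stores 1 afterwards is inferred from the text rather than stated in it.

```go
package main

// seqForCheck returns the sequence number to pass to the provider's Check,
// given the value stored in the state file and whether we are about to
// create a replacement. Hypothetical helper for illustration only.
func seqForCheck(stored int, replacing bool) int {
	switch {
	case stored == -1 && replacing:
		// A non-deterministic replace already happened, so deterministic
		// name 1 cannot clash with the existing resource.
		return 1
	case stored == -1:
		// Checking the existing resource: -1 is never sent to providers.
		return 0
	case stored == 0:
		// Unknown: keep the old non-deterministic behaviour.
		return 0
	case replacing:
		return stored + 1
	default:
		return stored
	}
}

// seqAfterReplace returns the value to store once a replace has completed.
// The -1 -> 1 transition is an assumption inferred from the description.
func seqAfterReplace(stored int) int {
	switch stored {
	case 0:
		// Replaced with a random name; mark that it is now safe to
		// switch to deterministic names on the next replace.
		return -1
	case -1:
		return 1
	default:
		return stored + 1
	}
}
```

Under these rules an existing resource (stored 0) needs two replaces before it gets a deterministic name: the first stores -1, the second passes 1 to Check.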