-
Notifications
You must be signed in to change notification settings - Fork 690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add code for DAG status to emit Conditions #2962
Add code for DAG status to emit Conditions #2962
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2962 +/- ##
==========================================
- Coverage 76.45% 75.05% -1.41%
==========================================
Files 74 90 +16
Lines 5806 5869 +63
==========================================
- Hits 4439 4405 -34
- Misses 1272 1370 +98
+ Partials 95 94 -1
|
d1b0386
to
98a2323
Compare
internal/contour/metrics_test.go
Outdated
Root: map[metrics.Meta]int{ | ||
{Namespace: "finance"}: 1, | ||
}, | ||
Root: map[metrics.Meta]int{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's going on here, is the test wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was, but I hadn't looked deeply enough into why; I needed to change the order of some things in httpproxy_processor
for the information to flow out correctly.
func (d *DAG) Statuses() map[types.NamespacedName]Status { | ||
return d.statuses | ||
// GetTestStatuses returns a slice of Status objects associated with | ||
// the computation of this DAG, for testing status output. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the function defined to only emit HTTPProxy conditions? Can we add a TODO and an issue to remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've renamed it. We won't be able to remove it completely without completely refactoring the status tests, as they need the name of the object and the detailed object. That will need a lot of extra testing machinery to sit between the status.Cache
and the status tests.
#2967 opened to track.
// When we're committing, if we already have a Valid Condition with an error, and we're trying to | ||
// set the object back to Valid, skip the commit, as we've visited too far down. | ||
// If this is removed, the status reporting for when a parent delegates to a child that delegates to itself | ||
// will not work. Yes, I know, problems everywhere. I'm sorry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ref an issue to clean this up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll create an issue, but I don't see how it's fixable, as it's an artifact of:
- Conditions are designed to allow multiple errors to be added, so we removed the behavior that only the first
commit()
makes it into the cache. - If you allow multiple commits, in this case, because the child is recursed into twice during
computeRoutes
, the error is added correctly once, and then it's overwritten by aValid
condition. - So, we're going to have to do something to check that we don't flip a condition back to
Valid
:true
if there's already an error on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#2968 is the related issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds to me like this is a problem with the API design but we can address in #2968.
I'll create an issue, but I don't see how it's fixable, as it's an artifact of:
- Conditions are designed to allow multiple errors to be added, so we removed the behavior that only the first
commit()
makes it into the cache.
Need to consider what commit really means. If we committed a change then we want to make a change on top of that commit, the API should support that rather than hack it up at the end.
- If you allow multiple commits, in this case, because the child is recursed into twice during
computeRoutes
, the error is added correctly once, and then it's overwritten by aValid
condition.
I would have thought that a child proxy was either valid or not; I'm surprised that it depends on the context. If it is dependent on context, then I expect this could also happen across proxies.
- So, we're going to have to do something to check that we don't flip a condition back to
Valid
:true
if there's already an error on it.
Yup, sounds like it. I think that the problem here is that ProxyAccessor
isn't cache-aware. Each time the cache is accessed, the latest accessor should get the most recent condition state. We need to be careful about reentrancy too (either disallow that or have some merge strategy when it happens).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handful of things so far, still looking..
internal/contour/metrics_test.go
Outdated
Root: map[metrics.Meta]int{ | ||
{Namespace: "finance"}: 1, | ||
}, | ||
Root: map[metrics.Meta]int{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not seeing why the expected results in this file should change, i.e. why the proxy's not being counted as a root?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because the object itself wasn't available; I had to change the order of things at the top of computeRoutes
in httpproxy_processor.go
to enable the vhost info to propagate out correctly.
The remaining changes in metrics_test.go
are because we actually emit more info now about the vhost. In addition, there was one time when we were actually counting things incorrectly.
@@ -114,18 +118,21 @@ func TestDAGStatus(t *testing.T) { | |||
}, | |||
} | |||
|
|||
run("valid proxy", testcase{ | |||
run(t, "valid proxy", testcase{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't new to this PR but feels weird to have this test nested with the other proxy setup types and not with the other test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean? It's below the proxy where it's used, like the rest.
suc.objectCache[suc.objKey(o.Name, o.Namespace, contour_api_v1.HTTPProxyGVR)] = obj | ||
// fmt.Printf("Saved %s to cache, cache has %d entries\n", suc.objKey(o.Name, o.Namespace, contour_api_v1.HTTPProxyGVR), len(suc.objectCache)) | ||
default: | ||
panic(fmt.Sprintf("status caching not supported for object type %T", obj)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't like the panic when an invalid type is encountered, not sure how others feel about but, but to me, doesn't seem like case where Contour should be terminated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it needs to stay, should update the message to understand a bit more about where this was encountered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an assertion that programmers obeyed the invariants, so panic is OK by me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the panic will only happen if a developer has passed the wrong type of thing, so this should never happen. This is belt-and-suspenders territory.
internal/dag/httpproxy_processor.go
Outdated
@@ -127,6 +120,13 @@ func (p *HTTPProxyProcessor) computeHTTPProxy(proxy *contour_api_v1.HTTPProxy) { | |||
} | |||
pa.Vhost = host | |||
|
|||
// ensure root httpproxy lives in allowed namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment explaining why the ordering of these checks is important?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, thanks.
e3468bc
to
1ea05d8
Compare
This commit adds code for DAG status updates to emit Conditions instead of updating the `currentStatus` and `description` fields only. Instead, there is a `Valid` condition that indicates the same information, but allows extra details to be added via SubConditions. To do this, there's a new `status` package that handles status during the DAG build, including a cache. This has been built with extensibility towards ExtensionService and the service-apis types in mind, so we should be able to add support for emitting Conditions on them all as well reasonably easily. After discussion on projectcontour#2886, I've also locked in the behavior of the additions to the `apis/projectcontour/v1` helpers with tests. I'll do another pass after I finish out this work and add it to the auth helpers as well. Signed-off-by: Nick Young <[email protected]>
- Better explanation of how SubConditions affect DetailedConditions - Truncate long messages - Test the rest of the added functions in `apis/projectcontour/v1/helpers_test.go`. Signed-off-by: Nick Young <[email protected]>
- get rid of GetConditionIndex as a public API, replaced with `GetConditionFor` on HTTPProxyStatus. - ProxyUpdate updated to only hold the relevant fields from the HTTPProxy it's for. Signed-off-by: Nick Young <[email protected]>
TODO: - metrics - turns out they use status extensively, sigh. - featuretests need fixing. Signed-off-by: Nick Young <[email protected]>
Signed-off-by: Nick Young <[email protected]>
Signed-off-by: Nick Young <[email protected]>
Signed-off-by: Nick Young <[email protected]>
Signed-off-by: Nick Young <[email protected]>
Signed-off-by: Nick Young <[email protected]>
Signed-off-by: Nick Young <[email protected]>
Signed-off-by: Nick Young <[email protected]>
Signed-off-by: Nick Young <[email protected]>
Signed-off-by: Nick Young <[email protected]>
Signed-off-by: Nick Young <[email protected]>
Also fix all the tests. Signed-off-by: Nick Young <[email protected]>
Signed-off-by: Nick Young <[email protected]>
1ea05d8
to
66cc035
Compare
Here's where I'm up to with the Conditions work.
So far, I'm at:
status_test.go
passes for everything.internal/dag/metrics.go
has been updated with a new method of calculation, it turned out we were counting some things as root HTTPProxies even if they weren't valid for some reason, so I had to change the tests.