Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add code for DAG status to emit Conditions #2962

Merged
merged 16 commits into from
Oct 2, 2020

Conversation

youngnick
Copy link
Member

@youngnick youngnick commented Oct 1, 2020

Here's where I'm up to with the Conditions work.

So far, I'm at:

  • DAG now emits conditions, status_test.go passes for everything.
  • internal/dag/metrics.go has been updated with a new method of calculation, it turned out we were counting some things as root HTTPProxies even if they weren't valid for some reason, so I had to change the tests.
  • featuretests all pass.

internal/status/cache.go Outdated Show resolved Hide resolved
internal/status/cache.go Outdated Show resolved Hide resolved
internal/status/cache.go Outdated Show resolved Hide resolved
internal/status/cache.go Outdated Show resolved Hide resolved
internal/status/cache.go Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Oct 1, 2020

Codecov Report

Merging #2962 into main will decrease coverage by 1.40%.
The diff coverage is 87.23%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2962      +/-   ##
==========================================
- Coverage   76.45%   75.05%   -1.41%     
==========================================
  Files          74       90      +16     
  Lines        5806     5869      +63     
==========================================
- Hits         4439     4405      -34     
- Misses       1272     1370      +98     
+ Partials       95       94       -1     
Impacted Files Coverage Δ
cmd/contour/certgen.go 29.72% <0.00%> (-2.22%) ⬇️
cmd/contour/cli.go 0.00% <0.00%> (ø)
cmd/contour/contour.go 0.00% <0.00%> (-4.23%) ⬇️
cmd/contour/ingressstatus.go 33.33% <0.00%> (-2.16%) ⬇️
cmd/contour/leadership.go 0.00% <0.00%> (ø)
internal/annotation/annotations.go 100.00% <ø> (ø)
internal/build/version.go 0.00% <ø> (ø)
internal/contour/observer.go 0.00% <0.00%> (ø)
internal/dag/status.go 0.00% <0.00%> (-97.44%) ⬇️
internal/envoy/route.go 86.36% <ø> (-12.03%) ⬇️
... and 143 more

@youngnick youngnick force-pushed the status-conditions-actual branch from d1b0386 to 98a2323 Compare October 1, 2020 11:59
@youngnick youngnick marked this pull request as ready for review October 1, 2020 11:59
@youngnick youngnick changed the title WIP: Add code for DAG status to emit Conditions Add code for DAG status to emit Conditions Oct 1, 2020
Root: map[metrics.Meta]int{
{Namespace: "finance"}: 1,
},
Root: map[metrics.Meta]int{},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on here, is the test wrong?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was, but I hadn't looked deeply enough into why; I needed to change the order of some things in httpproxy_processor for the information to flow out correctly.

func (d *DAG) Statuses() map[types.NamespacedName]Status {
return d.statuses
// GetTestStatuses returns a slice of Status objects associated with
// the computation of this DAG, for testing status output.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the function defined to only emit HTTPProxy conditions? Can we add a TODO and an issue to remove it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed it. We won't be able to remove it completely without completely refactoring the status tests, as they need the name of the object and the detailed object. That will need a lot of extra testing machinery to sit between the status.Cache and the status tests.

#2967 opened to track.

internal/k8s/statuscache.go Outdated Show resolved Hide resolved
internal/status/cache.go Outdated Show resolved Hide resolved
// When we're committing, if we already have a Valid Condition with an error, and we're trying to
// set the object back to Valid, skip the commit, as we've visited too far down.
// If this is removed, the status reporting for when a parent delegates to a child that delegates to itself
// will not work. Yes, I know, problems everywhere. I'm sorry.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ref an issue to clean this up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll create an issue, but I don't see how it's fixable, as it's an artifact of:

  • Conditions are designed to allow multiple errors to be added, so we removed the behavior that only the first commit() makes it into the cache.
  • If you allow multiple commits, in this case, because the child is recursed into twice during computeRoutes, the error is added correctly once, and then it's overwritten by a Valid condition.
  • So, we're going to have to do something to check that we don't flip a condition back to Valid: true if there's already an error on it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2968 is the related issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds to me like this is a problem with the API design but we can address in #2968.

I'll create an issue, but I don't see how it's fixable, as it's an artifact of:

  • Conditions are designed to allow multiple errors to be added, so we removed the behavior that only the first commit() makes it into the cache.

Need to consider what commit really means. If we committed a change then we want to make a change on top of that commit, the API should support that rather than hack it up at the end.

  • If you allow multiple commits, in this case, because the child is recursed into twice during computeRoutes, the error is added correctly once, and then it's overwritten by a Valid condition.

I would have thought that a child proxy was either valid or not; I'm surprised that it depends on the context. If it is dependent on context, then I expect this could also happen across proxies.

  • So, we're going to have to do something to check that we don't flip a condition back to Valid: true if there's already an error on it.

Yup, sounds like it. I think that the problem here is that ProxyAccessor isn't cache-aware. Each time the cache is accessed, the latest accessor should get the most recent condition state. We need to be careful about reentrancy too (either disallow that or have some merge strategy when it happens).

Copy link
Member

@skriss skriss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handful of things so far, still looking..

Root: map[metrics.Meta]int{
{Namespace: "finance"}: 1,
},
Root: map[metrics.Meta]int{},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing why the expected results in this file should change, i.e. why the proxy's not being counted as a root?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because the object itself wasn't available; I had to change the order of things at the top of computeRoutes in httpproxy_processor.go to enable the vhost info to propagate out correctly.

The remaining changes in metrics_test.go are because we actually emit more info now about the vhost. In addition, there was one time when we were actually counting things incorrectly.

internal/dag/status_test.go Outdated Show resolved Hide resolved
internal/status/cache.go Outdated Show resolved Hide resolved
internal/contour/metrics.go Outdated Show resolved Hide resolved
cmd/contour/serve.go Outdated Show resolved Hide resolved
internal/status/cache.go Show resolved Hide resolved
internal/dag/httpproxy_processor.go Outdated Show resolved Hide resolved
internal/dag/httpproxy_processor.go Show resolved Hide resolved
@@ -114,18 +118,21 @@ func TestDAGStatus(t *testing.T) {
},
}

run("valid proxy", testcase{
run(t, "valid proxy", testcase{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't new to this PR but feels weird to have this test nested with the other proxy setup types and not with the other test cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean? It's below the proxy where it's used, like the rest.

internal/dag/status_test.go Outdated Show resolved Hide resolved
internal/dag/status_test.go Outdated Show resolved Hide resolved
internal/k8s/statuscache.go Outdated Show resolved Hide resolved
suc.objectCache[suc.objKey(o.Name, o.Namespace, contour_api_v1.HTTPProxyGVR)] = obj
// fmt.Printf("Saved %s to cache, cache has %d entries\n", suc.objKey(o.Name, o.Namespace, contour_api_v1.HTTPProxyGVR), len(suc.objectCache))
default:
panic(fmt.Sprintf("status caching not supported for object type %T", obj))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't like the panic when an invalid type is encountered, not sure how others feel about but, but to me, doesn't seem like case where Contour should be terminated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it needs to stay, should update the message to understand a bit more about where this was encountered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an assertion that programmers obeyed the invariants, so panic is OK by me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the panic will only happen if a developer has passed the wrong type of thing, so this should never happen. This is belt-and-suspenders territory.

@@ -127,6 +120,13 @@ func (p *HTTPProxyProcessor) computeHTTPProxy(proxy *contour_api_v1.HTTPProxy) {
}
pa.Vhost = host

// ensure root httpproxy lives in allowed namespace
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment explaining why the ordering of these checks is important?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, thanks.

@youngnick youngnick force-pushed the status-conditions-actual branch from e3468bc to 1ea05d8 Compare October 2, 2020 05:36
Nick Young added 16 commits October 2, 2020 05:53
This commit adds code for DAG status updates to emit Conditions
instead of updating the `currentStatus` and `description` fields
only. Instead, there is a `Valid` condition that indicates the same
information, but allows extra details to be added via SubConditions.

To do this, there's a new `status` package that handles status during
the DAG build, including a cache. This has been built with extensibility
towards ExtensionService and the service-apis types in mind, so we
should be able to add support for emitting Conditions on them all
as well reasonably easily.

After discussion on projectcontour#2886, I've also locked in the behavior of the
additions to the `apis/projectcontour/v1` helpers with tests. I'll
do another pass after I finish out this work and add it to the auth
helpers as well.

Signed-off-by: Nick Young <[email protected]>
- Better explanation of how SubConditions affect DetailedConditions
- Truncate long messages
- Test the rest of the added functions in `apis/projectcontour/v1/helpers_test.go`.

Signed-off-by: Nick Young <[email protected]>
- get rid of GetConditionIndex as a public API, replaced with `GetConditionFor` on HTTPProxyStatus.
- ProxyUpdate updated to only hold the relevant fields from the HTTPProxy it's for.

Signed-off-by: Nick Young <[email protected]>
TODO:
- metrics - turns out they use status extensively, sigh.
- featuretests need fixing.

Signed-off-by: Nick Young <[email protected]>
Signed-off-by: Nick Young <[email protected]>
Signed-off-by: Nick Young <[email protected]>
Signed-off-by: Nick Young <[email protected]>
Signed-off-by: Nick Young <[email protected]>
Signed-off-by: Nick Young <[email protected]>
Also fix all the tests.

Signed-off-by: Nick Young <[email protected]>
Signed-off-by: Nick Young <[email protected]>
@youngnick youngnick force-pushed the status-conditions-actual branch from 1ea05d8 to 66cc035 Compare October 2, 2020 05:53
@youngnick youngnick merged commit b4cc27e into projectcontour:main Oct 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants