-
-
Notifications
You must be signed in to change notification settings - Fork 610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
va: log MPIC summaries #7817
va: log MPIC summaries #7817
Conversation
@jsha, this PR appears to contain configuration and/or SQL schema changes. Please ensure that a corresponding deployment ticket has been filed with the new values. |
Pulling in some feedback from #7815 (comment):
These are good points, and MPIC is important enough compliance wise that we should do what we can to programmatically detect config problems. There are big advantages to the primary knowing what its perspectives are too, like being able to abort at startup if the perspectives are wrong. And the primary can report failures by perspective even when that perspective is down. What do you think about this: primary and remotes both know about perspectives, and in each RPC to a remote, the primary asserts what it believes the perspective and RIR are. If the remote disagrees, it returns an error. |
Agreed, comparing the results of both is probably the best call. |
I'll hold off on reviewing this until main is merged. |
I'm fine with this approach in the long term. But the purpose of these smaller PRs was to refactor Samantha's big PR into a collection of more-easily-reviewable changes, not to change the approach taken in that PR. For now, let's stick with having each remote know (and report) its own perspective/RIR, and leave the gRPC request to those remotes alone. We can add having the primary VA know about its remotes' perspectives/RIRs and add checks that they match up later. |
Sounds good, I'll split that part out (and fix up merge conflicts). |
InternalError string `json:",omitempty"` | ||
Perspective string `json:",omitempty"` | ||
RIR string `json:",omitempty"` | ||
MPICSummary *mpicSummary `json:",omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MPIC summary must always be logged, even if it's empty because we caught a local failure. Our logs don't distinguish between a local problem and a remote problem so it's useful to positively state that we did not attempt MPIC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, will change this to not omitempty
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we document why it's not omitempty
?
va/va.go
Outdated
// Note: len(va.remoteVAs) is greater than len(passed) + len(failed) because some | ||
// are canceled on reaching quorum. | ||
QuorumResult: fmt.Sprintf("%d/%d", len(passed), len(va.remoteVAs)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we make an "attempt" I believe that we have to log the outcome of that attempt according to the BRs:
5.4.1 Types of events recorded
...
The CA SHALL record at least the following events:
...
Multi-Perspective Issuance Corroboration attempts from each Network Perspective, minimally recording the following information:
- a. an identifier that uniquely identifies the Network Perspective used;
- b. the attempted domain name and/or IP address; and
- c. the result of the attempt (e.g., "domain validation pass/fail", "CAA permission/prohibition").>
Multi-Perspective Issuance Corroboration quorum results for each attempted domain name or IP address represented in a Certificate request (i.e., "3/4" which should be interpreted as "Three (3) out of four (4) attempted Network Perspectives corroborated the determinations made by the Primary Network Perspective).
va/va.go
Outdated
// Note: len(va.remoteVAs) is greater than len(passed) + len(failed) because some | ||
// are canceled on reaching quorum. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not believe these are cancelled, they simply hit timeout after we've already returned the ValidationResult.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In grpc/interceptors.go we apply a deadline, and then defer cancel()
. The cancel gets run after the handler returns.
Lines 114 to 116 in 2502113
localCtx, cancel := context.WithDeadline(ctx, deadline) | |
defer cancel() | |
return &mpicSummary{ | ||
PassedPerspectives: passedPerspectives, | ||
FailedPerspectives: failedPerspectives, | ||
PassedRIRs: passedRIRs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're collecting up the RIRs but we're not ensuring that passing validations came from at least two unique RIRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true. My goal here was to add logging, not enforcement.
va/caa.go
Outdated
@@ -255,17 +256,17 @@ func (va *ValidationAuthorityImpl) performRemoteCAACheck( | |||
result.Problem = probs.ServerInternal("Remote VA IsCAAValid RPC cancelled") | |||
} else { | |||
// Handle validation error. | |||
va.log.Errf("Remote VA %q.IsCAAValid failed: %s", rva.Address, err) | |||
va.log.Errf("Remote VA %q.IsCAAValid failed: %s", perspective, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rva.Perspective is probably less useful than rva.Address when trying to find a remote VA instance that's acting up so you can terminate it.
va/caa.go
Outdated
result := &remoteVAResult{ | ||
VAHostname: rva.Address, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was removed, if you add it back as I have suggested in other comments it should be Address
as this is the [IP Address]:Port of the remote VA.
va/caa_test.go
Outdated
{brokenVA, testPerspective1, testRIR1}, | ||
{remoteVA, testPerspective2, testRIR2}, | ||
{remoteVA, testPerspective3, testRIR3}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the test cases here stop at 3 perspectives, but the number of allowed failures increases to 2 at 6+
Table: Quorum Requirements
# of Distinct Remote Network Perspectives Used | # of Allowed non-Corroborations |
---|---|
2-5 | 1 |
6+ | 2 |
We should add a test that checks this at exactly below and above that threshold.
va/va.go
Outdated
for i, va1 := range remoteVAs { | ||
for j, va2 := range remoteVAs { | ||
if i != j && va1.Perspective == va2.Perspective && va1.Perspective != "" { | ||
return nil, fmt.Errorf("duplicate remote VA perspective %q", va1.Perspective) | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a check like this could give someone a false sense of confidence, when in reality the backends it starts with will be steadily replaced as remote VAs scale up and down daily. It probably doesn't need to go away, but we should comment that these are also evaluated when receiving each validation/check RPC.
Given the complexity of the requirements here (as exemplified by the comments that Samantha just left) I'm unsure if the "lots of small refactors" plan should go this far. Right now I'm thinking it may be best for the two biggest pieces of Samantha's PR -- MPIC enforcement, and dropping CAA -- to remain in her original PR, rebased on top of the great small cleanups landed this week. Do we think that the changes made so far have improved the reviewability of that PR sufficiently? Maybe I'm totally wrong, and this is the right approach to be taking. But it worries me to be going over MPIC requirements (like "what's the denominator in the quorum fraction") again, when that PR had already resolved those. |
You both make the good point that this PR got too big! I noticed the issue of "oops sometimes we log hostnames instead of perspectives," and fixing it, well... snowballed. I've update this to only add the logging, and leave the hostnames alone for a future PR. I still need to add some test cases that check for the logging. I've also changed the base to |
Closing in favor of #7870 |
Add "Perspective" and "RIR" configuration at the primary VA for each of its backends. This information is passed through to the VA as part of the
RemoteVA
struct.Remove the "Hostname" field from the
RemoteVA
struct. It was previously used in logging errors, but Perspective will be a clearer way to indicate those errors.Hoist the logic for "am I primary?" from
performRemoteValidation
to PerformValidation. This allowsperformRemoteValidation
to more consistently return a set of succeeded and failed results.Note: this is a stacked change on top of #7815. DO NOT MERGE until #7815 is merged. Also, this change pulls in the change from
good++
/bad++
to slices ofpassed
/failed
from #7814.