
Kubernetes Policy WG Discussion: Formal Verification #196

Closed
rficcaglia opened this issue Jun 6, 2019 · 8 comments
Labels: policy (policy related topics)

Comments

@rficcaglia (Contributor) commented Jun 6, 2019

I recommend deprecating this issue and referring interested readers to a discussion on the Policy WG call documented here:

https://docs.google.com/document/d/1ihFfEfgViKlUMbY2NKxaJzBkgHh-Phk5hqKTzK-NEEs/edit#

DEPRECATED:

During the K8S Policy WG session today (6/5/2019) we discussed how/if policies might be formally verified. We talked about using Datalog (pros and lots of cons) for formal verification, and insofar as OPA's Rego is similar to Datalog (though not strictly Datalog), it might be possible for Rego as well. In any case, I volunteered to write up a strawman outline of what this might look like, in very hand-wavy terms, to get the discussion started. @hannibalhuang asked me to put it here in sig-security. @patrick-east and @ericavonb were also interested in reviewing, I believe. Enjoy...

Formal Verification of Policy In Kubernetes

  • Human writes a policy
    • it might be a policy to grant or deny user access to a resource in a multi-tenant cluster,
    • or it might be a policy that requires certain syscall activity to be monitored on some pods with certain labels,
    • or it might be a policy that says network traffic that is regulated by PCI or HIPAA should be read-only to some microservices but writeable by others,
    • or it might be a policy that specifies some alert action to be triggered when a given audit event occurs;
  • The policy is essentially a specification of the expected behavior of the system, i.e. System + Policy => Safety Properties (nothing bad happens)
  • A tool checks "validity" of the policy and that the system execution matches the specification.
    • produces verification "proof" (model) if the policy is correct
      – or generates a counterexample if the policy is not correct
  • Verification is completely automatic
    • Human can say with confidence, "this policy correctly implements the behavior I intended"
    • Software can use the proof/model and reason with it.

How is Verification Done?

  • Conditions in Logic (e.g. Rego rules)
  • Given a set of logic rules, P, check whether there exists a proof/model of P
  • Given a model m (the verification conditions), use a solver to try to solve them
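
As a very rough sketch of that last step, the snippet below hands a toy policy condition to a solver using the z3-solver Python package. The rule, variable names, and values are hypothetical stand-ins for a Rego-style condition encoded as logic; the point is only "ask for a model, or learn that none exists", not a proposed implementation.

```python
# Toy illustration (hypothetical rule) of the "find a model of P" step,
# using the z3-solver Python package.
from z3 import Solver, String, StringVal, Bool, Or, sat

group = String("group")          # group of the requesting user
namespace = String("namespace")  # namespace of the target resource
allow = Bool("allow")            # the policy decision

s = Solver()
# Hypothetical policy: allow iff the requester is an admin or the resource
# lives in the "sandbox" namespace.
s.add(allow == Or(group == StringVal("admins"), namespace == StringVal("sandbox")))

# Question for the solver: is there any input where a non-admin is allowed?
s.add(allow, group != StringVal("admins"))

if s.check() == sat:
    print("example input (model):", s.model())   # e.g. namespace = "sandbox"
else:
    print("no such input exists")
```
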
@justincormack (Contributor)

Sorry, this is somewhat unclear. If I write a policy, who is writing the specification it is checked against? I think the summary needs spelling out in more detail.

There are actually interesting things you can check on a single policy, but they are not correctness. E.g. AWS has done some work on finding parts of a policy that are "dead code", as they probably indicate mistakes, and on finding policies that map to "probably unintended things", e.g. world-writable S3 buckets (potentially a safety property).
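
To make the "dead code" idea concrete, here is a minimal sketch (assumed for illustration, not AWS's actual tooling): if a rule's trigger condition is unsatisfiable, the rule can never fire. The condition below is invented.

```python
# Hypothetical example of detecting a policy rule that can never fire,
# using the z3-solver Python package.
from z3 import Solver, Int, And, unsat

replicas = Int("replicas")

s = Solver()
# Made-up rule condition: fire when replicas > 10 AND replicas < 5.
s.add(And(replicas > 10, replicas < 5))

if s.check() == unsat:
    print("rule is dead code: no input can ever trigger it")
else:
    print("rule is reachable; example input:", s.model())
```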

@rficcaglia (Contributor, Author) commented Jun 6, 2019 via email

@justincormack (Contributor)

I wasn't on the call but just caught up. I think the system behaviour is different though. I think writing up some quite detailed, concrete examples would help us understand what we think we want to do and what is feasible.

@rficcaglia (Contributor, Author) commented Jun 6, 2019 via email

@copumpkin

(I was the AWS person on the call 👋)

Thanks for writing this up @rficcaglia! Regarding @justincormack's question and this bullet:

A tool checks "validity" of the policy and that the system execution matches the specification.
produces verification "proof" (model) if the policy is correct

I wouldn't phrase it that way. The way I like to think about it is that we have three things going on in a system:

  1. The high-level human intent behind the policy
  2. The policies they wrote (ostensibly) to achieve that intent
  3. The code that evaluates those policies

While it's possible to formally verify with sufficient effort that (3) is implemented properly (which I think is what you're getting at in the bullet I quoted above), what I was suggesting was the "low-hanging fruit" of validating (1) against (2).

Here's an example: say you have policy in many parts of a system to implement the desired access structure for a business. Although policy is more succinct than arbitrary code, it can still be gnarly for humans to understand, and it tends to accrete weird special cases over time. But we also often have simple high-level invariants we'd like to hold over our policy: asserting that nobody can escalate privileges except by permitted auditable mechanisms, or asserting things about all principals with the power to affect the availability of the system.

You can think of these almost like logical "tests" of your policy. If we didn't have automated reasoning, we might craft a handful of cases we want to hold true, like "Joe in accounting can't reconfigure my production infrastructure" or "Jane the SRE needs access to production logs". But these logical tools allow us to write "universal tests" that can cover conceptually infinite situations in finite time: instead of concretizing our tests, we say things like "nobody outside of the SRE group can access production logs" and ask the system whether that's true. Narrow but well-defined invariants like those take us a step closer to validating (1) against (2), even though (1) is often very fuzzy, informal, and impossible to fully specify in general.
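
(For illustration only, here is one hedged sketch of such a "universal test" using the z3-solver Python package. The principals, group membership, and policy are invented; in practice the constraints would be compiled from the real policy rather than hand-written.)

```python
# Toy "universal test": does any principal outside the SRE group have
# read access to production logs? Everything here is hypothetical.
from z3 import EnumSort, Const, Or, Not, Solver, unsat

# Three made-up principals; "jane" is the only SRE.
Principal, (joe, jane, ci_bot) = EnumSort("Principal", ["joe", "jane", "ci_bot"])

def in_sre(p):
    return p == jane

# Hypothetical compiled policy: who may read production logs.
def can_read_prod_logs(p):
    return Or(p == jane, p == ci_bot)   # note: the CI bot slipped in

p = Const("p", Principal)
s = Solver()
# Search for a counterexample to "only SRE members can read production logs".
s.add(can_read_prod_logs(p), Not(in_sre(p)))

if s.check() == unsat:
    print("invariant holds for every principal")
else:
    print("counterexample:", s.model()[p])   # prints ci_bot
```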

So back to @justincormack's question:

If I write a policy, who is writing the specification it is checked against?

I'd propose that k8s security experts write out a model of how, e.g., privilege escalation might work in k8s. The average user, even though they care about privilege escalation, can't be expected to understand every dark corner of the API or the transitive concerns arising from it. The model would include things like direct privilege escalation "within" the policy engine, but potentially also things like the power to reconfigure the policy engine itself or to otherwise affect the execution of policy. End users then get a simplified interface to make assertions over their policies in terms of those models constructed by the experts. The models could also be parameterized, giving users the power to assert things like "no principal can escalate privileges, except those in this group", and so on.

Of course, there are plenty of other designs that could work, but this is the direction I was proposing on the call.

@timothyhinrichs

Agree that concrete models of "privilege escalation" or "multi-tenancy" are really important here, especially from a methodological perspective. Different technologies in the verification space help us do different kinds of verification. So step 1 is understanding what kind of verification we want to do and therefore what kind of technology we need. In the end what often happens is that we figure out what's possible to verify, and, as @copumpkin says, parameterize it to the extent we can. From what I've seen, the value usually ends up being less that a user can come along and write their OWN model/query; rather, it's that the user provides their OWN policy to be checked against a fixed (and possibly parameterized) model/query.

Examples. SMT solvers (isn't that what you're using, @copumpkin?) are good at finding inputs to a policy that generate a particular class of policy decisions. Model checkers will find a sequence of inputs that results in a particular class of policy decisions. Theorem provers will guarantee a policy decision is always "true" over all possible inputs. (Here by 'input' think of OPA's input + external data.) Moreover, the devil's often in the details for which kind of technology you use: SMT solvers focus on particular kinds of built-in functions like arithmetic, regexps, and so on, whereas many theorem provers (last time I looked) often don't have any built-in support for those--requiring you to axiomatize everything you need.

The multi-tenancy verification problem sounds like theorem proving: for all inputs, everyone from group A can access namespace A's resources, but no other namespace's resources. Not sure what privilege escalation means, but it sounds like the answer will be multi-step, pointing to model-checking.

Alternatively, we could start by looking at a particular tool to understand what kind of verification it can do, try out some examples, and go from there. Z3 pops to mind as a good place to start--a bit of SMT and maybe theorem proving if I remember right.
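
As a hedged starting point of that kind, the sketch below phrases the multi-tenancy property as a universally quantified query to Z3 via the z3-solver Python package. The access rule and the "cluster-admins" exception are hypothetical; the point is only the shape of the ForAll check.

```python
# Toy theorem-proving style check: for all inputs, a non-admin group can
# only access the namespace that carries its own name. The policy encoding
# here is hypothetical.
from z3 import Strings, StringVal, Implies, And, ForAll, Not, Or, Solver, unsat

group, namespace = Strings("group namespace")
admins = StringVal("cluster-admins")

# Hypothetical compiled policy: a group may access its own namespace,
# and cluster-admins may access anything.
def allow(g, ns):
    return Or(g == ns, g == admins)

# Parameterized property: non-admin groups are confined to their own namespace.
prop = ForAll([group, namespace],
              Implies(And(allow(group, namespace), group != admins),
                      group == namespace))

s = Solver()
s.add(Not(prop))   # a model here would be a counterexample to the property
if s.check() == unsat:
    print("property holds for all inputs")
else:
    print("possible counterexample:", s.model())
```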

@hannibalhuang added the policy label Jun 19, 2019
@rficcaglia (Contributor, Author)

I was not on the Policy WG call, but noting the use case sketches discussed by @ericavonb:

  • operator privilege escalation (e.g. can I create a resource that can escalate and create new resources)
  • gatekeeper cluster state cache and how actions are evaluated based on state change
  • Tenant CRD certificate creation use cases

I would also add the other discussion on the call about mapping PSP to Gatekeeper to identify gaps:

  • Identify gaps in PSPs vs. Gatekeeper Constraint templates

Also link to the Google Group thread:
https://groups.google.com/d/topic/kubernetes-wg-policy/JLz1zZgi_vk/discussion

@rficcaglia (Contributor, Author)

Pending discussion and feedback on today's (7/24/19) Policy WG call, I'll submit a PR, which I would consider action enough to close this issue.
