The Bottom Turtle Reference Architecture(s) #5206

kfox1111 · 2024-06-10T15:21:51Z

It should be possible to use SPIRE as the bottom turtle for security. To do so, there must be one or more deployable, maintainable, scalable, fault-tolerant, and documented SPIRE architectures that do not rely on third-party roots of trust when establishing that root of trust.

The easiest multi-node setup to use currently is the Helm charts. The Helm chart project has multiple documented reference architectures for SPIRE, but all of them rely on the Kubernetes cluster's pre-established control plane/node trust. So SPIRE isn't the bottom turtle in those environments; the K8s CA is.

We need:

- One or more examples, built from the ground up, that can establish the bottom turtle(s) in an internet-disconnected environment
- One or more bills of materials for a home lab setup
  - One example:
    - Reasoning: Pi 5 over others because it has an RTC and M.2 support. An accurate clock is important for cert issuance; M.2 gives better write durability. Recommend as much RAM as possible, not for SPIRE, but for future use, since memory is not upgradable.
    - Known working:
      - Raspberry Pi 5
      - RPI Power Supply
      - KKSB Case for Raspberry Pi 5 with Space for Hats - https://www.amazon.com/dp/B07Y7NTTG8
      - Raspberry Pi Active Cooler for Raspberry Pi 5 - https://www.amazon.com/dp/B0CLXZBR5P
      - GeeekPi TPM2.0 Module for Raspberry Pi - https://www.amazon.com/dp/B09G2BZQT5
      - RTC Battery Box - https://www.amazon.com/dp/B0CRKQ2MG1
      - CR2032 Battery
      - Pick one
        - Option A
          - Geekworm X1001 PCIe to M.2 HAT - https://www.amazon.com/dp/B0CPPGGDQT
        - Option B
          - PCIe to 2-CH M.2 Adapter Compatible with Raspberry Pi 5 - https://www.amazon.com/dp/B0DBH6MHK8
          - Female Pin Header, 2.54mm 2 Row 40 Pin Right Angle - https://www.amazon.com/dp/B07VK75P9L
      - Pick one
        - Crucial P3 500GB PCIe Gen3 3D NAND NVMe M.2 SSD - https://www.amazon.com/dp/B0B25LQQPC
        - Patriot P300 M.2 PCIe Gen 3 x4 128GB - https://www.amazon.com/dp/B0822Y6N1C
- Procedures for establishing trust, and for reestablishing trust when it is broken (e.g., CA rotation too fast and nodes powered off for too long)
- Procedures for scaling up the number of nodes in that environment without a lot of work
- A set of issues to raise for features that would make those architectures work better for this use case
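
The failure mode mentioned above, where the CA rotates while a node is powered off, can be made concrete with a small model. The following is a minimal sketch and not SPIRE's actual rotation logic: the `Authority` type, the `build_schedule` helper, and the 30-day rotation with a 10-day publication lead are all hypothetical values chosen for illustration. It assumes a node can reconnect only if the CA that is active when it powers on was already in the trust bundle it held when it powered off, and it ignores expiry of the node's own identity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Authority:
    """Hypothetical record of one CA in a rotation schedule."""
    name: str
    published_at: datetime  # first time the CA appears in the trust bundle
    active_from: datetime   # CA starts signing
    active_until: datetime  # CA stops signing (exclusive)


def build_schedule(start, count, rotation, lead):
    """Build `count` back-to-back CAs, each published `lead` before activation."""
    return [
        Authority(
            name=f"ca-{i}",
            published_at=start + i * rotation - lead,
            active_from=start + i * rotation,
            active_until=start + (i + 1) * rotation,
        )
        for i in range(count)
    ]


def can_reconnect(authorities, powered_off_at, powered_on_at):
    """True if the CA active at power-on was already known at power-off."""
    known = {a.name for a in authorities if a.published_at <= powered_off_at}
    return any(
        a.name in known and a.active_from <= powered_on_at < a.active_until
        for a in authorities
    )


# Assumed example values: rotate every 30 days, publish 10 days ahead.
schedule = build_schedule(
    start=datetime(2024, 1, 1),
    count=3,
    rotation=timedelta(days=30),
    lead=timedelta(days=10),
)

# Off for 10 days, still inside ca-0's active window.
print(can_reconnect(schedule, datetime(2024, 1, 15), datetime(2024, 1, 25)))  # True
# Off across a rotation whose new CA was published after power-off.
print(can_reconnect(schedule, datetime(2024, 1, 15), datetime(2024, 2, 5)))   # False
```

Under this model the guaranteed offline tolerance is bounded by the publication lead, which is one way to reason about how fast a rotation is "too fast" for a given environment.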

Other considerations:

Recovery should be O(1), or as close to it as possible. Touching every node in a 100 or 1,000+ node data center won't work.

kfox1111 · 2024-06-10T15:54:04Z

For the sake of discussion, what could be done with a set of RPIs with some kind of TPM, like:
https://wiki.52pi.com/index.php/EP-0149

kfox1111 · 2024-06-10T18:56:01Z

Kubelet is gaining the ability to refresh server certs, merged but not released yet:
kubernetes/kubernetes#124574

Client auth can be done via JWT token.

No updating of CAs yet, though.

anvega · 2024-06-11T02:31:53Z

Thank you for raising this. I'm currently exploring alongside others the possibilities of using OpenTitan as the silicon root of trust to anchor and bootstrap trust.

Although my exploration is ongoing, I'm eager to collaborate and share my findings.

edwbuck · 2024-06-11T19:12:12Z

> For the sake of discussion, what could be done with a set of RPIs with some kind of TPM, like: https://wiki.52pi.com/index.php/EP-0149

I'm confused about the focus of the request, as using Raspberry Pi TPMs is a deployment detail, not an architecture (at least in my mind).

If support for the "Infineon Optiga™ SLB 9670 TPM 2.0" is missing and is a prerequisite for this effort, please consider handling that missing prerequisite in a different issue (and linking the two).

kfox1111 · 2024-06-11T21:32:34Z

@edwbuck For example, see:
https://www.hpe.com/psnow/doc/a00020437enw?jumpid=in_pdfviewer-psnow, page 4, "Reference Configuration overview" or page 5, "Hardware"

They go all the way down to an example of workable hardware in their reference.

The general idea being, reference architectures should be implementable. Having a concrete, working example helps test/prove it works.

amartinezfayo · 2024-06-29T14:48:51Z

Thank you, @kfox1111, for raising this issue!

I agree that having a documented reference architecture to use SPIRE as the bottom turtle would be great to have. Additionally, providing a concrete, working example that includes all components would be highly beneficial as it ensures reproducibility. I think that it is important, however, to clearly differentiate between example-specific choices and general recommendations. I personally think that this reference should ideally mention alternative options where appropriate and explicitly state what has been tested.

From the points mentioned in the description, I believe the first point, 'One or more examples, from the ground up, that can establish the bottom turtle(s) in an internet-disconnected environment,' is probably the most important to start with? If you agree, we could begin by scoping out what this would entail. For instance, should it be purely documentation, or should we include a fully working example with automated steps, etc.

It appears that there are several individuals interested in contributing to this effort. Defining the specific environment and components of this first instance of a reference architecture seems to be the first step.

kfox1111 · 2024-07-01T14:24:24Z

> From the points mentioned in the description, I believe the first point, 'One or more examples, from the ground up, that can establish the bottom turtle(s) in an internet-disconnected environment,' is probably the most important to start with? If you agree, we could begin by scoping out what this would entail. For instance, should it be purely documentation, or should we include a fully working example with automated steps, etc.

Yeah, that sounds good to me.

I'm thinking purely documentation, at least initially.

I'm also thinking of something like an RPI for it, or for one of the initial examples. They are cheap and relatively easy to obtain for anyone wanting to play with them at home.

amartinezfayo · 2024-07-10T12:06:24Z

> I'm thinking purely documentation, at least initially.

Sounds good. In the last SPIRE contributor sync, @edwbuck kindly offered his help on this. He has some ideas also about how to better frame this work that I think will help in the definition of the scope. Thank you @edwbuck and @kfox1111!

edwbuck · 2024-07-11T04:08:26Z

@amartinezfayo @kfox1111 I attempted to clarify the request by editing this issue, but as a non-maintainer I lack the permissions to do so. My clarifications of the request, as well as the removal of the confusing "SPIRE is the bottom turtle" commentary (confusing because some aspects of node attestation defer to a TPM as the bottom turtle), are captured in #5291.

I suggest either using that issue to update the text here (and closing #5291), or closing this issue and transferring the effort to #5291.

kfox1111 · 2024-07-11T16:06:36Z

TPMs being used for NodeAttestation does not block SPIRE from being the bottom turtle IMO, and that isn't the purpose I'm trying to get at. spire-server is the root of trust, with its CA chain, for the whole SPIFFE trust domain. TPMs are just replacing the use of JoinTokens, which I think we can probably agree are allowed in a bottom turtle architecture. I think TPMs would help make the process easier/smoother, but if we did the first example with join tokens, that would be OK.

I think the request in general is still valid. We need documented reference architectures where the spire-server does not rely on other CAs as the bottom turtle for the spire-server itself.

For example, helm-installing helm-charts-hardened today deploys a spire-server that won't function in the absence of the Kubernetes client CA that all the kubelets use, which really makes that CA one of the bottom turtles. The etcd CA that K8s uses for resource storage is a second CA that spire-server really depends on.

I'm interested in examples where you deploy the spire-server on bare metal without any CAs involved, establish your SPIRE root CA, and then use that as the root CA for other nodes to form usable clusters/services. If any step before spire-server deployment involves making a CA/certificate (puppet register, kubeadm join, etc.), then I don't think SPIRE is really the bottom turtle.
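
One way to check whether SPIRE is really the bottom turtle in a given design is to write down what each component's trust depends on and follow the edges to the end. The sketch below does that for two illustrative layouts. The component names and dependency edges are hypothetical labels drawn from the description in this comment, not output from any real tool.

```python
def trust_roots(depends_on, component):
    """Return the components with no dependencies that `component` ultimately relies on."""
    roots, seen, stack = set(), set(), [component]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        deps = depends_on.get(node, set())
        if not deps:
            # Nothing underneath this one: it is a bottom turtle.
            roots.add(node)
        stack.extend(deps)
    return roots


# Hypothetical model of a Helm-chart deployment on an existing cluster:
# spire-server cannot function without the CAs that Kubernetes already set up.
helm_deployment = {
    "workload-svid": {"spire-server"},
    "spire-server": {"kubelet-client-ca", "etcd-ca"},
    "kubelet-client-ca": set(),
    "etcd-ca": set(),
}

# Hypothetical model of the bare-metal layout asked for in this issue:
# the SPIRE root CA exists first and everything else is rooted in it.
bare_metal = {
    "workload-svid": {"spire-server"},
    "k8s-cluster-ca": {"spire-server"},
    "spire-server": {"spire-root-ca"},
    "spire-root-ca": set(),
}

print(sorted(trust_roots(helm_deployment, "workload-svid")))  # ['etcd-ca', 'kubelet-client-ca']
print(sorted(trust_roots(bare_metal, "workload-svid")))       # ['spire-root-ca']
```

A reference architecture would pass this check only if the single root reported for every component is the SPIRE root CA.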

amartinezfayo added the triage/in-progress Issue triage is in progress label Jun 11, 2024

amartinezfayo self-assigned this Jun 18, 2024

amartinezfayo assigned edwbuck and unassigned amartinezfayo Jul 9, 2024

amartinezfayo added priority/backlog Issue is approved and in the backlog and removed triage/in-progress Issue triage is in progress labels Jul 10, 2024

edwbuck removed their assignment Jul 16, 2024

amartinezfayo added the unscoped The issue needs more design or understanding in order for the work to progress label Jul 24, 2024

kfox1111 mentioned this issue Oct 18, 2024

x509pop server plugin support for servers trust bundle #5572

Open

kfox1111 self-assigned this Oct 31, 2024
