Encrypt pod logs in bootstrap runner and decrypt in Tentacle #1047

Merged
merged 8 commits into from
Dec 1, 2024

@APErebus APErebus commented Nov 28, 2024

Background

To avoid leaking sensitive values in pod logs, we now encrypt the logs in the script pods and decrypt them in Tentacle.

Results

The process is now:

  1. When creating a script pod, Tentacle generates and stores a 32-byte key and writes it to a file, keyfile, in the script workspace. The key is a PBKDF2-derived hash of the Tentacle machine encryption key, salted with the script ticket ID. Deriving it this way means that if the NFS is lost and Tentacle is restarted by the watchdog (à la automatic upgrade), the key can be regenerated so that logs from running pods can still be decrypted.
  2. The bootstrap runner reads this key on startup and initializes an AES-GCM cipher, which is then used to encrypt every log message written to the pod logs.
  3. When Tentacle receives pod logs, it loads the encryption key for the script ticket (either from memory or from the key file) and uses it to decrypt each log message.
  4. ...
  5. Profit!
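
As a rough illustration of step 1, here is a minimal Go sketch of the key derivation. It assumes PBKDF2 with HMAC-SHA256, 1000 iterations and a 32-byte output, which matches the `Pkcs5S2ParametersGenerator(new Sha256Digest())` call discussed in the review comments. The names `pbkdf2SHA256` and `deriveLogKey` are hypothetical, and the sketch passes the machine key bytes straight through, so its output is not guaranteed to match what Tentacle produces.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
)

// pbkdf2SHA256 is a small stdlib-only PBKDF2-HMAC-SHA256 (RFC 8018).
func pbkdf2SHA256(password, salt []byte, iterations, keyLen int) []byte {
	prf := hmac.New(sha256.New, password)
	hashLen := prf.Size()
	numBlocks := (keyLen + hashLen - 1) / hashLen
	derived := make([]byte, 0, numBlocks*hashLen)
	var counter [4]byte

	for block := 1; block <= numBlocks; block++ {
		// U1 = PRF(password, salt || INT(block))
		binary.BigEndian.PutUint32(counter[:], uint32(block))
		prf.Reset()
		prf.Write(salt)
		prf.Write(counter[:])
		u := prf.Sum(nil)

		// T = U1 xor U2 xor ... xor Uc
		t := make([]byte, len(u))
		copy(t, u)
		for i := 1; i < iterations; i++ {
			prf.Reset()
			prf.Write(u)
			u = prf.Sum(nil)
			for j := range t {
				t[j] ^= u[j]
			}
		}
		derived = append(derived, t...)
	}
	return derived[:keyLen]
}

// deriveLogKey (hypothetical name) uses the machine encryption key as the
// password and the script ticket's task ID as the salt. The same inputs
// always give the same 32-byte key, which is what allows regeneration.
func deriveLogKey(machineKey []byte, taskID string) []byte {
	return pbkdf2SHA256(machineKey, []byte(taskID), 1000, 32)
}
```

Because the derivation is deterministic, calling `deriveLogKey` again with the same machine key and task ID after a restart yields the same key, while a different task ID yields a different one.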

The nonce is generated per log message in the bootstrap runner and prepended to the encrypted message; Tentacle splits it off again during decryption.
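
The nonce handling can be sketched roughly as follows, using Go's standard AES-GCM implementation. This is an illustration under assumptions rather than the code in this PR: the function names are hypothetical, and hex encoding of the output is assumed only because the sample pod log lines look like hex.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/hex"
	"errors"
)

// newGCM builds an AES-GCM cipher from a 16, 24 or 32 byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encryptLine generates a fresh random nonce for this message and
// returns hex(nonce || ciphertext || tag).
func encryptLine(gcm cipher.AEAD, plaintext string) (string, error) {
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	// Passing nonce as dst makes Seal append the ciphertext after it.
	sealed := gcm.Seal(nonce, nonce, []byte(plaintext), nil)
	return hex.EncodeToString(sealed), nil
}

// decryptLine splits the nonce off the front and authenticates the rest.
func decryptLine(gcm cipher.AEAD, encoded string) (string, error) {
	raw, err := hex.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	if len(raw) < gcm.NonceSize() {
		return "", errors.New("message is shorter than the nonce")
	}
	nonce, ciphertext := raw[:gcm.NonceSize()], raw[gcm.NonceSize():]
	plain, err := gcm.Open(nil, nonce, ciphertext, nil)
	if err != nil {
		return "", err
	}
	return string(plain), nil
}
```

Since the nonce is random per message, encrypting the same text twice gives two different outputs, and any modification of the encoded line makes `decryptLine` return an error instead of altered plaintext.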

I had to refactor KubernetesMachineKeyEncryptor a bit so that the machine key can now be retrieved in multiple places.

After

Example pod log output

|53|stdout|e386c2044b8ca0ef28813aad82361dcc66cda9e83b4212d0a1a77d019296784845ef65b994a3e25258f2cf8dad11fd31464911d3c83529463f4c1331c13b5001d48d2b372d162e14e4d5439db3294a7b3c9c7013152b016dcf
|54|stdout|2fba0762f2d704ca1f3f217c7e0a97732c1fc3c366c650d7a60a3938412ceaeac723c917aa0ad3c4144431eb0f9edbe0d6ef388e63ee1d32478a4167e0d2a2
|55|stdout|21261f4d524fdace95c5a292c3c3df0bb6a4bc4bfc40a9259beaa9fc9717802904c0dc2796b8aa80ade202d72575733e6fdc264e36ebfc2ac6876a72c26e3aea5b4db3ea6cba128fb8bdbfc607551522286e
|56|stdout|59df43e039d22cb062e130eefd3b4b776dacb2debd34ace232ec1f9598884ea7418a19436438fb76ab513e27577dc30f3c8155094ea23db3c95d124137d4
|57|stdout|924e2b9fc6d69857fd77334a6719fc5106660136599ee0dda862f8b22a6b0e15b29eba243d9c59775d8d32d6ab1d471a7b58efcba684df2c023051606e784ca4992e
|58|stdout|3e201912998213b4c9142eeaee8785a02e7ff693de543b4b60243e83ce8bfe11efce3d34daa57294afe9923b724a9bcfb3e686f49d16d3150e82ae93c2b9
|59|stdout|4863c2997bc6a9a521476a67927201c30f7806fd7118514e279e030c
|60|stdout|1003496491189169f10f2eb6caed5279e2a8b48431cab6960ad60264
|61|stdout|deaa311679e09df5608d1f59883324e7bdea3de7ff9badf864d1c3df97e5d96078d0dc9a0aba059549a121ed729223ada10f0e6cda
|62|stdout|a5f569c165ca2f039fbb2020349c5d1ca64a82963a28100cc44c360d3921f00c8894c3b3627348ff55350eb2109969040738cdcbf1
|63|stdout|9fbf30b1685d7155548c4a15e6d835f7f74aa06c2ce67d7c517116a7495b4082a649cde69e8684d921cedd82b88c2a56a463eea57280eb563d9e201f88441d
|64|stdout|ed030f4fc8bb247f6a2867d306b0ec36c7ebfa948e34932a35cd6a86ff3951e249316af7eee85194a3a99b84cc3cd671c0824a316358ccd8a4c01afe545e
|65|stdout|c01f90998e541bb8a61404757a4996c85384bdb565c0ccc5b87e0c891bde15611b2adac8d32c631e4aa0129f97f70e
|66|stdout|35aa82f01d1f47ca5b10249de719d030d1b280e266b41f61f06745c2a52ea0f9cb8e34ab62ee0e1af79bdaad4ffa02a8216c
|67|stdout|fee5431412851e6db0daf13e647df76ca09bb3c189cf6f3e486082dd24119eeb4ad286e8a6442609b981915b6daf1a4212d3eff20bc57837fbe67921f49d9a9700ba87a169d0d5cce189fd37cff019bc659b68
|68|stdout|20c1b3a8c485a538d65da53c22bfa5f0324a587747682a12a81706fb1eeebcf4db4ccc82bfbaa4b1a90ced011ee8c5c73c5be9ae7b41ddc42c00557ad128722e3f5289937dcab29df3310ba32103ad5d0bc6
|69|stdout|bbb1d4a1918c611966fb118bd45a0b13ca4bb3faca8dc30b81e71d84ad1fa4025eda866463820089184a52bdc44f75a62e85
|70|stdout|8958e9468d7e4446358f83d921b64ed0b503c9a81c5b89dc46a3245576bcaa567cc1002da10abd75b467a1f6a76bdb713591ae83
|71|stdout|3b0ed4aac814f99a0959ffa3235f01b8c42a374434b28855ea24fe45e6f8146c214322a85d2a09841eb6b8270e9b889d59403d00e0
|72|stdout|9b28051a7652da92571d305d3e25899307d27ed8ef89a717767d9cf2bd
|73|stdout|5848c2b0ae9ff0ea7e4bb18bc581fcdb720992ba3dcf1dd03226016418
|74|stdout|1e3810f7ee59e2021e562401ac4de0ebfbfcd7dc0b4fe638c980d19cce
|75|stdout|be187a41930ab76b9f8a7703c9186ba81fcdea3dca97106f3593bd8b5a
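
To make the line layout above concrete, here is a small Go sketch that splits one pod log line into its parts. It assumes the layout is `|<sequence>|<stream>|<payload>`, as the sample suggests; `podLogLine` and `parsePodLogLine` are hypothetical names, not types from Tentacle. As an aside, the shortest sample payloads are 56 hexadecimal characters (28 bytes), which would be consistent with a 12-byte nonce plus a 16-byte GCM tag around an empty message, though that reading is an inference from the sample.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// podLogLine holds the three fields of one line (hypothetical type).
type podLogLine struct {
	Sequence int
	Stream   string
	Payload  string // still encrypted at this point
}

// parsePodLogLine splits "|53|stdout|<payload>" into its fields.
func parsePodLogLine(line string) (podLogLine, error) {
	// The leading "|" yields an empty first element.
	parts := strings.SplitN(line, "|", 4)
	if len(parts) != 4 || parts[0] != "" {
		return podLogLine{}, fmt.Errorf("unexpected pod log line format: %q", line)
	}
	seq, err := strconv.Atoi(parts[1])
	if err != nil {
		return podLogLine{}, fmt.Errorf("invalid sequence number %q: %w", parts[1], err)
	}
	return podLogLine{Sequence: seq, Stream: parts[2], Payload: parts[3]}, nil
}
```

Only the payload is encrypted in this layout; the sequence number and stream name stay readable, so ordering can be checked before any decryption is attempted.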

The output in Server
image

How to review this PR

Quality ✔️

Pre-requisites

  • I have read How we use GitHub Issues for help deciding when and where it's appropriate to make an issue.
  • I have considered informing or consulting the right people, according to the ownership map.
  • I have considered appropriate testing for my change.

@APErebus APErebus marked this pull request as ready for review November 28, 2024 04:08
@APErebus APErebus requested review from a team as code owners November 28, 2024 04:08
Contributor Author

Moved to Kubernetes/Crypto and updated with new interface

Contributor Author

We now have two places that need to retrieve the machine encryption key from the secret, so this code has been split out behind an interface.

Comment on lines +33 to +34
//we use the machine encryption key as the password and the script ticket is the salt
pdb.Init(PbeParametersGenerator.Pkcs5PasswordToBytes(Encoding.ASCII.GetChars(machineEncryptionKey)), Encoding.UTF8.GetBytes(scriptTicket.TaskId), 1000);
Contributor Author

This is where we use the machine encryption key as the "password" and the script ticket as the salt to generate a known 32 byte key

Comment on lines +72 to +77
//If we can't load the encryption key from the filesystem
if (fileContents == null)
{
    //regenerate the encryption key, write to the filesystem and return the key
    return await GenerateAndWriteEncryptionKeyfileToWorkspace(scriptTicket, workspace, cancellationToken);
}
Contributor Author

When Tentacle gets the encryption key, if it can't load it from memory or from the script workspace, it regenerates the same key and stores it both in memory and on disk. This should only happen after a Tentacle and NFS pod restart.

Comment on lines +106 to +111
//if we can't read the pod log encryption key for a while
var message = $"Failed to read pod log encryption key. No new pod logs will be read.";
tentacleScriptLog.Verbose(message);
Log.Warn(ex, message);

//If we somehow come across weird/missing line numbers, try load the whole Pod logs to see if that helps
return await GetPodLogsWithSinceTime(null);
return (new List<ProcessOutput>(), lastLogSequence, null);
Contributor Author

While we can't read the encryption key, don't read the pod logs

Contributor

@liam-mackie liam-mackie left a comment

Minor nits, but not enough to block. Thanks for the hard work!

this.log = log;
}

public async Task WriteEncryptionKeyfileToWorkspace(ScriptTicket scriptTicket, CancellationToken cancellationToken)
Contributor

nit: could do with a rename (follow up PR?)

var pdb = new Pkcs5S2ParametersGenerator(new Sha256Digest());

//we use the machine encryption key as the password and the script ticket is the salt
pdb.Init(PbeParametersGenerator.Pkcs5PasswordToBytes(Encoding.ASCII.GetChars(machineEncryptionKey)), Encoding.UTF8.GetBytes(scriptTicket.TaskId), 1000);
Contributor

Nit: consider fewer conversion passes and simply pass the machineEncryptionKey byte array straight through.

{
var message = $"Unexpected Pod log line numbers found with sinceTime='{sinceTime}', loading all logs";
//if we can't read the pod log encryption key for a while
var message = $"Failed to read pod log encryption key. No new pod logs will be read.";
Contributor

At a minimum, we still get whether the script exited successfully from K8s, so this is a reasonable trade-off to protect against transient decryption problems.

{
var message = $"Unexpected Pod log line numbers found with sinceTime='{sinceTime}', loading all logs";
//if we can't read the pod log encryption key for a while
var message = $"Failed to read pod log encryption key. No new pod logs will be read.";
tentacleScriptLog.Verbose(message);
Contributor

Nit: change this to a warning

if (messagePart.StartsWith(EndOfStreamMarkerPrefix))

//the log messages are being returned from the pods encrypted, decrypt them here
var decryptedMessagePath = encryptionProvider.Decrypt(encryptedMessagePart);
Contributor

Nit: this might be better named decryptedMessagePart (or decryptedMessage)

Contributor

@LukeButters LukeButters left a comment

For safety we should check this version of tentacle works with a full build of octopus server.

Contributor Author

APErebus commented Dec 1, 2024

For safety we should check this version of tentacle works with a full build of octopus server.

@LukeButters Yep, this was done in https://github.com/OctopusDeploy/OctopusDeploy/pull/29373 and was fully green!

Contributor Author

APErebus commented Dec 1, 2024

ADR for change here: https://github.com/OctopusDeploy/adr/pull/68

@APErebus APErebus merged commit f4ed2ae into main Dec 1, 2024
52 of 53 checks passed
@APErebus APErebus deleted the ap/k8s-logs-to-file-not-pod-logs branch December 1, 2024 23:26
Contributor Author

APErebus commented Dec 2, 2024

Shortcut story: [sc-98480]
