Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VC-37264: Disable compression to give us time to work on a fix #628

Merged
merged 4 commits into from
Nov 22, 2024

Conversation

maelvls
Copy link
Member

@maelvls maelvls commented Nov 22, 2024

Ref: VC-37264

Why?

On Nov 21, 2024, the Production Engineering team found that the test pipeline was failing due to venctl 1.15.3. They rolled back to venctl 1.15.2, which made the pipeline work again.

After investigating, we found that the Agent's compression feature broke the Agent. The Agent sends its upload request using Content-Encoding: gzip, but the server doesn't process this header and doesn't decompress the body. It copies the contents of the request's body to a bucket without checking its validity. When the backend asynchronously tries to process the contents of the upload, it fails with the error:

json: invalid character \u001f as token

Solution

After some consideration, we decided to remove entirely the compression logic that was added in #594. The --disable-compression flag will no longer do anything, and a log line will be printed when using the flag:

INFO The flag --disable-compression has been deprecated an no longer has any effect.

Previously, we wanted to simply switch the default value of the --disable-compression flag from true to false; we found the new logic clunky due to the fact that users can set --disable-compression=false, so we voted against it. We had also considered adding the flag --disable-compression to the Helm chart but realized that it would impact users that don't use the Venafi Connection mode nor the Key Pair Service Account mode.

Manual test

For this test, I've used the tenant https://ven-tlspk.venafi.cloud/. To access the API key, use the user [email protected] and the password is visible in the page Production Accounts (private to Venafi). Then go to the settings and find the API key, and set it in the env var APIKEY.

The tenant https://ven-tlspk.venafi.cloud/ doesn't have the right tier to pull images, so I use an API key from the tenant https://glow-in-the-dark.venafi.cloud/, that's why I set the env var APIKEY_GLOW_IN_THE_DARK. Ask Atanas to get access to the tenant https://glow-in-the-dark.venafi.cloud/.

export APIKEY=...
export APIKEY_GLOW_IN_THE_DARK=...

Then:

export \
 OCI_BASE=ttl.sh/maelvls \
 VEN_API_KEY=$APIKEY \
 VEN_API_KEY_PULL=$APIKEY_GLOW_IN_THE_DARK \
 VEN_API_HOST=api.venafi.cloud \
 VEN_VCP_REGION=us \
 VEN_ZONE='tlspk-bench\Default' \
 CLOUDSDK_CORE_PROJECT=jetstack-mael-valais \
 CLOUDSDK_COMPUTE_ZONE=europe-west1-b \
 CLUSTER_NAME=test-secretless

Then, create three files: get.sh, create.sh, and rm.sh:

#!/bin/bash
# get.sh

set -o nounset -o errexit -o pipefail

if [ $# -ne 1 ]; then
    echo "Usage: $0 <common-name>"
    exit 1
fi
commonname=$1

curl -sS --fail-with-body -X POST -H "tppl-api-key: $APIKEY" "https://api.venafi.cloud/outagedetection/v1/certificatesearch?excludeSupersededInstances=true&ownershipTree=true" -H "Content-Type: application/json" \
    -d @- <<EOF | jq
{
  "expression": {
    "field": "subjectCN",
    "operator": "MATCH",
    "value": $(jq -R <<<"$commonname")
  },
  "ordering": {
    "orders": [
      { "direction": "DESC", "field": "certificatInstanceModificationDate" }
    ]
  },
  "paging": { "pageNumber": 0, "pageSize": 10 }
}
EOF
#!/bin/bash
# create.sh

set -o nounset -o errexit -o pipefail

if [ $# -ne 1 ]; then
  echo "Usage: $0 <common-name>"
  exit 1
fi

commonname=$1

openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /tmp/tls.key -out /tmp/tls.crt -subj "/CN=$commonname"
kubectl create secret tls "$commonname" --cert=/tmp/tls.crt --key=/tmp/tls.key -o yaml --dry-run=client | kubectl apply -f -
#!/bin/bash
# rm.sh

set -o nounset -o errexit -o pipefail

if [ $# -ne 1 ]; then
  echo "Usage: $0 <common-name>"
  exit 1
fi
commonname=$1

json=$(
  curl -sS --fail-with-body -X POST -H "tppl-api-key: $APIKEY" "https://api.venafi.cloud/outagedetection/v1/certificatesearch?excludeSupersededInstances=true&ownershipTree=true" -H "Content-Type: application/json" -d @- <<EOF
{
  "expression": {
    "field": "subjectCN",
    "operator": "MATCH",
    "value": $(jq -R <<<"$commonname")
  },
  "ordering": {
    "orders": [
      { "direction": "DESC", "field": "certificatInstanceModificationDate" }
    ]
  },
  "paging": { "pageNumber": 0, "pageSize": 10 }
}
EOF
)

curl https://api.venafi.cloud/outagedetection/v1/certificates/retirement \
  -sS --fail-with-body -X POST -H "tppl-api-key: $APIKEY" -H "Content-Type: application/json" -d @- <<EOF
{"certificateIds": [$(jq '.certificates[].id' <<<"$json")]}
EOF

curl https://api.venafi.cloud/outagedetection/v1/certificates/deletion \
  -sS --fail-with-body -X POST -H "tppl-api-key: $APIKEY" -H "Content-Type: application/json" -d @- <<EOF
{"certificateIds": [$(jq '.certificates[].id' <<<"$json")]}
EOF

Now, run the end-to-end test:

./hack/e2e/test.sh

Once it completes, run:

sh create.sh foobar1234

Then, check that the certificate is present in Venafi Control Plane:

sh get.sh foobar1234

You should get something like this:

{
  "count": 1,
  "certificates": [
    {
      "id": "94dde4e0-a8e2-11ef-a111-3f5044185ce5",
      "companyId": "756db001-280e-11ee-84fb-991f3177e2d0",
      "managedCertificateId": "95a29470-a8e2-11ef-9ac6-8b82bce2245b",
      "fingerprint": "525473DDA4EF4711C2BDF3221EC4B2137CC96BDF",
      "certificateName": "foobar1234",
      "issuerCertificateIds": [],
      "certificateStatus": "ACTIVE",
      "modificationDate": "2024-11-22T15:00:59.820+00:00",
      "validityStart": "2024-11-22T15:00:24.000+00:00",
      "validityEnd": "2025-11-22T15:00:24.000+00:00",
      "selfSigned": true,
      "signatureAlgorithm": "SHA256_WITH_RSA_ENCRYPTION",
      "signatureHashAlgorithm": "SHA256",
      "encryptionType": "RSA",
      "keyStrength": 2048,
      "subjectKeyIdentifierHash": "67226B08E29BBFD0F78899BDE975503D1E3E57DB",
      "authorityKeyIdentifierHash": "67226B08E29BBFD0F78899BDE975503D1E3E57DB",
      "serialNumber": "3B4BDC4E4255C26A1FE6E5ED0A0E7910B6CA3598",
      "subjectDN": "cn=foobar1234",
      "subjectCN": [
        "foobar1234"
      ],
      "subjectAlternativeNamesByType": {
        "otherName": [],
        "rfc822Name": [],
        "dNSName": [],
        "x400Address": [],
        "directoryName": [],
        "ediPartyName": [],
        "uniformResourceIdentifier": [],
        "iPAddress": [],
        "registeredID": []
      },
      "issuerDN": "cn=foobar1234",
      "issuerCN": [
        "foobar1234"
      ],
      "ocspNoCheck": false,
      "versionType": "CURRENT",
      "totalInstanceCount": 1,
      "totalActiveInstanceCount": 0,
      "instances": [],
      "ownership": {}
    }
  ]
}

What was happening before?

Before this change, with v1.3.0, Venafi Kubernetes Agent used to not report secrets at all. It was successfully pushing, but the backend silently failed to process it.

$ mine/create.sh s1234
.+++++++++++++++++++++++++++++++++++++++*.........+..+......+.+.....+.+........+....+........+++++++++++++++++++++++++++++++++++++++*...+..+...+............+.......+......+..+............+.+.........+.........++++++
......+...+++++++++++++++++++++++++++++++++++++++*.+.+.....+......+...+.......+............+..+......+....+++++++++++++++++++++++++++++++++++++++*.....+......+..+.......+..+.........+.+........+.+............+...++++++
-----
secret/s1234 created
$ mine/get.sh s1234
{
  "count": 0,
  "certificates": []
}

@maelvls maelvls marked this pull request as draft November 22, 2024 15:18
@maelvls maelvls force-pushed the VC-37264-disable-compression branch from 57af677 to b2b753b Compare November 22, 2024 15:34
@maelvls maelvls marked this pull request as ready for review November 22, 2024 15:35
Comment on lines 593 to 592
if !flags.DisableCompression && res.AuthMode != VenafiCloudKeypair && res.AuthMode != VenafiCloudVenafiConnection {
errs = multierror.Append(errs, fmt.Errorf("--disable-compression=false can only be used with the %s and %s modes", VenafiCloudKeypair, VenafiCloudVenafiConnection))
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I have no way of knowing if the flag has been passed or not, I propose that we only show this warning when --disable-compression=false is passed. It feels a bit off, and another option would have been to remove this validation, but I didn't want to change the code further than I already had.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've replaced this with a log line that will show whenever the user uses --disable-compression.

@maelvls maelvls force-pushed the VC-37264-disable-compression branch from b2b753b to 9368c19 Compare November 22, 2024 16:25
We decided to entirely remove the logic behind --disable-compression
because the validation logic was looking super weird; we don't think
folks should be using --disable-compression=false anyways.

If we want to re-introduce a similar flag later on, we can create a new,
for example --no-compression.
@maelvls maelvls force-pushed the VC-37264-disable-compression branch from 9368c19 to 20d9b6f Compare November 22, 2024 16:29
@maelvls maelvls changed the title Disable compression to give us time to work on a fix VC-37264: Disable compression to give us time to work on a fix Nov 22, 2024
Copy link
Member

@wallrj wallrj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I test it by modifying the E2E test in #629 and rebasing it on top of your branch.

It takes 3-4 minutes for the certificate to appear in the API.

$ make test-e2e-gke
...
{
  "ts": 1732294473390.6462,
  "caller": "agent/run.go:396",
  "msg": "Posting data",
  "v": 1,
  "logger": "Run.gatherAndOutputData.postData",
  "baseURL": ""
}
{
  "ts": 1732294473397.594,
  "caller": "cache/reflector.go:368",
  "msg": "Caches populated for *v1alpha1.VenafiConnection from k8s.io/[email protected]/tools/cache/reflector.go:243",
  "v": 2
}
{"ts":1732294473950.9778,"caller":"agent/run.go:409","msg":"Data sent successfully","v":0,"logger":"Run.gatherAndOutputData.postData"}
...+.........+........+...+.......+...+.....+.......+..+.+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*....+..+...+............+....+.........+......+..+...+....+.....+.......+.....+....+..+...+...............+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*.........+.+.....+.......+...........+.+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
.......+..........+..+...+............+.......+.......................+.+...........+.+...+..+...+.+...............+..............+.......+.....+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*....+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*....+..........+........+...+....+......+.....+....+..+...............+...+..........+..+..........+.....+......+.............+..+....+........+.+..+...+..........+.........+......+......+......+..+........................................+...+...+......+.....+...+.+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-----
secret/venafi-kubernetes-agent-e2e.d56671da-dbb2-4411-be66-d846d145f79c created
{"count":0,"certificates":[]}
{"count":0,"certificates":[]}
{"count":0,"certificates":[]}
{"count":0,"certificates":[]}
{"count":0,"certificates":[]}
{"count":0,"certificates":[]}
{
  "count": 1,
  "certificates": [
    {
      "id": "dfc5f820-a8f2-11ef-a949-796f7926985d",
      "companyId": "9a0cab61-2b00-11ee-ba09-0733b0fe5adc",
      "managedCertificateId": "e08354b0-a8f2-11ef-82d5-bb97de46047d",
      "fingerprint": "C8F617E52743F3B1BB762BEE8A43C8B9C3A68364",
      "certificateName": "venafi-kubernetes-agent-e2e.d56671da-dbb2-4411-be66-d846d145f79c",
      "issuerCertificateIds": [],
      "certificateStatus": "ACTIVE",
      "modificationDate": "2024-11-22T16:57:37.390+00:00",
      "validityStart": "2024-11-22T16:54:38.000+00:00",
      "validityEnd": "2025-11-22T16:54:38.000+00:00",
      "selfSigned": true,
      "signatureAlgorithm": "SHA256_WITH_RSA_ENCRYPTION",
      "signatureHashAlgorithm": "SHA256",
      "encryptionType": "RSA",
      "keyStrength": 2048,
      "subjectKeyIdentifierHash": "76A333DE3A2A66700F7AEAB0B55A1422905D567C",
      "authorityKeyIdentifierHash": "76A333DE3A2A66700F7AEAB0B55A1422905D567C",
      "serialNumber": "0AEEAB504DD6E7576539BFE5C3BEFF453EC65CD4",
      "subjectDN": "cn=venafi-kubernetes-agent-e2e.d56671da-dbb2-4411-be66-d846d145f79c",
      "subjectCN": [
        "venafi-kubernetes-agent-e2e.d56671da-dbb2-4411-be66-d846d145f79c"
      ],
      "subjectAlternativeNamesByType": {
        "otherName": [],
        "rfc822Name": [],
        "dNSName": [],
        "x400Address": [],
        "directoryName": [],
        "ediPartyName": [],
        "uniformResourceIdentifier": [],
        "iPAddress": [],
        "registeredID": []
      },
      "issuerDN": "cn=venafi-kubernetes-agent-e2e.d56671da-dbb2-4411-be66-d846d145f79c",
      "issuerCN": [
        "venafi-kubernetes-agent-e2e.d56671da-dbb2-4411-be66-d846d145f79c"
      ],
      "ocspNoCheck": false,
      "versionType": "CURRENT",
      "totalInstanceCount": 1,
      "totalActiveInstanceCount": 0,
      "instances": [],
      "ownership": {}
    }
  ]
}

It looks like you've basically reverted the original PR while keeping the new CLI flag (for backwards compatibility).

In that case, you could do a sanity check by comparing the diff of the original PR with this diff and make sure that (almost) everything has indeed been returned to its original state.

I have only tested the venaficonnection mode and I have only tested wit the EU Venafi Cloud tenant.

@@ -302,7 +302,7 @@ func InitAgentCmdFlags(c *cobra.Command, cfg *AgentCmdFlags) {
&cfg.DisableCompression,
"disable-compression",
false,
"Disables GZIP compression when uploading the data. Useful for debugging purposes or when an intermediate proxy doesn't like compressed data.",
"Deprecated. No longer has an effect.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional. Consider using the pflag.MarkDeprecated method instead, which will hide the flag from the --help and therefore from the documented CLI help.

Copy link
Member Author

@maelvls maelvls Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, that would be even better.

I've tried this:

c.PersistentFlags().MarkDeprecated("disable-compression", "no longer has an effect")

But it uses logrus' logger, so it doesn't show as JSON (first line of the below snippet):

Flag --disable-compression has been deprecated, no longer has an effect
I1122 18:09:24.792410   25556 run.go:59] "Starting" logger="Run" version="development" commit=""
I1122 18:09:24.793470   25556 config.go:404] "Using the Venafi Cloud Key Pair Service Account auth mode since --client-id and --private-key-path were specified." logger="Run"
I1122 18:09:24.793480   25556 config.go:540] "Using period from config" logger="Run" period="5m0s"
I1122 18:09:24.793486   25556 config.go:592] "The flag --disable-compression has been deprecated an no longer has any effect." logger="Run"

Is that OK?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use MarkHidden instead?

Copy link
Member Author

@maelvls maelvls Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've decided to use MarkDeprecated as you initially suggested. It seemed more appropriate and I was able to remove all traces of the DisableCompression variable, so even better.

I don't think this "plain text" line is a big deal; no one will ever see it anyways, and I can fix it later if I can find a way to set up pflag's logger.

}
res.DisableCompression = flags.DisableCompression
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ go run ./ agent --one-shot --api-token should-not-be-required --install-namespace venafi --output-path /dev/null --agent-config-file examples/cert-manager-agent.yaml  --disable-compression
I1122 17:06:03.444467  284739 run.go:59] "Starting" logger="Run" version="development" commit=""
I1122 17:06:03.444747  284739 config.go:403] "Using the Jetstack Secure API Token auth mode since --api-token was specified." logger="Run"
I1122 17:06:03.444766  284739 config.go:421] "Using deprecated Endpoint configuration. User Server instead." logger="Run"
I1122 17:06:03.444773  284739 config.go:591] "The flag --disable-compression has been deprecated an no longer has any effect." logger="Run"
I1122 17:06:03.444805  284739 run.go:117] "Healthz endpoints enabled" logger="Run.APIServer" addr=":8081" path="/healthz"
I1122 17:06:03.444816  284739 run.go:121] "Readyz endpoints enabled" logger="Run.APIServer" addr=":8081" path="/readyz"
E1122 17:06:03.445471  284739 run.go:269] "Error messages will not show in the pod's events because the POD_NAME environment variable is empty" logger="Run"
I1122 17:06:03.753982  284739 run.go:322] "Data saved to local file" logger="Run.gatherAndOutputData" outputPath="/dev/null"

One small downside of using MarkDeprecated is that pflag doesn't use our
logr.Logger, so it is printed as pain text rather than the format
specified in --logging-format.
@maelvls maelvls merged commit 6a5e097 into master Nov 22, 2024
2 checks passed
@maelvls maelvls deleted the VC-37264-disable-compression branch November 22, 2024 17:31
@maelvls
Copy link
Member Author

maelvls commented Nov 22, 2024

I'll sanity check tha PR against the original PR on Monday morning before proceeding with the release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants