Add an option to store Gzip-encoded bundle data in ConfigMaps #685
Conversation
Hi @zcahana. Thanks for your PR. I'm waiting for an operator-framework member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
IMO this looks more or less exactly how I was expecting an implementation to look. I think we had some thoughts on whether it was better to only compress the CRDs and other raw kube manifests and leave the CSVs as-is, because we know that a large portion of the file size of CSVs comes from already base64-encoded image metadata -- it would probably be interesting to run this against a few other operator bundles to see what kinds of reductions we get. But I imagine even the naive implementation that compresses all of the contents will get very good results, given the bulk of the size comes from the CRD structural schemas. |
Codecov Report
@@ Coverage Diff @@
## master #685 +/- ##
==========================================
+ Coverage 48.63% 49.11% +0.48%
==========================================
Files 95 96 +1
Lines 8124 8185 +61
==========================================
+ Hits 3951 4020 +69
+ Misses 3412 3388 -24
- Partials 761 777 +16
Continue to review full report at Codecov.
|
Base64 isn't strictly required here, but it results in nicer-looking, more easily debuggable ConfigMaps without garbled binary data, at the expense of a slightly larger storage size.
Can the ConfigMap binaryData field be used to solve this problem?
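For illustration, here is a minimal sketch of the write path under discussion, assuming the gzip+base64 approach; the package and function names (bundlecm, makeBundleConfigMap) are hypothetical and not the PR's actual code. Storing the raw gzip bytes in binaryData directly would let the base64 step be dropped, at the cost of garbled-looking values when inspecting the ConfigMap.

package bundlecm

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// makeBundleConfigMap is a hypothetical helper: it gzips each manifest, base64-encodes
// the result under binaryData, and tags the ConfigMap with the encoding annotation so
// consumers know how to decode it.
func makeBundleConfigMap(name string, manifests map[string][]byte) (*corev1.ConfigMap, error) {
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:        name,
			Annotations: map[string]string{"olm.contentEncoding": "gzip+base64"},
		},
		BinaryData: map[string][]byte{},
	}
	for key, data := range manifests {
		var buf bytes.Buffer
		zw := gzip.NewWriter(&buf)
		if _, err := zw.Write(data); err != nil {
			return nil, err
		}
		if err := zw.Close(); err != nil {
			return nil, err
		}
		encoded := make([]byte, base64.StdEncoding.EncodedLen(buf.Len()))
		base64.StdEncoding.Encode(encoded, buf.Bytes())
		cm.BinaryData[key] = encoded
	}
	return cm, nil
}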
Really impressive PR! Just have some nits and general observations.
I think this would definitely be valuable in OLM.
Thanks all for this thorough review. I'll work on the suggested changes as well as unit/e2e tests. |
@zcahana As I understand it, |
Best of both worlds :-) |
@kevinrizza here's a quick look at the top-5 CSVs by size, as well as their compressed sizes:
$ ll ./redis-operator/0.5.0/redis-operator.v0.5.0.clusterserviceversion.yaml*
-rw-r--r-- 1 zvic zvic 872918 Jun 21 23:15 ./redis-operator/0.5.0/redis-operator.v0.5.0.clusterserviceversion.yaml
-rw-r--r-- 1 zvic zvic 352004 Jun 21 22:37 ./redis-operator/0.5.0/redis-operator.v0.5.0.clusterserviceversion.yaml.gz
$ ll ./ember-csi-community-operator/0.9.4/ember-csi-community-operator.v0.9.4.clusterserviceversion.yaml*
-rw-r--r-- 1 zvic zvic 576990 Jun 21 23:15 ./ember-csi-community-operator/0.9.4/ember-csi-community-operator.v0.9.4.clusterserviceversion.yaml
-rw-r--r-- 1 zvic zvic 51783 Jun 21 22:37 ./ember-csi-community-operator/0.9.4/ember-csi-community-operator.v0.9.4.clusterserviceversion.yaml.gz
$ ll ./hedvig-operator/1.0.1/manifests/hedvig-operator.v1.0.1.clusterserviceversion.yaml*
-rwxr-xr-x 1 zvic zvic 254115 Jun 21 23:15 ./hedvig-operator/1.0.1/manifests/hedvig-operator.v1.0.1.clusterserviceversion.yaml*
-rwxr-xr-x 1 zvic zvic 182983 Jun 21 22:37 ./hedvig-operator/1.0.1/manifests/hedvig-operator.v1.0.1.clusterserviceversion.yaml.gz*
$ ll ./jaeger/1.20.0/jaeger.v1.20.0.clusterserviceversion.yaml*
-rw-r--r-- 1 zvic zvic 231809 Jun 21 23:15 ./jaeger/1.20.0/jaeger.v1.20.0.clusterserviceversion.yaml
-rw-r--r-- 1 zvic zvic 68659 Jun 21 22:37 ./jaeger/1.20.0/jaeger.v1.20.0.clusterserviceversion.yaml.gz
$ ll ./cockroachdb/2.0.9/manifests/cockroachdb.v2.0.9.clusterserviceversion.yaml*
-rw-r--r-- 1 zvic zvic 186843 Jun 21 23:15 ./cockroachdb/2.0.9/manifests/cockroachdb.v2.0.9.clusterserviceversion.yaml
-rw-r--r-- 1 zvic zvic 131128 Jun 21 22:37 ./cockroachdb/2.0.9/manifests/cockroachdb.v2.0.9.clusterserviceversion.yaml.gz
As you can see, the compression ratios vary substantially. Lower ratios indeed correspond to CSVs with base64-encoded data inline. Still, some CSVs benefit nicely from compression (particularly, |
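For anyone who wants to reproduce these ratios on other bundles, here is a small standalone Go sketch (hypothetical, not part of this PR) that gzips each file in memory and prints the size reduction:

package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"os"
)

// countWriter counts bytes written to it, discarding the data.
type countWriter struct{ n int64 }

func (c *countWriter) Write(p []byte) (int, error) {
	c.n += int64(len(p))
	return len(p), nil
}

// Prints the original and gzipped size of each file passed on the command line,
// roughly reproducing the ratios shown above without writing .gz files to disk.
func main() {
	for _, path := range os.Args[1:] {
		f, err := os.Open(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		counter := &countWriter{}
		zw := gzip.NewWriter(counter)
		orig, err := io.Copy(zw, f)
		f.Close()
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		zw.Close()
		fmt.Printf("%s: %d -> %d bytes (%.1f%%)\n", path, orig, counter.n, 100*float64(counter.n)/float64(orig))
	}
}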
Force-pushed from 82ad0fe to 3e4a295
@ALL, I've worked through all review comments and added unit tests. |
Hi @zcahana, w.r.t. e2e testing, we think it's best to have a specific operator-registry e2e test where we spin up a cluster, run the unpack job, and check that the content made it to the ConfigMap in the right format. There is an e2e test in operator-registry that runs in-cluster already, so it should be OK to use the existing infrastructure for the test. Although the OLM e2e suite is more robust, we don't need this test to integrate with the OLM APIs themselves. The test in OLM is already there: do an install and make sure that the unpacking still works after adding the compression flag. This can be turned on in a subsequent PR (after vendoring in this change), and if CI is green then we can consider it working on the OLM side as well. |
@joelanford I've been playing with the base64 encoding/decoding back and forth and eventually realized we must base64-decode what we get back when reading binaryData, regardless of whether we base64-encoded it when writing. Bottom line, we're back to gzip+base64 encoding/decoding, but at least this time I've rewritten it with better memory efficiency. |
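For reference, a minimal sketch of the read path being described, assuming each binaryData value holds base64(gzip(manifest)); decodeBundleData is a hypothetical name, not the PR's actual function:

package bundlecm

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"io"
)

// decodeBundleData reverses the gzip+base64 encoding applied when the ConfigMap
// was written: base64-decode the value, then stream it through a gzip reader.
func decodeBundleData(encoded []byte) ([]byte, error) {
	compressed := make([]byte, base64.StdEncoding.DecodedLen(len(encoded)))
	n, err := base64.StdEncoding.Decode(compressed, encoded)
	if err != nil {
		return nil, err
	}
	zr, err := gzip.NewReader(bytes.NewReader(compressed[:n]))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}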
@exdx thanks for the tip about the e2e test. I've indeed extended the existing one to launch the extract job, then read the ConfigMap and verify we get the expected objects. The test runs twice: once for a small bundle, without applying compression, and once for a large (> 1 MiB) bundle, with compression applied. |
nice work! I'm ready to lgtm once CI is green
/retest Please review the full test history for this PR and help us cut down flakes. |
Looks like there are some failures in the e2e test:
I haven't dug into those errors yet, but I'm placing a hold on this PR in the meantime so the bot doesn't go crazy attempting to retest non-Prow-based jobs. /hold |
The current failures are still due to a timeout (120 sec) waiting for the extract job to complete. I can try to increase the timeout even further, but it's hard to believe the test runs that slowly in CI. |
@zcahana We have a couple of testing flakes that are difficult to reproduce locally but pop up frequently in CI - it's possible there's some sort of Ginkgo parallelization magic going on that causes these flakes, but it's difficult to track down. It's also possible that the testing environment GH Actions spins up for us has limited resources, which may help explain why more tests are flaking due to exceeding the configured test timeout period, but I haven't dug too much into that angle either. Increasing the test timeout to find a good medium seems like a sufficient workaround in the short term, though. |
I'm not sure I'm following the reasoning here, though it's entirely possible I'm overlooking something. I checked out the PR and removed the base64 encoding, and the unit test still passed. I would have expected the test to fail without it. |
@joelanford The unit test still passes since the fake client just returns whatever was written into the in-memory ConfigMap.binaryData as-is, encoded or not. In a real cluster (and indeed I discovered this only after starting to actually test the code against a real cluster), the ConfigMap binaryData field always comes back base64-encoded when read from the api-server. This happens whether I base64-encode the data put into binaryData (as I do now) or use the "raw" gzipped bytes (as I initially did). |
Oh wow, that is extremely surprising (to me at least). I wonder if this is documented somewhere... 🤔 EDIT: this is the best I could find: kubernetes/website#27066. Either way, I find it pretty surprising. Also, this seems like a pretty obvious footgun of the fake client that I'm sure others have already run into. We may want to add a feature to the fake client to handle this peculiarity of the apiserver. Not sure if it will be accepted, but it's worth a try, and at the very least it will serve as a breadcrumb for others who fall into this trap. |
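For what it's worth, one place the base64 clearly comes from is the JSON wire format: ConfigMap.BinaryData is a map[string][]byte in the Go API, and Go's JSON encoder represents []byte values as base64 strings, whereas the fake client keeps deep copies in memory and never serializes them. A small standalone illustration (not part of the PR):

package main

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// Marshals a ConfigMap with binaryData and prints the JSON, showing that []byte
// values appear as base64 strings on the wire even though they are raw bytes in memory.
func main() {
	cm := corev1.ConfigMap{
		BinaryData: map[string][]byte{"bundle.yaml.gz": []byte("raw gzipped bytes")},
	}
	out, _ := json.Marshal(cm)
	fmt.Println(string(out))
	// Prints something like:
	// {"metadata":{"creationTimestamp":null},"binaryData":{"bundle.yaml.gz":"cmF3IGd6aXBwZWQgYnl0ZXM="}}
}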
Guys, can someone please kick off the CI? The last two commits increase the job timeout and dump the job logs in case of a timeout error. Let's see where we are with this. |
/rerun-all |
@zcahana Would you mind rebasing your PR against latest master? We recently fixed a kind test on CI so your PR won't pass CI until it is rebased. Thanks. |
Will do, thanks. |
Signed-off-by: Zvi Cahana <[email protected]>
/lgtm |
@kevinrizza @joelanford @exdx @timflannagan @dinhxuanvu Thank you all for the thorough review and assistance throughout this PR. |
Description of the change:
This PR adds an option to compress and encode bundle data stored in a ConfigMap, so that large bundles have a better chance of fitting within the 1048576-byte size limit for ConfigMaps.
It's basically a stop-gap solution to operator-framework/operator-lifecycle-manager#1523, until the design proposed in operator-framework/enhancements#40 is finalized and implemented.
Some key aspects of this implementation:
- A -z/--gzip flag is added to the opm alpha bundle extract command.
- ConfigMaps holding compressed data are marked with the olm.contentEncoding: gzip+base64 annotation.
- OLM can opt in by passing the --gzip flag while invoking the bundle unpacker job.
So far I've been manually testing this with the kubevirt/hyperconverged-cluster-operator bundle.
With compression enabled, this bundle takes 180KB, compared to 1004KB uncompressed (and 152KB compressed w/o base64 encoding).
I'll be extending this PR with unit and e2e tests, assuming the approach taken here is acceptable to the project maintainers.
Thanks!
Motivation for the change:
There's a growing number of operators that are hitting the 1048576-byte barrier, including kubevirt/hyperconverged-cluster-operator, cert-manager, Strimzi, OCS, ...