AUTH-541: OIDC structured auth config #713

Open
wants to merge 4 commits into
base: master

liouk
Member

@liouk liouk commented Oct 8, 2024

This PR adds a controller behind the ExternalOIDC feature gate that tracks the auth CR, and when auth type is configured to be OIDC, it:

  • creates a structured auth config object based on the auth CR and validates it
  • serializes it into JSON and stores it into a configmap
  • syncs that configmap into openshift-config-managed, where it will be picked up by the KAS-o and synced into a static file and passed on to the KAS pods

KAS-o functionality PR: openshift/cluster-kube-apiserver-operator#1760

Enhancement: openshift/enhancements#1632
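
The serialize-and-store steps described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the struct types are simplified local stand-ins for the structured authentication configuration types in k8s.io/apiserver, and `buildConfigMapData`, the `auth-config.json` key, and the issuer URL are hypothetical names chosen for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified, hypothetical stand-ins for the structured authentication
// configuration types; the real types carry many more fields.
type issuer struct {
	URL       string   `json:"url"`
	Audiences []string `json:"audiences"`
}

type jwtAuthenticator struct {
	Issuer issuer `json:"issuer"`
}

type authenticationConfiguration struct {
	// With plain json.Marshal the type meta has to be filled in by hand.
	Kind       string             `json:"kind"`
	APIVersion string             `json:"apiVersion"`
	JWT        []jwtAuthenticator `json:"jwt"`
}

// buildConfigMapData builds the config object, serializes it to JSON and
// returns it in the shape of a ConfigMap's data field (key -> JSON string).
func buildConfigMapData(dataKey, issuerURL string, audiences []string) (map[string]string, error) {
	cfg := authenticationConfiguration{
		Kind:       "AuthenticationConfiguration",
		APIVersion: "apiserver.config.k8s.io/v1beta1",
		JWT:        []jwtAuthenticator{{Issuer: issuer{URL: issuerURL, Audiences: audiences}}},
	}

	encoded, err := json.Marshal(cfg)
	if err != nil {
		return nil, fmt.Errorf("could not marshal auth config into JSON: %v", err)
	}
	return map[string]string{dataKey: string(encoded)}, nil
}

func main() {
	// The key name and issuer URL are assumptions made for this example.
	data, err := buildConfigMapData("auth-config.json", "https://issuer.example.com", []string{"my-client"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(data["auth-config.json"])
}
```

In the actual controller the resulting map would be written into a configmap and synced into openshift-config-managed; that part needs a cluster client and is left out of the sketch.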

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 8, 2024
@openshift-ci-robot
Contributor

openshift-ci-robot commented Oct 8, 2024

@liouk: This pull request references AUTH-541 which is a valid jira issue.

In response to this:

This PR adds a controller behind the ExternalOIDC feature gate that tracks the auth CR, and when auth type is configured to be OIDC, it:

  • creates a structured auth config object based on the auth CR
  • serializes it into a configmap
  • syncs that configmap into openshift-config, where it will be picked up by the KAS-o and synced into a static file (not yet implemented)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from deads2k and ibihim October 8, 2024 12:52
@liouk liouk changed the title AUTH-541: OIDC structured auth config WIP: AUTH-541: OIDC structured auth config Oct 8, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 8, 2024
@liouk liouk force-pushed the oidc-config-structured-auth branch from 35b2d3d to 7e8ad90 Compare October 8, 2024 13:31
@liouk liouk changed the title WIP: AUTH-541: OIDC structured auth config AUTH-541: OIDC structured auth config Oct 8, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 8, 2024
@liouk liouk force-pushed the oidc-config-structured-auth branch from 7e8ad90 to 31e7cc5 Compare October 10, 2024 13:13
@liouk liouk changed the title AUTH-541: OIDC structured auth config WIP: AUTH-541: OIDC structured auth config Oct 10, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 10, 2024
@liouk liouk force-pushed the oidc-config-structured-auth branch 5 times, most recently from f066dae to c4f822c Compare October 17, 2024 13:00
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 18, 2024
@liouk liouk force-pushed the oidc-config-structured-auth branch 2 times, most recently from 8ca7a87 to 36db406 Compare October 22, 2024 10:24
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 22, 2024
@openshift-ci-robot
Contributor

openshift-ci-robot commented Oct 22, 2024

@liouk: This pull request references AUTH-541 which is a valid jira issue.

In response to this:

This PR adds a controller behind the ExternalOIDC feature gate that tracks the auth CR, and when auth type is configured to be OIDC, it:

  • creates a structured auth config object based on the auth CR and validates it
  • serializes it into a configmap
  • syncs that configmap into openshift-config-managed, where it will be picked up by the KAS-o and synced into a static file (not yet implemented) and passed on to the KAS pods

@liouk liouk force-pushed the oidc-config-structured-auth branch 5 times, most recently from 81af1fc to 46663cd Compare October 28, 2024 09:06
@liouk liouk changed the title WIP: AUTH-541: OIDC structured auth config AUTH-541: OIDC structured auth config Oct 28, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 28, 2024
@openshift-ci-robot
Contributor

openshift-ci-robot commented Oct 28, 2024

@liouk: This pull request references AUTH-541 which is a valid jira issue.

In response to this:

This PR adds a controller behind the ExternalOIDC feature gate that tracks the auth CR, and when auth type is configured to be OIDC, it:

  • creates a structured auth config object based on the auth CR and validates it
  • serializes it into JSON and stores it into a configmap
  • syncs that configmap into openshift-config-managed, where it will be picked up by the KAS-o and synced into a static file (not yet implemented) and passed on to the KAS pods

@liouk liouk force-pushed the oidc-config-structured-auth branch from 3c2dad5 to 9e8b55d Compare November 18, 2024 15:08
defer resp.Body.Close()

body, err := io.ReadAll(resp.Body)
if err != nil {
Contributor

Technically we only need to read the body, if the resp.StatusCode is NOT http.StatusOK, so I would only do it in the err case.

I never used it, but we might be even fine with http.MethodHEAD? Not sure, we could just use get and read the Body on err.

Member Author

> Technically we only need to read the body, if the resp.StatusCode is NOT http.StatusOK, so I would only do it in the err case.

I wrote it like this because depending on the status code, sometimes the body can contain useful information about the respective code.

> I never used it, but we might be even fine with http.MethodHEAD? Not sure, we could just use get and read the Body on err.

I chose GET because:

HEAD is risky in the sense that it might be blocked on some servers as it's not defined in the spec.

Contributor

@ibihim ibihim Nov 19, 2024

Yes, probably. We should keep GET.

But I don't understand why we should read from resp.Body into a slice of bytes, while not consuming it in most of the cases.

Member Author

Good point, fixed; reading body only if we need to put it in an error message.

@liouk liouk force-pushed the oidc-config-structured-auth branch 5 times, most recently from 36820a6 to bca8702 Compare November 19, 2024 10:31
Contributor

@ibihim ibihim left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 19, 2024
Contributor

openshift-ci bot commented Nov 19, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ibihim, liouk
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@liouk
Member Author

liouk commented Nov 19, 2024

/retest

@liouk liouk force-pushed the oidc-config-structured-auth branch from bca8702 to 5457a34 Compare November 19, 2024 17:09
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 19, 2024
Contributor

openshift-ci bot commented Nov 19, 2024

New changes are detected. LGTM label has been removed.

@liouk liouk force-pushed the oidc-config-structured-auth branch from 5457a34 to a11dfa2 Compare November 19, 2024 17:55
}

if !featureGates.Enabled(features.FeatureGateExternalOIDC) {
	return nil, nil, nil

Do you want to set up a child context with cancellation so that the feature gate accessor that is launched above can terminate?

Member Author

Don't we want to keep the accessor running so that it can os.Exit if the feature gates change?

You're right, I saw that nothing else was using it and did not make the connection that the comment at the top was saying that this all depends on the exit-on-feature-change behavior.

if err != nil {
	return fmt.Errorf("could not marshal auth config into JSON: %v", err)
}
authConfigJSON := strings.TrimSpace(string(encoded))

I know the JSON serializer appends a trailing newline, but what problem was that causing?


var (
	cfgScheme = runtime.NewScheme()
	codecs    = serializer.NewCodecFactory(cfgScheme, serializer.EnableStrict)

Nit: The EnableStrict option isn't really doing anything for you since you are only building an encoder from this.

var (
	cfgScheme         = runtime.NewScheme()
	codecs            = serializer.NewCodecFactory(cfgScheme, serializer.EnableStrict)
	serializerInfo, _ = runtime.SerializerInfoForMediaType(codecs.SupportedMediaTypes(), runtime.ContentTypeJSON)

Nit: Best to check the second return value and panic on false in case this should somehow break. That would be a clearer failure than a panic later inside of runtime.Encode.

	return err
}

encoded, err := runtime.Encode(codecs.EncoderForVersion(serializerInfo.Serializer, apiserverv1beta1.ConfigSchemeGroupVersion), authConfig)

I think this is fine, but I'm also unsure what benefit setting up and using a CodecFactory is providing. We're not converting or defaulting anything and we always want it to use JSON. May as well use https://pkg.go.dev/k8s.io/apimachinery/pkg/util/json#Marshal?

Contributor

Yes, this might be my fault. I mentioned that I am not sure if using the api machinery is better / more idiomatic than json.Marshal.

@liouk's original solution was based on json.Marshal.

Member Author

The main benefit was to serialize the data in a more structured way, e.g. without having to define the type meta manually. But it gets more complicated than what I thought, so I will revert back to using json.Marshal.

}

cm := corev1ac.ConfigMap(targetAuthConfigCMName, managedNamespace).WithData(map[string]string{authConfigDataKey: authConfigJSON})
if _, err := c.configMaps.ConfigMaps(managedNamespace).Apply(ctx, cm, metav1.ApplyOptions{FieldManager: c.name}); err != nil {

Is another field manager going to be writing to the same configmap? If not, it is probably reasonable to set Force: true to allow stomping conflicts.

Member Author

No, the CAO must be the only one -- great point.

	return fmt.Errorf("auth config validation failed: %v", errList)
}

cm := corev1ac.ConfigMap(targetAuthConfigCMName, managedNamespace).WithData(map[string]string{authConfigDataKey: authConfigJSON})

We've been trying to reduce no-op Apply requests by extracting the current apply configuration from the local informer's cache with https://pkg.go.dev/k8s.io/client-go/applyconfigurations/core/v1#ExtractConfigMap and doing an apiequality.Semantic.DeepEqual between them first. I haven't heard of any issues arising from that approach yet, does it make sense to do it here to avoid making a write on every resync?

Member Author

That's the intention of this check: https://github.com/openshift/cluster-authentication-operator/pull/713/files#diff-3c99f304cc2949488aa2fa2b8aea2d7e8ddb0c8baa42e66f128eacd7cdbda11aR135-R137

	if existingCM != nil && existingCM.Data[authConfigDataKey] == authConfigJSON {
		return nil
	}

Do you think the DeepEqual() is preferable?
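
To make the trade-off concrete, here is a small standard-library sketch contrasting a single-key comparison with a comparison of the whole data map. Plain `map[string]string` values stand in for ConfigMap data, `reflect.DeepEqual` stands in for the semantic equality helper, and the field-manager-aware extraction the reviewer links to is not modelled. The helper names and the `auth-config.json` key are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
)

// keyUnchanged mirrors the single-key check: a write is skipped when the one
// key this controller cares about already holds the desired value.
func keyUnchanged(existing map[string]string, key, desired string) bool {
	current, ok := existing[key]
	return ok && current == desired
}

// dataUnchanged compares the complete data maps, so any extra or missing key
// counts as a difference.
func dataUnchanged(existing, desired map[string]string) bool {
	return reflect.DeepEqual(existing, desired)
}

func main() {
	// "auth-config.json" is an assumed key name used only in this example.
	existing := map[string]string{"auth-config.json": "{}", "stale-key": "x"}
	desired := map[string]string{"auth-config.json": "{}"}

	fmt.Println(keyUnchanged(existing, "auth-config.json", desired["auth-config.json"]))
	fmt.Println(dataUnchanged(existing, desired))
}
```

In this example the single-key check reports no change while the whole-map comparison reports a difference, because of the extra key. Whether that distinction matters here depends on whether anything else can write to the configmap.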

	jwt.Issuer.CertificateAuthority = caData
}

switch provider.ClaimMappings.Username.PrefixPolicy {

What if a new valid option is added in the future? Can there be a period of time during an upgrade from N to N+1 where there is a cluster-authentication-operator at N and a CRD at N+1? I would include a default case to avoid all doubt.
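
A minimal sketch of the suggested default case. The type, constants, and `resolveUsernamePrefix` are local stand-ins invented for this example, not the OpenShift API types, and treating the empty policy as "no prefix" is an assumption made only to keep the example short.

```go
package main

import (
	"fmt"
)

// usernamePrefixPolicy is a hypothetical local stand-in for the prefix
// policy enum on the auth CR.
type usernamePrefixPolicy string

const (
	policyNoOpinion usernamePrefixPolicy = ""
	policyNoPrefix  usernamePrefixPolicy = "NoPrefix"
	policyPrefix    usernamePrefixPolicy = "Prefix"
)

// resolveUsernamePrefix maps a policy to the prefix to apply. The default
// case rejects values this build does not know about, for example an option
// introduced by a newer CRD while an older operator is still running.
func resolveUsernamePrefix(policy usernamePrefixPolicy, configured string) (string, error) {
	switch policy {
	case policyPrefix:
		if configured == "" {
			return "", fmt.Errorf("prefix policy %q requires a non-empty prefix", policy)
		}
		return configured, nil
	case policyNoPrefix, policyNoOpinion:
		// Assumption for this sketch: both mean that no prefix is applied.
		return "", nil
	default:
		return "", fmt.Errorf("unsupported username prefix policy %q", policy)
	}
}

func main() {
	fmt.Println(resolveUsernamePrefix(policyPrefix, "oidc:"))
	fmt.Println(resolveUsernamePrefix("FutureOption", ""))
}
```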

Comment on lines 230 to 232
// TODO currently validations from k8s.io/apiserver/pkg/apis/apiserver/validation cannot be used here
// since they aren't defined for the beta type; once the feature goes out of beta, we should replace
// this func with the upstream validations (but keep CA cert validation)

They're defined for the internal type, would it be easier to convert to internal and use those? This seems to be how it is done for unserved APIs like the apiserver configuration files, I don't see any external-versioned validations there.

Also, what happens if there is a bug in the validation used by cluster-authentication-operator that causes the validation to be overly strict? Even if this changes to use the validations from k8s.io/apiserver, this can be out of sync with whatever a particular kube-apiserver was compiled against. This is a form of client-side validation, which we have been trying to move away from. Revisioned kube-apiserver rollouts should mitigate the risk of writing an invalid config here, right?

Member Author

Alright, I see the point in keeping validations at the server-side, especially with revisioned rollouts being in place.

However, the KAS pods do not really validate the CA cert, if specified. If the CA cert is not the correct one, the KAS pods will log an error but will not crash, so the rollout will be completed correctly. Therefore I'm considering keeping the CA cert validation at the CAO side, but dropping the rest. What do you think?
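
One possible shape for an operator-side CA check is sketched here, under the assumption that the check only verifies that the bundle contains at least one parseable PEM certificate. The PR's actual validation may do more, such as contacting the issuer; `validateCABundle` and `selfSignedCAPEM` are hypothetical helpers, and the latter exists only to produce test input.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// validateCABundle reports an error unless pemData contains at least one
// certificate that the x509 package can parse.
func validateCABundle(pemData []byte) error {
	if len(pemData) == 0 {
		return errors.New("CA bundle is empty")
	}
	if !x509.NewCertPool().AppendCertsFromPEM(pemData) {
		return errors.New("CA bundle contains no valid PEM-encoded certificates")
	}
	return nil
}

// selfSignedCAPEM generates a throwaway self-signed CA certificate so the
// example has a valid bundle to check.
func selfSignedCAPEM() ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "example-ca"},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		KeyUsage:              x509.KeyUsageCertSign,
		BasicConstraintsValid: true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

func main() {
	caPEM, err := selfSignedCAPEM()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("valid bundle:", validateCABundle(caPEM))
	fmt.Println("garbage bundle:", validateCABundle([]byte("not a certificate")))
}
```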

Member Author

Pushed a fixup 20a1f72 that demonstrates what I described above. Placing a hold until this gets squashed or dropped.

/hold

@liouk liouk force-pushed the oidc-config-structured-auth branch from a11dfa2 to 20a1f72 Compare November 21, 2024 14:08
@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 21, 2024
@liouk liouk force-pushed the oidc-config-structured-auth branch from 20a1f72 to 08155bf Compare November 21, 2024 14:16
@xingxingxia
Contributor

From test result perspective, based on good pre-merge test results in https://issues.redhat.com/browse/OCPBUGS-44592?focusedId=26134688&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-26134688 , adding below label:
/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Nov 22, 2024
@openshift-ci-robot
Contributor

openshift-ci-robot commented Nov 22, 2024

@liouk: This pull request references AUTH-541 which is a valid jira issue.

@liouk
Member Author

liouk commented Nov 22, 2024

/retest

Contributor

openshift-ci bot commented Nov 22, 2024

@liouk: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/test-operator-integration | 08155bf | link | false | /test test-operator-integration |
| ci/prow/e2e-aws-single-node | 08155bf | link | false | /test e2e-aws-single-node |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. qe-approved Signifies that QE has signed off on this PR
6 participants