Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Promote spark from contribution to application #2912

Open
7 tasks done
Tracked by #2763
juliusvonkohout opened this issue Nov 13, 2024 · 9 comments
Open
7 tasks done
Tracked by #2763

Promote spark from contribution to application #2912

juliusvonkohout opened this issue Nov 13, 2024 · 9 comments
Assignees
Labels
good first issue Good for newcomers help wanted Extra attention is needed

Comments

@juliusvonkohout
Copy link
Member

juliusvonkohout commented Nov 13, 2024

Validation Checklist

  • Is this a Kubeflow issue?
  • Are you posting in the right repository ?
  • Did you follow the Kubeflow installation guideline ?
  • Is the issue report properly structured and detailed with version numbers?
  • Is this for Kubeflow development ?
  • Would you like to work on this issue?
  • You can join the CNCF Slack and access our meetings at the Kubeflow Community website. Our channel on the CNCF Slack is here #kubeflow-platform.

Version

master

Describe your issue

@rimolive do you remember who wanted to help with the spark operator? I think the main problem is that

  1. TODO it does not work with istio "sidecar.istio.io/inject": "true" https://github.com/kubeflow/manifests/blob/73cbecfe604e84dfc7c0851630a6eb5733022dea/contrib/spark/sparkapplication_example.yaml#L24C7-L24C41

  2. TODO We are missing a securitycontext in https://github.com/kubeflow/manifests/blob/73cbecfe604e84dfc7c0851630a6eb5733022dea/contrib/spark/sparkapplication_example.yaml and it should be upstreamed to the spark repository

    securityContext:
      capabilities:
        drop:
          - ALL
      runAsGroup: 0
      runAsNonRoot: true
      allowPrivilegeEscalation: false
      seccompProfile:
        type: RuntimeDefault
  1. TODO We need to add runAsGroup:0 or 185 in the securitycontext of https://github.com/kubeflow/manifests/blob/master/contrib/spark/spark-operator/base/kustomization.yaml and we need to upstream it. After update spark to 2.1.0 #2962 we only need the seccompProfile:
    type: RuntimeDefault

  2. TODO We need a synchronization script in /hack derived from the current makefile

  3. Done in update spark to 2.1.0 #2962 upgrade to the latest version

@juliusvonkohout
Copy link
Member Author

CC @GezimSejdiu

@juliusvonkohout
Copy link
Member Author

There is some security progress in https://github.com/kubeflow/spark-operator/releases/tag/v2.1.0

@jacobsalway
Copy link
Member

  1. Could you go into more detail on the issues you're seeing with Istio? I don't think Istio is needed for either the operator controller/webhook or for RPC/network shuffle within the Spark application. If Istio ingress gateways are being used for other components in Kubeflow, maybe there's a future enhancement to support this for the Spark UI as the current integration is intended for NGINX ingress.
  2. Do you mean the examples in our repository should have this securityContext?
  3. I'll raise a PR to upstream this but I think the only change needed is the seccompProfile. We also support readOnlyRootFilesystem: true.
  4. Is this a script for your or our repository?

@juliusvonkohout
Copy link
Member Author

juliusvonkohout commented Dec 16, 2024

  1. Could you go into more detail on the issues you're seeing with Istio? I don't think Istio is needed for either the operator controller/webhook or for RPC/network shuffle within the Spark application. If Istio ingress gateways are being used for other components in Kubeflow, maybe there's a future enhancement to support this for the Spark UI as the current integration is intended for NGINX ingress.

Istio support for the application in the user namespaces is the problem right now. By default the sparkcluster application must support the sidecars that are injected automatically and we must be on the servicemesh. Right now we are destroying the integration and security

"sidecar.istio.io/inject": "false"
since I did not have time to test it with proper Istio support.

2. Do you mean [the examples in our repository](https://github.com/kubeflow/spark-operator/tree/master/examples) should have this `securityContext`?

Yes kubernetes best practices is that the operator, webhook and sparkapplication have a PSS restricted securitycontext https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted

3. I'll raise a PR to upstream this but I think the only change needed is the `seccompProfile`. We also support `readOnlyRootFilesystem: true`.

Yes readOnlyRootFilesystem: true can make sense for the operator and webhook

4. Is this a script for your or our repository?

Yes this is a script that we have to add here https://github.com/kubeflow/manifests/tree/master/hack analogously to the other scripts.

@juliusvonkohout
Copy link
Member Author

juliusvonkohout commented Jan 20, 2025

@andreyvelich said, that we should include it in 1.10 even if it does not work with Istio.

I suggest to call it Beta support

@juliusvonkohout
Copy link
Member Author

juliusvonkohout commented Jan 20, 2025

#2962 already solves 1.5 of the 5 issues remaining

cc @biswassri @tarekabouzeid if you want to help creating the upstream PRs and the synchronization script in /hack.

so "

  1. TODO We are missing a securitycontext in https://github.com/kubeflow/manifests/blob/73cbecfe604e84dfc7c0851630a6eb5733022dea/contrib/spark/sparkapplication_example.yaml and it should be upstreamed to the spark repository
    securityContext:
      capabilities:
        drop:
          - ALL
      runAsGroup: 0
      runAsNonRoot: true
      allowPrivilegeEscalation: false
      seccompProfile:
        type: RuntimeDefault
  1. TODO After update spark to 2.1.0 #2962 we only need to add the seccompProfile:
    type: RuntimeDefault to the upstream manifests in the spark-operator repository for both deployments (spark controller and webhook)

  2. TODO We need a synchronization script in /hack derived from the current makefile. just take the other scripts in /hack as reference
    "

@tarekabouzeid
Copy link
Member

Upstream sync PR : kubeflow/spark-operator#2397

@juliusvonkohout
Copy link
Member Author

juliusvonkohout commented Jan 22, 2025

@tarekabouzeid @biswassri @jacobsalway now that kubeflow/spark-operator#2397 is merged we should extend this to all examples as well:

TODO We are missing a securitycontext in https://github.com/kubeflow/manifests/blob/73cbecfe604e84dfc7c0851630a6eb5733022dea/contrib/spark/sparkapplication_example.yaml and it should be upstreamed to the spark repository

    securityContext:
      capabilities:
        drop:
          - ALL
      runAsGroup: 0 # that is best practices on Kubernetes to run as anyuserid!=0 and group 0 for file access
      runAsNonRoot: true
      allowPrivilegeEscalation: false
      seccompProfile:
        type: RuntimeDefault

@juliusvonkohout
Copy link
Member Author

#2966 is an example PR, but all examples in https://github.com/kubeflow/spark-operator/tree/master/examples must be changed as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants