Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix support for limit-namespace in FlytePropeller #5238

Merged
merged 3 commits into from
Apr 19, 2024

Conversation

hamersaw
Copy link
Contributor

@hamersaw hamersaw commented Apr 16, 2024

Tracking issue

fixes #5087

Why are the changes needed?

Refer to issue ^^ for in-depth explanation of problem.

What changes were proposed in this pull request?

Instead of back-pedaling from the informer to retrieve the object cache for listing k8s objects, this PR updates the plugin object monitoring mechanism to issue queries directly to the cache using the k8s object kind and version.

How was this patch tested?

locally with varying configuration.

Setup process

NA

Screenshots

NA

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

NA

Docs link

NA

@hamersaw hamersaw requested review from ByronHsu and pvditt April 16, 2024 14:09
Signed-off-by: Daniel Rammer <[email protected]>
@hamersaw hamersaw marked this pull request as ready for review April 18, 2024 14:43
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working labels Apr 18, 2024
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Apr 19, 2024
@wild-endeavor
Copy link
Contributor

thank you @hamersaw and @MortalHappiness

@wild-endeavor wild-endeavor merged commit e8588f3 into master Apr 19, 2024
46 checks passed
@wild-endeavor wild-endeavor deleted the bug/limit-namespace branch April 19, 2024 17:40
},
}
if err := r.cache.List(ctx, &list); err != nil {
logger.Warnf(ctx, "failed to list objects for %s.%s: %v", r.gvk.Kind, r.gvk.Version, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hamersaw I'm seeing this log line a lot since we upgraded to 1.12 (45k times in 5 days). For instance:

failed to list objects for PyTorchJob.v1: no matches for kind "PyTorchJob" in version "v1"

The CRD does exist in the cluster:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  ...
  name: pytorchjobs.kubeflow.org
spec:
  group: kubeflow.org
  names:
    kind: PyTorchJob
    listKind: PyTorchJobList
    plural: pytorchjobs
    singular: pytorchjob
  scope: Namespaced
  versions:
  - ...
    name: v1

Tasks using the pytorch plugin also work.

Do you know what the reason could be for the cache not being aware of the pytorchjob crd?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working lgtm This PR has been approved by a maintainer size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Flytepropeller error and crash when "limit-namespace" is set
4 participants