along with other scaffolded files and types (e.g. the controller)
LGTM
don't we need stinking badges???
make target requires the use of the `source' command
since make uses "/bin/sh" by default. If that path doesn't point to bash, the scripting will error out because the "source" command is unavailable.
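Two common fixes: set `SHELL := /bin/bash` in the Makefile so recipes run under bash, or use the portable POSIX dot command instead of `source`. A minimal sketch of the latter (the `/tmp/env.sh` file and its contents are hypothetical, just for illustration):

```shell
#!/bin/sh
# `source` is a bashism; under plain /bin/sh the portable spelling is
# the POSIX dot command, which works in both shells.
echo 'VAR=42' > /tmp/env.sh
. /tmp/env.sh          # portable equivalent of `source /tmp/env.sh`
echo "$VAR"
```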
A couple of notes / questions. I can't think of anything else I would include right now, but I'm sure we will find some things.
generating CRDs that support the latest apiextensions (v1) spec
also sets fields with default values as "optional", since the v1 CRD spec requires this distinction. "omitempty" accomplishes the same thing, but preference is given to the former annotation since it clearly conveys our intention. Updates the sample Ray CR with a real field value; more will come.
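A minimal sketch of the distinction (field names are hypothetical, not the actual RayCluster spec): the kubebuilder `+optional` marker tells the CRD generator to mark the field optional in the generated v1 schema, while the json `omitempty` tag only drops empty fields from serialized output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical spec subset contrasting the two mechanisms.
type RayClusterSpec struct {
	// +optional tells controller-gen to mark this field optional in the
	// generated CRD schema; omitempty only affects JSON serialization.
	// +optional
	Image string `json:"image,omitempty"`

	// A required field: no +optional marker, no omitempty.
	Replicas int32 `json:"replicas"`
}

// specJSON serializes a spec; empty omitempty fields are dropped.
func specJSON(s RayClusterSpec) string {
	b, _ := json.Marshal(s)
	return string(b)
}

func main() {
	fmt.Println(specJSON(RayClusterSpec{}))
}
```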
for use by all resources
and scaffolds structure for creating all the resources needed to build a Ray cluster. The ServiceAccount is implemented; tests will come.
along with tests
these will be used to mount additional volumes into cluster pods
fixes the default repo for the Ray image; adds NodeSelector and Affinity fields
adds realistic ObjectStoreMemoryBytes min validation in accordance with the app's requirements. Points to the OCIImageDefinition object.
naive in the sense that the controller creates them without checks, but the generation of the deployment structs is well tested. Also adds a head service and empty stub files for other required resources.
this is a bit cleaner, more descriptive
adds ClientServerPort to RayClusterSpec; exposes the proper ports in the head service for Ray; changes the default Ray image to the standard image (without extra ML libs); refactors the Ray resource lib funcs and adds more unit tests
creates a network policy; creates and binds a pod security policy; adds watches to the controller for "owned" resources (still naive); changes the ObjectStoreMemoryBytes type to account for nil —
it should be int64 so we can represent more than the int32 maximum (~2Gi); fixes a bug with cmd args passing a value instead of a pointer addr
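A small illustration of why the type change matters (the helper name is hypothetical): int32 tops out at 2147483647 bytes, so any larger byte count silently wraps when truncated.

```go
package main

import "fmt"

// truncate32 mimics what storing a byte count in an int32 does:
// values above math.MaxInt32 wrap around.
func truncate32(n int64) int32 { return int32(n) }

func main() {
	const fourGi = int64(4) * 1024 * 1024 * 1024 // 4294967296 bytes

	fmt.Println(truncate32(fourGi)) // wraps: 4Gi mod 2^32 == 0
	fmt.Println(fourGi)             // int64 represents it without loss
}
```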
While we could create a PSP for every cluster, the logistics of managing both cluster- and namespace-scoped resources are gross. These changes make it possible for users to provide a PSP "name"; the PSP will be verified, and then its use will be enabled via RBAC resources.
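The RBAC side of "enabling use" of a named PSP typically looks like the sketch below (the Role name and PSP name are hypothetical; only the `use` verb on the named PodSecurityPolicy is the essential part):

```yaml
# Hypothetical Role granting "use" on a user-provided PSP; a RoleBinding
# would then attach it to the cluster's ServiceAccount.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: raycluster-psp-use
rules:
  - apiGroups: ["policy"]
    resources: ["podsecuritypolicies"]
    resourceNames: ["user-provided-psp"]  # the PSP "name" from the spec
    verbs: ["use"]
```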
enables primitive autoscaling; separates head/worker resource requests (different load profiles); adds printer columns for `kubectl get' output; uses the "recreate" strategy to manage the head deployment; updates the example RayCluster YAML
Nothing controversial 👍
exposes ScaleDownStabilizationWindowSeconds for the HPA; adds ImagePullSecrets to the spec for pulling private images; makes the RayCluster.Spec field required
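For context, a field like ScaleDownStabilizationWindowSeconds would feed the HPA's scale-down behavior stanza; a minimal sketch of the generated fragment (the value shown is hypothetical):

```yaml
# Fragment of a generated HorizontalPodAutoscaler (autoscaling/v2beta2):
# the stabilization window delays scale-down decisions by this many seconds.
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
```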
controller-runtime 0.7.0 caused API submissions to error out whenever no default values were applied; this was fixed in version 0.7.1 (kubernetes-sigs/controller-runtime#1329)
reworks HPA creation behavior; adds integration tests for webhook logic
we already default this field and validate user input. Making this optional allows the user to specify other worker fields (e.g. resources) without having to explicitly provide a replica count as well.
@steved This is about 99% complete; I just need to add some more logging to the controller. I'd appreciate it if you would give this another look.
version 0.8 allows us to exclude error stack traces in logs. This does 3 things: - produces less noise in production logs - forces us to add context to errors with wrapping - still allows us to trace errors in development when fixing bugs
removes "development mode" default
this helps us log type information when creating owned resources
minor changes moving from 1.19 to 1.20
logs create/update/delete of owned components; debug logging for status updates and object patches; debug++ logging for reconciliation loop triggers
...because when updates to a stale object occur, we just requeue the key so it can be reprocessed with the latest version of the RayCluster object.
…ommon name (which represents image, chart, and deployment names)
0.8.3 removes deprecation warnings that were muddling the logs
changes MINIKUBE_PROFILE to evaluate as $MINIKUBE_PROFILE; adds extra vars with defaults to give users override capability
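The override pattern being described can be sketched as follows (the default value is hypothetical): take the caller's environment value when set, otherwise fall back to a default.

```shell
#!/bin/sh
# Use the caller-supplied MINIKUBE_PROFILE if set and non-empty,
# otherwise fall back to a default.
MINIKUBE_PROFILE="${MINIKUBE_PROFILE:-minikube}"
echo "profile: $MINIKUBE_PROFILE"
```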
Left some comments, but I don't think any are a blocking issue.
if modified, ok := controlled.(*corev1.Service); ok {
    current := found.(*corev1.Service)
    modified.Spec.ClusterIP = current.Spec.ClusterIP
}
this ... seems a little odd and scary (i.e. feels like we should either have immutable services or somehow copy all the stateful-ish fields)
now this can be used by any reconciler or other type that needs to propagate a contextual logger
this was only added to enrich log statements. instead, we can grab the type metadata directly from the scheme and add it to the logs prior to create/update operations.
indicates the reason why the shape of this operation is slightly different from the other reconcile functions.
using if statements to decide whether we need to take one action or another, or return an error
The default image definition is added when it is completely missing. If a user provides an image, then we validate that the minimum required fields are present.