k8s - Add force parameter to Teraslice Job stop API that force deletes things #2670
We had a discussion about this issue today; here are some of my thoughts from that discussion and after the fact:
I've spent some time playing with the
There are multiple scenarios that we are trying to fix:
I missed something earlier:
According to this issue, the grace period behaves differently in Kubernetes v1.18 through v1.27 than in earlier or later versions: "In fact, processes gets killed if we set --grace-period=0 in ver1.18." The behavior was changed back in v1.28.
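For reference, the force-delete behavior being discussed corresponds to a `kubectl` invocation like the one below. The pod name is illustrative, and whether the container processes are killed immediately depends on the Kubernetes version range noted above:

```shell
# Delete a pod immediately, without waiting for graceful termination.
# The pod name here is a made-up example.
kubectl delete pod teraslice-worker-example-abc12 \
  --grace-period=0 --force
```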
When using
I think I managed to accidentally recreate the scenario this issue was for. I think the scenario was:
Here's the state of the k8s resources:
Here are snippets of the master logs:
Snippet of the ts-exc log:
Snippet of the worker log:
I think this is the key clue:
And the key to being able to reproduce this easily ourselves is this log line:
Specifically
I will definitely give this a try.
I was given a new way to try to reproduce one of these situations: with an elasticsearch reader, give it an invalid query. Here's a sample
I think an operation like this would work:
It's not clear whether the job would ultimately have cleaned its workers up on its own.
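The original sample was not captured in this thread; as a purely hypothetical illustration, a job submission with a syntactically invalid Lucene query (an unbalanced parenthesis) might look like the following. The master URL, job name, index name, and query are all assumptions:

```shell
# Hypothetical: submit a job whose elasticsearch reader has a malformed
# Lucene query ("field:(unterminated" has no closing parenthesis), so the
# search should be rejected and the workers should fail to read.
curl -X POST "$TERASLICE_MASTER/v1/jobs" \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "invalid-query-test",
    "operations": [
      { "_op": "elasticsearch_reader", "index": "example-index", "query": "field:(unterminated" },
      { "_op": "noop" }
    ]
  }'
```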
This is completed and shown to work.
I think I have identified a repeatable failure scenario where force stop does not work.
I think the problem is that the second time the job is started it has a new
I distinctly remember telling @busma13 to change the way he was doing things to target the
This is pretty obvious if you pay close attention to the master logs and see that the
We should probably fix this force stop to delete by
While k8s resources matching a different
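A minimal sketch of what deleting by job rather than by execution could look like, assuming Teraslice stamps its k8s resources with a job-id label (the label key used below is an assumption, not confirmed in this thread):

```shell
# Hypothetical: build a label selector that matches every resource for a
# job, regardless of which execution created it. The label key is assumed.
JOB_ID="example-job-id"
SELECTOR="teraslice.terascope.io/jobId=${JOB_ID}"

# Then the force stop could delete everything matching that selector:
#   kubectl delete jobs,deployments,pods -l "$SELECTOR" --grace-period=0 --force
echo "$SELECTOR"
```

This sidesteps the stale-execution-id problem because the job id stays stable across restarts.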
When addressing the issue above, please make the following additional changes:
This is done. We eventually want to pull force out into its own endpoint. See #3519
It would be nice if k8s pods could be forcibly deleted as an option using the Teraslice job stop API.
Currently when stopping jobs it's like
perhaps an option like this:
I'd have to take a look at the k8s library I am using, see how I am using it, and see if force parameters can be added to these API calls, but the `kubectl` equivalent would be:
I'm not sure offhand how that works when deleting k8s jobs and deployments.
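A sketch of what the proposed option might look like, alongside a rough `kubectl` equivalent. The `force` query parameter is the proposal itself (not an existing API at the time of this comment), and the label key in the selector is an assumption:

```shell
# Proposed: stop a job and force-delete its k8s resources in one call.
curl -X POST "$TERASLICE_MASTER/v1/jobs/$JOB_ID/_stop?force=true"

# Rough kubectl equivalent for the pods backing that job
# (label key is assumed, not confirmed here):
kubectl delete pods -l "teraslice.terascope.io/jobId=$JOB_ID" \
  --grace-period=0 --force
```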