
k8s - Add force parameter to Teraslice Job stop API that force deletes things #2670

Closed
godber opened this issue Apr 28, 2021 · 14 comments
Labels: k8s (Applies to Teraslice in kubernetes cluster mode only), pkg/teraslice

Comments

godber commented Apr 28, 2021

It would be nice if k8s pods could be forcibly deleted as an option using the Teraslice job stop API.

Currently, stopping a job looks like this:

POST /job/JOBID/_stop

Perhaps an option like this:

POST /job/JOBID/_stop?force

I'd have to take a look at the k8s library I'm using and see whether force parameters can be added to these API calls, but the kubectl equivalent would be:

kubectl delete pods PODNAME --grace-period=0 --force

I'm not sure offhand how that works when deleting k8s jobs and deployments.
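
For illustration, calling the proposed endpoint might look something like this (localhost:5678 being the Teraslice master; the parameter name and whether it takes a value are just a sketch at this point):

    # Hypothetical call to the proposed endpoint; JOBID is a placeholder.
    curl -Ss -XPOST 'localhost:5678/job/JOBID/_stop?force'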

@godber godber added the k8s and pkg/teraslice labels Apr 28, 2021
@godber godber self-assigned this Apr 28, 2021
@godber godber changed the title Add force parameter to Teraslice Job stop API that force deletes things k8s - Add force parameter to Teraslice Job stop API that force deletes things Apr 28, 2021
@godber godber added this to the Minor k8s improvements milestone Aug 17, 2021

godber commented Nov 15, 2023

We had a discussion about this issue today; here are some of my thoughts from that discussion and after the fact:

  • Maybe we want to expose both force and grace_period via the Teraslice API; every attempt I've made to "simplify" things has ended up complicating them.
  • Changing "restartPolicy": "Never" here might be sufficient to keep Kafka jobs going ... assuming workers no longer use IP addresses to reach the execution controller. We would of course lose anything in the execution controller's state, and it probably shuts itself down for some reason, but trying this might inform how to make these executions restartable.
  • ttlSecondsAfterFinished might be useful (see the sketch after this list): https://kubernetes.io/docs/concepts/workloads/controllers/job/#clean-up-finished-jobs-automatically
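
If we want to experiment with that field, a minimal sketch would be patching an existing execution controller Job (the Job name and TTL value are placeholders; Teraslice would presumably set this when it creates the Job rather than via a patch):

    # ttlSecondsAfterFinished is part of the batch/v1 Job spec; once the Job
    # finishes, k8s deletes it (and its pods) after the TTL elapses.
    kubectl -n <teraslice-namespace> patch job <ts-exc-job-name> \
      --type merge -p '{"spec": {"ttlSecondsAfterFinished": 600}}'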

busma13 commented Nov 30, 2023

I've spent some time playing with the POST /job/JOBID/_stop API endpoint as well as using kubectl with different combinations of force and grace-period. Here are my thoughts so far:

  • POST /job/JOBID/_stop is equivalent to kubectl delete jobs, not kubectl delete pods. No combination of force and grace-period on a job will send a SIGKILL to worker and execution pods. The worker pods will shut down gracefully or time out after sysconfig.teraslice.shutdown_timeout milliseconds. In prod this time is set to 5 minutes.
  • Calling kubectl delete pods --grace-period=0 will send a SIGKILL to the pods specified, killing them instantly. But without shutting down the jobs and/or other resources first, the pods may restart automatically.
  • Adding a force option to the POST /job/JOBID/_stop endpoint would require adding a second request to the jobs section of the k8s client delete function that would delete all pods associated with the job. This feels like the wrong approach, as we are fighting against the design principles of Kubernetes. There may be race conditions where new worker pods are created to replace the killed pods if jobs/deployments are not yet deleted.

There are multiple scenarios that we are trying to fix (a rough mapping to possible API calls is sketched after the list):

  1. A job is in a terminal state and is not allowed to be stopped. We don't need to shut pods down immediately. This should coincide with the force option?
  2. A job is not in a terminal state, but we want the worker pods to shutdown immediately, without waiting sysconfig.teraslice.shutdown_timeout milliseconds. This should coincide with the grace-period option?
  3. A combination of 1 and 2: terminal state that we want to shut down immediately.
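
Purely as a sketch, those scenarios could map onto API calls like the following, reusing the force/grace_period names discussed above (none of this is the current API; the parameter names are assumptions):

    # Scenario 1: execution already terminal; just clean up the k8s resources.
    curl -Ss -XPOST 'localhost:5678/job/JOBID/_stop?force=true'

    # Scenario 2: running execution, but skip the shutdown_timeout wait.
    curl -Ss -XPOST 'localhost:5678/job/JOBID/_stop?grace_period=0'

    # Scenario 3: both at once.
    curl -Ss -XPOST 'localhost:5678/job/JOBID/_stop?force=true&grace_period=0'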

busma13 commented Nov 30, 2023

I missed something earlier:

  • When using force and grace-period together, grace-period is ignored.
  • This command will achieve scenario 2:
    kubectl -n <teraslice-namespace> delete pods -l teraslice.terascope.io/jobName=<job-name> --grace-period=0
    • The teraslice.terascope.io/jobName label is on the ex controller job, ex controller pod, worker deployment, worker replica set, and all worker pods for a specified job (see the listing sketch below).
    • Because all jobs and deployments are deleted in tandem with the worker and ex controller pods, we won't be fighting against Kubernetes.
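
As a quick check of which resources carry that label (namespace and job name are placeholders):

    kubectl -n <teraslice-namespace> get jobs,deployments,replicasets,pods \
      -l teraslice.terascope.io/jobName=<job-name>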

busma13 commented Dec 5, 2023

kubernetes/kubernetes#120449

According to this issue, the behavior of grace-period is different in versions between v1.18 and v1.27 than it is before or after that range:

"In fact, processes gets killed if we set --grace-period=0 in ver1.18.
But it seems that the behavior got changed between ver1.18 and ver1.27.

ver1.18 ... grace-period=0 kills Pod's processes immediately
ver1.27 ... grace-period=0 sends SIGTERM to Pod's processes first, then wait for Pod's terminationGracePeriodSeconds before sending SIGKILL. As a result, when I execute kubectl delete pod grace-period=0 --force , message "pod pod-name force deleted" is shown, but pod's process is not killed immediately. It lives during terminationGracePeriodSeconds."

The behavior was changed back in v1.28.

busma13 commented Dec 5, 2023

When using kubectl, setting the force flag sets gracePeriod to 0 in the underlying request body; this is why you can't use gracePeriod and force together. If you set gracePeriod to 0 when using the k8s node API, you are actually telling k8s to use force instead. gracePeriod must instead be set to 1.
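
For reference, a sketch of what that looks like at the Kubernetes REST API level: the flags above end up as a gracePeriodSeconds field in the DeleteOptions body of the pod DELETE request (server, namespace, and pod name are placeholders; auth headers are omitted):

    curl -X DELETE "https://<apiserver>/api/v1/namespaces/<namespace>/pods/<pod-name>" \
      -H 'Content-Type: application/json' \
      -d '{"kind": "DeleteOptions", "apiVersion": "v1", "gracePeriodSeconds": 1}'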

godber commented Dec 7, 2023

I think I managed to accidentally recreate the scenario this issue was for. I think the scenario was:

  • I ran yarn k8s
  • ran the example job as described in my docs
  • left it running
  • closed my laptop
  • got home and opened my laptop
  • exc was dead

Here's the state of the k8s resources:

kubectl -n ts-dev1 get all
NAME                                                       READY   STATUS      RESTARTS   AGE
pod/teraslice-master-84d4c87c7b-5shrv                      1/1     Running     0          65m
pod/ts-exc-data-generator-cacda274-6685-c56nr              0/1     Completed   0          67m
pod/ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc   1/1     Running     0          67m

NAME                       TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
service/teraslice-master   NodePort   10.96.42.251   <none>        5678:30678/TCP   106m

NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/teraslice-master                      1/1     1            1           106m
deployment.apps/ts-wkr-data-generator-cacda274-6685   1/1     1            1           67m

NAME                                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/teraslice-master-84d4c87c7b                      1         1         1       106m
replicaset.apps/ts-wkr-data-generator-cacda274-6685-79db457dd8   1         1         1       67m

NAME                                            COMPLETIONS   DURATION   AGE
job.batch/ts-exc-data-generator-cacda274-6685   1/1           48m        67m

Here are snippets of the master logs:

[2023-12-07T00:32:10.910Z] DEBUG: teraslice/17 on teraslice-master-84d4c87c7b-5shrv: Creating an elasticsearch v6 client (assignment=cluster_master)
[2023-12-07T00:32:10.919Z] DEBUG: teraslice/17 on teraslice-master-84d4c87c7b-5shrv: Creating an elasticsearch v6 client (assignment=cluster_master)
[2023-12-07T00:32:10.924Z] DEBUG: teraslice/17 on teraslice-master-84d4c87c7b-5shrv: Creating an elasticsearch v6 client (assignment=cluster_master)
[2023-12-07T00:32:10.951Z]  INFO: teraslice/17 on teraslice-master-84d4c87c7b-5shrv: execution storage initialized (assignment=cluster_master, module=ex_storage, worker_id=C69wTsGx)
[2023-12-07T00:32:10.953Z]  INFO: teraslice/17 on teraslice-master-84d4c87c7b-5shrv: job storage initialized (assignment=cluster_master, module=job_storage, worker_id=C69wTsGx)
[2023-12-07T00:32:10.953Z]  INFO: teraslice/17 on teraslice-master-84d4c87c7b-5shrv: state storage initialized (assignment=cluster_master, module=state_storage, worker_id=C69wTsGx)
[2023-12-07T00:32:10.953Z]  INFO: teraslice/17 on teraslice-master-84d4c87c7b-5shrv: kubernetes clustering initializing (assignment=cluster_master, module=kubernetes_cluster_service, worker_id=C69wTsGx)
[2023-12-07T00:32:11.072Z]  INFO: teraslice/17 on teraslice-master-84d4c87c7b-5shrv: execution service is initializing... (assignment=cluster_master, module=execution_service, worker_id=C69wTsGx)
[2023-12-07T00:32:11.081Z] DEBUG: teraslice/17 on teraslice-master-84d4c87c7b-5shrv: execution queue initialization complete (assignment=cluster_master, module=execution_service, worker_id=C69wTsGx)
[2023-12-07T00:32:11.082Z]  INFO: teraslice/17 on teraslice-master-84d4c87c7b-5shrv: job service is initializing... (assignment=cluster_master, module=jobs_service, worker_id=C69wTsGx)
[2023-12-07T00:32:11.082Z] DEBUG: teraslice/17 on teraslice-master-84d4c87c7b-5shrv: services has been initialized (assignment=cluster_master, module=cluster_master, worker_id=C69wTsGx)
[2023-12-07T00:32:11.103Z]  INFO: teraslice/17 on teraslice-master-84d4c87c7b-5shrv: api service is initializing... (assignment=cluster_master, module=api_service, worker_id=C69wTsGx)
[2023-12-07T00:32:11.114Z]  INFO: teraslice/17 on teraslice-master-84d4c87c7b-5shrv: cluster master is ready! (assignment=cluster_master, module=cluster_master, worker_id=C69wTsGx)
[2023-12-07T00:32:12.127Z]  INFO: teraslice/17 on teraslice-master-84d4c87c7b-5shrv: execution fa92040e-b25e-464e-8b81-20edf38a99ed is connected (assignment=cluster_master, module=kubernetes_cluster_service, worker_id=C69wTsGx)
[2023-12-07T01:18:40.347Z]  INFO: teraslice/17 on teraslice-master-84d4c87c7b-5shrv: client fa92040e-b25e-464e-8b81-20edf38a99ed disconnected { reason: 'ping timeout' } (assignment=cluster_master, module=messaging:server, worker_id=C69wTsGx)
[2023-12-07T01:18:43.392Z]  INFO: teraslice/17 on teraslice-master-84d4c87c7b-5shrv: client fa92040e-b25e-464e-8b81-20edf38a99ed reconnected (assignment=cluster_master, module=messaging:server, worker_id=C69wTsGx)
[2023-12-07T01:18:48.584Z]  INFO: teraslice/17 on teraslice-master-84d4c87c7b-5shrv: client fa92040e-b25e-464e-8b81-20edf38a99ed disconnected { reason: 'transport close' } (assignment=cluster_master, module=messaging:server, worker_id=C69wTsGx)

Snippet of the ts-exc log:

[2023-12-07T00:40:31.426Z] DEBUG: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: dispatched slice 20224c93-28ec-4cb7-89cf-3ea4b624d868 to worker 10.244.0.10__aav_tUjw (assignment=execution_controller, module=execution_controller, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T00:40:32.883Z]  INFO: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: worker 10.244.0.10__aav_tUjw has completed its slice 20224c93-28ec-4cb7-89cf-3ea4b624d868 (assignment=execution_controller, module=execution_controller, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T00:40:32.939Z] DEBUG: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: dispatched slice 662e24a1-cdb4-4f35-871d-fdde15c14a94 to worker 10.244.0.10__aav_tUjw (assignment=execution_controller, module=execution_controller, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T00:40:34.793Z]  INFO: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: worker 10.244.0.10__aav_tUjw has completed its slice 662e24a1-cdb4-4f35-871d-fdde15c14a94 (assignment=execution_controller, module=execution_controller, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T00:40:34.800Z] DEBUG: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: dispatched slice b410ab7b-8dc4-4ce8-8a6f-94d8f9e22403 to worker 10.244.0.10__aav_tUjw (assignment=execution_controller, module=execution_controller, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T00:40:36.848Z]  INFO: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: worker 10.244.0.10__aav_tUjw has completed its slice b410ab7b-8dc4-4ce8-8a6f-94d8f9e22403 (assignment=execution_controller, module=execution_controller, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T00:40:36.857Z] DEBUG: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: dispatched slice 2d85ba6b-1435-4093-bd59-e64236af8756 to worker 10.244.0.10__aav_tUjw (assignment=execution_controller, module=execution_controller, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T00:42:01.096Z]  INFO: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: client 10.244.0.10__aav_tUjw disconnected { reason: 'ping timeout' } (assignment=execution_controller, module=messaging:server, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:41.682Z] ERROR: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: slicer for ex fa92040e-b25e-464e-8b81-20edf38a99ed had an error, shutting down execution, caused by Error: All workers from workers from fa92040e-b25e-464e-8b81-20edf38a99ed have disconnected (assignment=execution_controller, module=execution_controller, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888, err.code=INTERNAL_SERVER_ERROR)
    TSError: slicer for ex fa92040e-b25e-464e-8b81-20edf38a99ed had an error, shutting down execution, caused by Error: All workers from workers from fa92040e-b25e-464e-8b81-20edf38a99ed have disconnected
        at ExecutionController._terminalError (/app/source/packages/teraslice/dist/src/lib/workers/execution-controller/index.js:271:23)
        at Timeout.<anonymous> (/app/source/packages/teraslice/dist/src/lib/workers/execution-controller/index.js:828:18)
        at listOnTimeout (node:internal/timers:569:17)
        at process.processTimers (node:internal/timers:512:7)
[2023-12-07T01:18:41.782Z]  INFO: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: client fa92040e-b25e-464e-8b81-20edf38a99ed disconnected { reason: 'transport close' } (assignment=execution_controller, module=messaging:client, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:41.823Z] FATAL: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: execution fa92040e-b25e-464e-8b81-20edf38a99ed is ended because of slice failure (assignment=execution_controller, module=execution_controller, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:41.824Z] DEBUG: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: stopping scheduler... (assignment=execution_controller, module=execution_scheduler, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:41.824Z] DEBUG: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: execution fa92040e-b25e-464e-8b81-20edf38a99ed is finished scheduling, 4 remaining slices in the queue (assignment=execution_controller, module=execution_scheduler, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:41.826Z]  WARN: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: clients are all offline, but there are still 1 pending slices (assignment=execution_controller, module=execution_controller, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:41.827Z] DEBUG: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: execution fa92040e-b25e-464e-8b81-20edf38a99ed did not finish (assignment=execution_controller, module=execution_controller, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:41.828Z]  INFO: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: [START] "elasticsearch_sender_api" operation shutdown (assignment=execution_controller, module=slicer_context, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:41.828Z]  INFO: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: [START] "data_generator" operation shutdown (assignment=execution_controller, module=slicer_context, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:41.829Z]  INFO: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: [FINISH] "elasticsearch_sender_api" operation shutdown, took 1ms (assignment=execution_controller, module=slicer_context, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:41.829Z]  INFO: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: [FINISH] "data_generator" operation shutdown, took 1ms (assignment=execution_controller, module=slicer_context, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:41.830Z]  INFO: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: calculating statistics (assignment=execution_controller, module=slice_analytics, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:41.830Z]  INFO: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: (assignment=execution_controller, module=slice_analytics, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)

    operation data_generator
    average completion time of: 1723.25 ms, min: 1250 ms, and max: 10528 ms
    average size: 5000, min: 5000, and max: 5000
    average memory: 8009788.18, min: -43154208, and max: 67527704

[2023-12-07T01:18:41.830Z]  INFO: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: (assignment=execution_controller, module=slice_analytics, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)

    operation elasticsearch_bulk
    average completion time of: 234.48 ms, min: 127 ms, and max: 759 ms
    average size: 5000, min: 5000, and max: 5000
    average memory: 785388.15, min: -79094600, and max: 15153176

[2023-12-07T01:18:41.830Z]  INFO: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: execution fa92040e-b25e-464e-8b81-20edf38a99ed has finished in 2887 seconds (assignment=execution_controller, module=execution_controller, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:41.974Z] DEBUG: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: execution fa92040e-b25e-464e-8b81-20edf38a99ed is done (assignment=execution_controller, module=execution_controller, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:42.077Z] DEBUG: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: execution shutdown was called for ex fa92040e-b25e-464e-8b81-20edf38a99ed (assignment=execution_controller, module=execution_controller, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:42.081Z]  INFO: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: shutting down (assignment=execution_controller, module=state_storage, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:42.082Z]  INFO: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: shutting down. (assignment=execution_controller, module=ex_storage, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:43.156Z] DEBUG: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: client fa92040e-b25e-464e-8b81-20edf38a99ed is reconnecting... (assignment=execution_controller, module=messaging:client, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:43.387Z]  INFO: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: client fa92040e-b25e-464e-8b81-20edf38a99ed reconnected (assignment=execution_controller, module=messaging:client, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:43.595Z]  INFO: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: client fa92040e-b25e-464e-8b81-20edf38a99ed connected (assignment=execution_controller, module=messaging:client, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:47.087Z]  WARN: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: execution controller fa92040e-b25e-464e-8b81-20edf38a99ed is shutdown (assignment=execution_controller, module=execution_controller, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:47.092Z]  INFO: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: execution_controller shutdown, already shutting down, remaining 25s (assignment=execution_controller, module=execution_controller:shutdown_handler, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:48.110Z] DEBUG: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: flushed logs successfully, will exit with code 0 (assignment=execution_controller, module=execution_controller:shutdown_handler, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:18:48.110Z]  INFO: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: execution_controller shutdown took 6s, exit with zero status code (assignment=execution_controller, module=execution_controller:shutdown_handler, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)

Snippet of the worker log:

[2023-12-07T00:40:34.790Z]  INFO: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: slice 662e24a1-cdb4-4f35-871d-fdde15c14a94 completed (assignment=worker, module=worker, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T00:40:36.844Z]  INFO: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: analytics for slice: slice_id: "b410ab7b-8dc4-4ce8-8a6f-94d8f9e22403", slicer_id: 0, slicer_order: 297, _created: "2023-12-07T00:40:29.588Z", time: [1866, 157], memory: [-6617520, 12812056], size: [5000, 5000] (assignment=worker, module=slice, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888, slice_id=b410ab7b-8dc4-4ce8-8a6f-94d8f9e22403)
[2023-12-07T00:40:36.844Z]  INFO: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: slice b410ab7b-8dc4-4ce8-8a6f-94d8f9e22403 completed (assignment=worker, module=worker, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:19:26.724Z]  INFO: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw disconnected { reason: 'transport close' } (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:19:26.740Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: flushed 6 records to index ts-dev1__analytics* (assignment=worker, module=analytics_storage, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:19:27.118Z]  INFO: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: analytics for slice: slice_id: "2d85ba6b-1435-4093-bd59-e64236af8756", slicer_id: 0, slicer_order: 298, _created: "2023-12-07T00:40:31.381Z", time: [2329758, 412], memory: [-28345424, 10532696], size: [5000, 5000] (assignment=worker, module=slice, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888, slice_id=2d85ba6b-1435-4093-bd59-e64236af8756)
[2023-12-07T01:19:27.118Z]  INFO: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: slice 2d85ba6b-1435-4093-bd59-e64236af8756 completed (assignment=worker, module=worker, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:19:27.120Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: server is not ready and not-connected, waiting for the ready event (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:19:27.319Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:19:37.018Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: flushed 2 records to index ts-dev1__analytics* (assignment=worker, module=analytics_storage, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:19:49.759Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:20:14.761Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:20:39.764Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:21:04.765Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:21:29.766Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:21:54.770Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:22:19.772Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:22:44.778Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:23:09.781Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:23:34.780Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:23:59.782Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:24:12.125Z]  WARN: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: Timed out after 2m, waiting for message "worker:slice:complete" (assignment=worker, module=worker, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
    Error: Timed out after 2m, waiting for message "worker:slice:complete"
        at Client.handleSendResponse (/app/source/packages/teraslice-messaging/dist/src/messenger/core.js:43:19)
        at async /app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:243:17
        at async Timeout._onTimeout (/app/source/packages/utils/dist/src/promises.js:180:32)
[2023-12-07T01:24:12.125Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: server is not ready and not-connected, waiting for the ready event (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:24:24.782Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:24:49.784Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:25:14.787Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:25:39.788Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:26:04.790Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:26:29.792Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:26:54.798Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:27:19.795Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:27:44.796Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:28:09.797Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:28:34.800Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:28:57.129Z]  WARN: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: Timed out after 2m, waiting for message "worker:slice:complete" (assignment=worker, module=worker, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
    Error: Timed out after 2m, waiting for message "worker:slice:complete"
        at Client.handleSendResponse (/app/source/packages/teraslice-messaging/dist/src/messenger/core.js:43:19)
        at runNextTicks (node:internal/process/task_queues:60:5)
        at process.processTimers (node:internal/timers:509:9)
        at async /app/source/packages/teraslice/dist/src/lib/workers/worker/index.js:243:17
        at async Timeout._onTimeout (/app/source/packages/utils/dist/src/promises.js:180:32)
[2023-12-07T01:28:57.129Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: server is not ready and not-connected, waiting for the ready event (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:28:59.805Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)
[2023-12-07T01:29:24.806Z] DEBUG: teraslice/10 on ts-wkr-data-generator-cacda274-6685-79db457dd8-9sdvc: client 10.244.0.10__aav_tUjw is reconnecting... (assignment=worker, module=messaging:client, worker_id=aav_tUjw, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)

godber commented Dec 7, 2023

I think this is the key clue:

[2023-12-07T01:18:41.682Z] ERROR: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: slicer for ex fa92040e-b25e-464e-8b81-20edf38a99ed had an error, shutting down execution, caused by Error: All workers from workers from fa92040e-b25e-464e-8b81-20edf38a99ed have disconnected (assignment=execution_controller, module=execution_controller, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888, err.code=INTERNAL_SERVER_ERROR)
    TSError: slicer for ex fa92040e-b25e-464e-8b81-20edf38a99ed had an error, shutting down execution, caused by Error: All workers from workers from fa92040e-b25e-464e-8b81-20edf38a99ed have disconnected
        at ExecutionController._terminalError (/app/source/packages/teraslice/dist/src/lib/workers/execution-controller/index.js:271:23)
        at Timeout.<anonymous> (/app/source/packages/teraslice/dist/src/lib/workers/execution-controller/index.js:828:18)
        at listOnTimeout (node:internal/timers:569:17)
        at process.processTimers (node:internal/timers:512:7)

godber commented Dec 7, 2023

And the key to being able to reproduce this easily ourselves is this log line:

[2023-12-07T01:18:48.110Z]  INFO: teraslice/11 on ts-exc-data-generator-cacda274-6685-c56nr: execution_controller shutdown took 6s, exit with zero status code (assignment=execution_controller, module=execution_controller:shutdown_handler, worker_id=TQKG8Sw7, ex_id=fa92040e-b25e-464e-8b81-20edf38a99ed, job_id=cacda274-6685-4457-be88-706a15866888)

Specifically, execution_controller shutdown took 6s, exit with zero status code. So if you call process.exit(0) from inside the slicer, I think you can reproduce this problem. There may be a timing component to it; I'm not sure. If the first thing the slicer does is exit, it may not work; you may have to "process some slices" first, then exit.

busma13 commented Dec 7, 2023

I will definitely give this a try.

godber commented Dec 13, 2023

I was given a new way to try to reproduce one of these situations: with an elasticsearch reader, give it a query that is invalid. Here's a sample:

TSError: slicer for ex XXXXXXXXXXXXX had an error, shutting down execution, caused by TSError: search_phase_execution_exception: [parse_exception] Reason: parse_exception: Encountered \" \")\" \")

I think an operation like this would work:

{
    "_op": "elasticsearch_reader",
    "connection": "es_foo",
    "index": "foo-2023.12.01",
    "query": "bar:( )",
    "date_field_name": "timestamp",
    "size": 10000
},

It's not clear whether the job ultimately cleaned its workers up on its own.
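
For anyone trying to reproduce this, submitting a job containing the op above might look like the following (the filename is hypothetical; as I understand the Teraslice API, submitting a job also starts it by default):

    # bad-query-job.json is a job file containing the elasticsearch_reader op above.
    curl -Ss -XPOST 'localhost:5678/jobs' \
      -H 'Content-Type: application/json' \
      -d @bad-query-job.json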

godber commented Dec 15, 2023

This is completed and shown to work.

@godber godber closed this as completed Dec 15, 2023
@busma13 busma13 reopened this Dec 19, 2023

godber commented Dec 20, 2023

I think I have identified a repeatable failure scenario where force stop does not work.

  • create a failing job with the bad query shown above e.g. "query": "bar:( )",
  • let that job run to failure
  • verify with kubectl that the execution controller pod has Completed
  • Try to start the job again; this second execution will fail to start because the one before it is blocking it
  • NOW try to call force stop; you should see that it does not clean up the resources

I think the problem is that the second time the job is started it has a new exId, and when stop tries to delete stuff by exId it is of course using the "latest" one, rather than the previous one, which is the one that has failed.

I distinctly remember telling @busma13 to change the way he was doing things to target the exId rather than the jobId. I suspect this advice broke it.

This is pretty obvious if you pay close attention to the master logs and see that the exId it's trying to delete doesn't match the exId on the pods still running.

We should probably fix this force stop to delete by jobId, which is the obvious thing to do. But this still points to an architectural issue: Teraslice and k8s have a state synchronization problem. Very specifically, a new ex can be created and associated with the job, as shown here:

curl -Ss localhost:5678/jobs/<JOBID>/ex | jq -r .ex_id

while k8s resources matching a different exId are still present in k8s.
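
A quick way to surface that mismatch (namespace and names are placeholders; this assumes the exId is visible on the pods, e.g. via a label, which I haven't verified here):

    # The execution the master currently associates with the job:
    curl -Ss localhost:5678/jobs/<JOBID>/ex | jq -r .ex_id

    # The pods k8s still has for that job, with their labels, to compare exIds:
    kubectl -n <teraslice-namespace> get pods \
      -l teraslice.terascope.io/jobName=<job-name> --show-labels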

godber commented Dec 20, 2023

When addressing the issue above, please make the following additional changes:

  • provide an explicit example of force=true in the docs so there are no questions about the syntax (a candidate example follows this list)
  • improve logging so that stop and stop?force=true are distinguishable
  • the JSON returned to the user calling stop with curl should include some message that acknowledges the force parameter (i.e. it's indicated not just in the logs, but also to the end user).
  • Remove no longer needed deletion of worker deployment:
    // In the future we will remove the following block and just rely on k8s
    // garbage collection to remove the worker deployment when the execution
    // controller job is deleted. We leave this here for the transition
    // period when users may have teraslice jobs that don't yet have those
    // relationships.
    // So you may see warnings from the delete below failing. They may be
    // ignored.
    try {
        await this._deleteObjByExId(exId, 'worker', 'deployments');
    } catch (e) {
        this.logger.warn(`Ignoring the following error when deleting exId ${exId}: ${e}`);
    }
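
For the first bullet, the documented example could be as simple as this (host and port as elsewhere in this thread; the response body is whatever the third bullet ends up adding):

    curl -Ss -XPOST 'localhost:5678/job/JOBID/_stop?force=true'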

busma13 commented Feb 21, 2024

This is done. We want to eventually pull force out into its own endpoint. See #3519

@busma13 busma13 closed this as completed Feb 21, 2024