
Force stop command deletes all jobs if the kubernetes job no longer exists #3750

Closed
busma13 opened this issue Sep 12, 2024 · 6 comments
Assignees: busma13
Labels: bug, k8s (Applies to Teraslice in kubernetes cluster mode only.), pkg/teraslice

Comments


busma13 commented Sep 12, 2024

When force deleting a job, it is possible that the kubernetes job resource has already been deleted while other resources (pods, services, deployments) have been orphaned. In that case K8s._deleteObjByExId() will receive undefined as the name of the job resource. The K8s.delete() function uses name to select which jobs to delete; if passed undefined or an empty string it will return ALL job resources and then delete them.

K8s.delete() needs to ensure that name is defined and not an empty string.
K8s._deleteObjByExId() must ensure there is a resource to delete before calling delete().
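
Roughly, those two guards could look like the sketch below (TypeScript). This is illustrative only, not the actual Teraslice code; the list() helper, the label selector, and the resource shapes are assumptions made for the example.

// Illustrative sketch only -- not the Teraslice source. The method names
// (delete, _deleteObjByExId) come from this issue; everything else is assumed.
interface K8sObject {
    metadata: { name?: string };
}

interface K8sList {
    items: K8sObject[];
}

class K8sSketch {
    // Stand-in for the real namespaced list call against the Kubernetes API.
    async list(selector: string, objType: string): Promise<K8sList> {
        return { items: [] };
    }

    async delete(name: string | undefined, objType: string): Promise<void> {
        // Guard: an undefined or empty name would otherwise select every
        // resource of the given type, so refuse to continue.
        if (name === undefined || name.trim() === '') {
            throw new Error(`Name is required to delete a ${objType} resource`);
        }
        // ... issue the namespaced delete for `name` here ...
    }

    async _deleteObjByExId(exId: string, objType: string): Promise<void> {
        // The label selector is illustrative; the real labels may differ.
        const objList = await this.list(`exId=${exId}`, objType);

        // Guard: nothing matched (e.g. the job resource is already gone),
        // so return instead of passing undefined down into delete().
        if (objList.items.length === 0) {
            return;
        }

        for (const obj of objList.items) {
            await this.delete(obj.metadata.name, objType);
        }
    }
}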

This bug is only in kubernetes clustering, as kubernetesV2 receives a typed object when looking up the job resource, and therefore already checks whether name is defined.

busma13 added the bug, k8s, and pkg/teraslice labels Sep 12, 2024
busma13 self-assigned this Sep 12, 2024

godber commented Sep 14, 2024

I was just thinking about this and I am pretty sure the steps to reproduce are straightforward:

  1. start two jobs, job1 with a long delay and job2 that will be accidentally stopped
  2. call /_stop on job1
  3. immediately call /_stop?force on job1
  4. check the state of the jobs; both job1 and job2 should be gone

The first call to stop deletes all of the resources related to that exId, but leaves the pods in Terminating due to their grace period. The second call to _stop uses force, finds no Kubernetes Job for the execution controller, and ends up sending undefined to the k8s API delete calls.
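
For reference, here is that reproduction sketched as a script against the Teraslice HTTP API (TypeScript, Node 18+ fetch). The master URL, the HTTP method, and the exact form of the force query parameter are assumptions; the issue only names the /_stop and /_stop?force calls.

// Sketch of the reproduction above; host, port, HTTP method, and the force
// parameter form are assumptions, not taken from the Teraslice docs.
async function reproduce(jobId: string): Promise<void> {
    const base = 'http://localhost:5678';

    // First stop: deletes the exId's resources; pods linger in Terminating
    // for their grace period.
    await fetch(`${base}/jobs/${jobId}/_stop`, { method: 'POST' });

    // Immediate force stop: the execution controller's Kubernetes Job is
    // already gone, so the name lookup yields undefined and the delete call
    // ends up matching every job resource.
    await fetch(`${base}/jobs/${jobId}/_stop?force=true`, { method: 'POST' });
}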


busma13 commented Sep 17, 2024

While resolving this issue I ran into some other problems:

  • The force option was only deleting jobs and pods. It now also deletes services, deployments, and replicasets. This will let us force delete any of these resources if K8s has a strange failure that leaves them up.
  • Force stopping after a regular stop is not possible in K8s. K8s marks resources for deletion, adds the metadata.deletionGracePeriodSeconds and metadata.deletionTimestamp fields, and will not send a SIGKILL until the grace period has elapsed. metadata.deletionGracePeriodSeconds cannot be modified or removed, so a second delete call with gracePeriodSeconds = 0 (the equivalent of kubectl delete --force) cannot shorten the grace period and a SIGKILL will not be sent immediately. For this reason the order in which we force delete resources matters: we must delete child resources before their parents, or the children will be deleted in the background with the normal grace period (see the sketch below). It may also be possible to change the propagationPolicy from background to orphan or foreground, but that would require other changes.
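
A minimal sketch of that ordering in TypeScript; the resource list and the deleteByExId callback are placeholders for whatever performs the per-resource force delete, not the actual Teraslice API.

type DeleteByExId = (exId: string, objType: string) => Promise<void>;

// Children before parents: if a parent (Deployment, Job) is deleted first,
// its children are removed by a background cascade that keeps the normal
// grace period, and the force delete never reaches them.
const forceDeleteOrder = ['pods', 'replicasets', 'services', 'deployments', 'jobs'];

async function forceDeleteExecution(exId: string, deleteByExId: DeleteByExId): Promise<void> {
    for (const objType of forceDeleteOrder) {
        // A real implementation would also request gracePeriodSeconds = 0 here.
        await deleteByExId(exId, objType);
    }
}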


busma13 commented Sep 18, 2024

Useful links:


busma13 commented Sep 19, 2024

In certain scenarios calling force delete on a pod will not actually delete the process immediately.
Steps to reproduce:

  • Ensure teraslice.shutdown_timeout in the config is sufficiently long to see the problem. 5 minutes should work.
  • start a teraslice job similar to this:
  {
    "name": "peter-datagen-to-delay-to-kafka",
    "lifecycle": "persistent",
    "workers": 3,
    "assets": [
        "standard",
        "kafka"
    ],
    "operations": [
        {
            "_op": "data_generator",
            "size": 5000
        },
        {
            "_op": "delay",
            "ms": 90000
        },
        {
            "_op": "kafka_sender",
            "connection": "default",
            "topic": "rand-p-json-v1",
            "size": 5000
        }
    ]
  }

Delay is used here to keep the slice from finishing, preventing the shutdown process from completing until the delay ends.

  • in a separate terminal watch the logs of a pod: kubectl -n <namespace> logs <pod-name> -f | bunyan
  • call stop on the job: curl <teraslice-client>:<port>/jobs/<job-id>/_stop
    • the worker pods will receive a SIGTERM and be placed in the terminating status.
  • force delete the pod you are watching: kubectl -n <namespace> delete pod <pod-name> --force --grace-period=0
    • kubectl get pods -n <namespace> will no longer show the pod
    • the logs will still be coming in, showing that the worker process is still shutting down
    • once the shutdown timeout is reached the process will terminate.

godber pushed a commit that referenced this issue Sep 19, 2024
This PR makes the following changes:

- Ensure that `K8s._deleteObjByExId()` finds an object in
`objList.items[]` before calling `K8s.delete()`.
- Throw if `K8s.delete()` receives a `name` argument that is undefined
or an empty string.
- Refactor `K8s._deleteObjByExId()` to delete multiple objects and not
handle `force` logic.
- Refactor `k8s.deleteExecution()` to handle `force` logic by calling
`K8s._deleteObjByExId()` on multiple resources.
- bump teraslice from version 2.3.1 to 2.3.2
- pin the kind image used by k8s to:
`kindest/node:v1.28.12@sha256:fa0e48b1e83bb8688a5724aa7eebffbd6337abd7909ad089a2700bf08c30c6ea`.
Note that the version in the image tag refers to the kubernetes server
version that will be used. We should keep this in sync with prod.

ref: #3750

busma13 commented Sep 19, 2024

Here is a simpler reproduction:

  • dockerfile:
FROM alpine:3.18

COPY ./shutdown-script.sh /usr/local/bin/shutdown-script.sh

RUN chmod +x /usr/local/bin/shutdown-script.sh

ENTRYPOINT ["/usr/local/bin/shutdown-script.sh"]
  • shutdown-script.sh:
#!/bin/sh

# Trap the SIGTERM signal
trap 'echo "Received SIGTERM. Delaying shutdown..."; for i in $(seq 5 -1 1); do echo "Waiting to shutdown... $i minute(s)"; sleep 60; done; exit 0' SIGTERM

# Keep the container running indefinitely
while :; do
  echo "Running..."
  sleep 10
done
  • deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: slow-shutdown
  labels:
      app.kubernetes.io/name: slow-shutdown
      app.kubernetes.io/component: master
spec:
  replicas: 1
  selector:
      matchLabels:
          app.kubernetes.io/name: slow-shutdown
          app.kubernetes.io/component: master
  template:
    metadata:
      labels:
        app.kubernetes.io/name: slow-shutdown
        app.kubernetes.io/component: master
    spec:
      containers:
      - name: slow-shutdown
        image: slow_shutdown:1.0.0
      terminationGracePeriodSeconds: 30000
  • create a docker image from the dockerfile:
    docker build -t slow_shutdown:1.0.0 .
  • make sure the image is accessible to k8s. If using kind:
    kind load docker-image slow_shutdown:1.0.0 --name <cluster-name>
  • create deployment:
    kubectl create -n <namespace> -f deployment.yaml
  • watch logs: kubectl logs -n default slow-shutdown-867b6886dd-jt6tn -f | bunyan
Running...
Running...
Running...
Running...
  • delete deployment:
    kubectl delete -n default deployment.apps/slow-shutdown
    logs:
Running...
Running...
Running...
Running...
Received SIGTERM. Delaying shutdown...
Waiting to shutdown... 5 minute(s)
  • force delete pod:
    kubectl delete -n default pod/slow-shutdown-867b6886dd-jt6tn --grace-period=0 --force=true
  • You should no longer see the pod with kubectl get pods -n default, but the logs should stay open for 5 minutes from the original delete:
Waiting to shutdown... 5 minute(s)
Waiting to shutdown... 4 minute(s)
Waiting to shutdown... 3 minute(s)
Waiting to shutdown... 2 minute(s)
Waiting to shutdown... 1 minute(s)


busma13 commented Oct 7, 2024

This should be resolved. Closing.

busma13 closed this as completed Oct 7, 2024