How to set custom terminationGracePeriodSeconds for a knative Service pod #15555
Update: It looks like the terminationGracePeriodSeconds value is being picked directly from the revision-timeout-seconds value specified in the config-defaults ConfigMap in the knative namespace. Is there any way I can have different values for these two?
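For reference, a sketch of the relevant part of that ConfigMap (in a default install it lives in the knative-serving namespace; the values shown are the documented defaults, not taken from this cluster):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-defaults
  namespace: knative-serving
data:
  # Default request timeout for a Revision; 300 matches the observed
  # terminationGracePeriodSeconds on the Service's pods.
  revision-timeout-seconds: "300"
  # Upper bound a Revision may request via spec.timeoutSeconds.
  max-revision-timeout-seconds: "600"
```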
Hi @sebastianjohnk, Knative manages the pod termination cycle as follows:
This issue is stale because it has been open for 90 days with no activity.
/remove-lifecycle stale
Currently, Knative uses the same timeout setting for both service startup and shutdown. However, we would like to have separate timeouts for these operations.
By having separate timeouts you'll have incoming requests that are timed out earlier than their configured request timeout. For example, imagine a request is received by a Pod right before the pod receives a SIGTERM. That's why the termination grace period is tied to the revision timeout: when the request is finished, the pod should shut down before the grace period ends. @sebastianjohnk @kahirokunn are your requests long running (e.g. websockets, server-sent events)?
From what I have seen (also downstream) this is usually due to AI use cases where you want to stop a model from finishing a request (these requests can be long) and shut the pod down quickly. cc @Jooho
Ask your question here:
Hi. I'm working on a small POC to create some knative Services.
The image I'm providing for the pod currently contains a small flask app that listens on a port.
Right now I'm testing out the scale-down-ability of these Services.
It seems that once Knative decides to scale down the number of replicas of a pod from, say, 3 to 2, or even to 0, these pods remain in a "Terminating" state for a long time, close to 4 or 5 minutes I'd say.
And they seem to be in a 1/2 state.
I checked the container logs. It seems the queue-proxy container is shutting down properly, but not my flask app container.
But anyway, I learned that every pod has a terminationGracePeriodSeconds value that decides how long a pod can stay in this "Terminating" state before Kubernetes force-kills it.
Now here is the problem: terminationGracePeriodSeconds seems to default to 300 for all pods spawned as part of a Service, with seemingly no option to specify it in the Service yaml spec.
I'm able to specify this in a Pod yaml spec and deploy that pod individually, and it gets reflected in the pod (when I fetch the pod yaml using kubectl).
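For illustration, a plain Pod of that shape (the name, image, and port here are placeholders, not the actual manifest from the POC):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: flask-test                        # placeholder name
spec:
  terminationGracePeriodSeconds: 60       # accepted and reflected on a plain Pod
  containers:
    - name: flask-app
      image: example.com/flask-app:latest # placeholder image
      ports:
        - containerPort: 8080
```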
But when I try to deploy a Service using a yaml spec, which in turn contains a pod spec with the same configuration, something like this --
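The manifest itself isn't reproduced here; a minimal Knative Service of the shape described (the name, image, and 60-second value are placeholders) would look roughly like:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloservice                          # placeholder, matching helloservice.yaml
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60       # the field the API server rejects
      containers:
        - image: example.com/flask-app:latest # placeholder image
          ports:
            - containerPort: 8080
```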
I get an error saying
Error from server (BadRequest): error when creating "helloservice.yaml": Service in version "v1" cannot be handled as a Service: strict decoding error: unknown field "spec.template.spec.terminationGracePeriodSeconds"
But if I remove this field from the Service yaml, the service gets deployed, and the pod seems to have a default terminationGracePeriodSeconds value of 300.
I also checked the default value for a pod deployed directly from a pod yaml spec (without terminationGracePeriodSeconds specified), to see if it is a Kubernetes default, but it seems to be 30.
So the default seems to be 30 for individual pods and 300 for pods that are part of a Service.
I guess my question is: how is this default terminationGracePeriodSeconds value of 300 being set for pods belonging to Services, and is there any way I can change it, either by specifying it in my Service yaml spec or by changing some Kubernetes/Knative configuration?
Any help would be much appreciated, thank you.