Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add max duration timeout #12322

Merged
merged 12 commits into from
Dec 15, 2021
19 changes: 11 additions & 8 deletions cmd/queue/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -70,13 +70,14 @@ var (
)

type config struct {
ContainerConcurrency int `split_words:"true" required:"true"`
QueueServingPort string `split_words:"true" required:"true"`
UserPort string `split_words:"true" required:"true"`
RevisionTimeoutSeconds int `split_words:"true" required:"true"`
ServingReadinessProbe string `split_words:"true"` // optional
EnableProfiling bool `split_words:"true"` // optional
EnableHTTP2AutoDetection bool `split_words:"true"` // optional
ContainerConcurrency int `split_words:"true" required:"true"`
QueueServingPort string `split_words:"true" required:"true"`
UserPort string `split_words:"true" required:"true"`
RevisionTimeoutSeconds int `split_words:"true" required:"true"`
RevisionMaxDurationTimeoutSeconds int `split_words:"true" required:"true"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to worry about upgrades not having this env var, e.g. if the defaults ConfigMap with the new QP version updates before the new controller code is deployed? (I guess maybe not since the upgrade tests didn't fail 🤔)

Copy link
Contributor Author

@skonto skonto Nov 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont set any defaults at the revision level to avoid such issues for the next release (will introduce defaults a release later). If the env var does not exist (contoller code is old) then the new QP code will pick zero as the value for the timeout and will ignore it later on (same as idle timeout).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but this env var is set as required here, so won't envconfig panic on startup if this version of queue proxy rolls out before the corresponding code change that adds the env var?

Copy link
Contributor Author

@skonto skonto Nov 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct my intention was to make it optional, good point copy paste left over. Will update in a sec.

ServingReadinessProbe string `split_words:"true"` // optional
EnableProfiling bool `split_words:"true"` // optional
EnableHTTP2AutoDetection bool `split_words:"true"` // optional

// Logging configuration
ServingLoggingConfig string `split_words:"true" required:"true"`
Expand Down Expand Up @@ -280,6 +281,8 @@ func buildServer(ctx context.Context, env config, drainer *pkghandler.Drainer, p
// hardcoded to always disable idle timeout for now, will expose this later
var idleTimeout time.Duration

maxDurationTimeout := time.Duration(env.RevisionMaxDurationTimeoutSeconds) * time.Second

// Create queue handler chain.
// Note: innermost handlers are specified first, ie. the last handler in the chain will be executed first.
var composedHandler http.Handler = httpProxy
Expand All @@ -303,7 +306,7 @@ func buildServer(ctx context.Context, env config, drainer *pkghandler.Drainer, p
}
composedHandler = queue.ProxyHandler(breaker, stats, tracingEnabled, composedHandler)
composedHandler = queue.ForwardedShimHandler(composedHandler)
composedHandler = handler.NewTimeoutHandler(composedHandler, "request timeout", firstByteTimeout, idleTimeout)
composedHandler = handler.NewTimeoutHandler(composedHandler, "request timeout", firstByteTimeout, idleTimeout, maxDurationTimeout)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@skonto skonto Dec 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation of this timeout handler is inspired by the http one as also stated here.
Since we are doing a custom handler for first byte anyway and there is another PR for idle time out coming it makes sense to have all the logic in one place. Not sure what the benefits will be with the net/http one? Also I see that there are some limitations with the Golang one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the benefits will be with the net/http one?

For me it's more about separation of concerns. It's easier to unit test the three separate handlers and the chain them together. This can be a follow up issue to refactor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


if metricsSupported {
composedHandler = requestMetricsHandler(logger, composedHandler, env)
Expand Down
4 changes: 4 additions & 0 deletions config/core/300-resources/configuration.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -474,6 +474,10 @@ spec:
name:
description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?'
type: string
maxDurationTimeoutSeconds:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should drop timeout - unsure whether prefixing request is helpful

Suggested change
maxDurationTimeoutSeconds:
requestMaxDurationSeconds:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think I do want to bike shed this actually (sorry!).. request feels to me like it implies a timeout specifically for the request part of the request/response, and iiuc the server's response time is also included. I'd drop "request".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I will change.

description: maxDurationTimeoutSeconds is the maximum duration in seconds a request will be allowed to stay open.
type: integer
format: int64
serviceAccountName:
description: 'ServiceAccountName is the name of the ServiceAccount to use to run this pod. More info: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/'
type: string
Expand Down
4 changes: 4 additions & 0 deletions config/core/300-resources/revision.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -453,6 +453,10 @@ spec:
name:
description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?'
type: string
maxDurationTimeoutSeconds:
description: maxDurationTimeoutSeconds is the maximum duration in seconds a request will be allowed to stay open.
type: integer
format: int64
serviceAccountName:
description: 'ServiceAccountName is the name of the ServiceAccount to use to run this pod. More info: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/'
type: string
Expand Down
4 changes: 4 additions & 0 deletions config/core/300-resources/service.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -478,6 +478,10 @@ spec:
name:
description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names TODO: Add other useful fields. apiVersion, kind, uid?'
type: string
maxDurationTimeoutSeconds:
description: maxDurationTimeoutSeconds is the maximum duration in seconds a request will be allowed to stay open.
type: integer
format: int64
serviceAccountName:
description: 'ServiceAccountName is the name of the ServiceAccount to use to run this pod. More info: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/'
type: string
Expand Down
39 changes: 39 additions & 0 deletions docs/serving-api.md
Original file line number Diff line number Diff line change
Expand Up @@ -920,6 +920,19 @@ layer will wait for a request delivered to a container to begin replying
(send network traffic). If unspecified, a system default will be provided.</p>
</td>
</tr>
<tr>
<td>
<code>maxDurationTimeoutSeconds</code><br/>
<em>
int64
</em>
</td>
<td>
<em>(Optional)</em>
<p>maxDurationTimeoutSeconds is the maximum duration in seconds a request will be allowed
to stay open.</p>
</td>
</tr>
</table>
</td>
</tr>
Expand Down Expand Up @@ -1382,6 +1395,19 @@ layer will wait for a request delivered to a container to begin replying
(send network traffic). If unspecified, a system default will be provided.</p>
</td>
</tr>
<tr>
<td>
<code>maxDurationTimeoutSeconds</code><br/>
<em>
int64
</em>
</td>
<td>
<em>(Optional)</em>
<p>maxDurationTimeoutSeconds is the maximum duration in seconds a request will be allowed
to stay open.</p>
</td>
</tr>
</tbody>
</table>
<h3 id="serving.knative.dev/v1.RevisionStatus">RevisionStatus
Expand Down Expand Up @@ -1582,6 +1608,19 @@ layer will wait for a request delivered to a container to begin replying
(send network traffic). If unspecified, a system default will be provided.</p>
</td>
</tr>
<tr>
<td>
<code>maxDurationTimeoutSeconds</code><br/>
<em>
int64
</em>
</td>
<td>
<em>(Optional)</em>
<p>maxDurationTimeoutSeconds is the maximum duration in seconds a request will be allowed
to stay open.</p>
</td>
</tr>
</table>
</td>
</tr>
Expand Down
5 changes: 5 additions & 0 deletions pkg/apis/serving/v1/revision_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,11 @@ type RevisionSpec struct {
// (send network traffic). If unspecified, a system default will be provided.
// +optional
TimeoutSeconds *int64 `json:"timeoutSeconds,omitempty"`

// maxDurationTimeoutSeconds is the maximum duration in seconds a request will be allowed
// to stay open.
// +optional
MaxDurationTimeoutSeconds *int64 `json:"maxDurationTimeoutSeconds,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without an E2E test, I would rather not add the API fields. Maybe cut the API changes out of this and ship them seperately with the respective tests?

Copy link
Contributor Author

@skonto skonto Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I agree although I was planning to do it asap after merging this, I will add the test it is not a big of an addition.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is resolved no? Given the conformance test was added?

}

const (
Expand Down
5 changes: 5 additions & 0 deletions pkg/apis/serving/v1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

63 changes: 49 additions & 14 deletions pkg/http/handler/timeout.go
Original file line number Diff line number Diff line change
Expand Up @@ -30,11 +30,12 @@ import (
)

type timeoutHandler struct {
handler http.Handler
firstByteTimeout time.Duration
idleTimeout time.Duration
body string
clock clock.Clock
handler http.Handler
firstByteTimeout time.Duration
idleTimeout time.Duration
maxDurationTimeout time.Duration
body string
clock clock.Clock
}

// NewTimeoutHandler returns a Handler that runs `h` with the
Expand All @@ -53,13 +54,14 @@ type timeoutHandler struct {
// https://golang.org/pkg/net/http/#Handler.
//
// The implementation is largely inspired by http.TimeoutHandler.
func NewTimeoutHandler(h http.Handler, msg string, firstByteTimeout time.Duration, idleTimeout time.Duration) http.Handler {
func NewTimeoutHandler(h http.Handler, msg string, firstByteTimeout time.Duration, idleTimeout time.Duration, maxDurationTimeout time.Duration) http.Handler {
return &timeoutHandler{
handler: h,
body: msg,
firstByteTimeout: firstByteTimeout,
idleTimeout: idleTimeout,
clock: clock.RealClock{},
handler: h,
body: msg,
firstByteTimeout: firstByteTimeout,
idleTimeout: idleTimeout,
maxDurationTimeout: maxDurationTimeout,
clock: clock.RealClock{},
}
}

Expand Down Expand Up @@ -90,13 +92,32 @@ func (h *timeoutHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
// the panic from h.handler.ServeHTTP if h.handler.ServeHTTP panics.
done := make(chan interface{})
tw := &timeoutWriter{w: w, clock: h.clock}

// make sure that when max duration time out expires
// curTime - requestTime >= timeout
tw.requestStartTime = tw.clock.Now()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably reading the PR wrong, but I can't figure out where this field is used / what it does?

Copy link
Contributor Author

@skonto skonto Nov 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to remove I guess now that I dont calculate any time diff, it is a relic of my previous commit.


var maxDurationTimeout clock.Timer
var maxDurationTimeoutDrained bool
if h.maxDurationTimeout > 0 {
maxDurationTimeout = getTimer(h.clock, h.maxDurationTimeout)
defer func() {
putTimer(maxDurationTimeout, maxDurationTimeoutDrained)
}()
}
var maxDurationTimeoutCh <-chan time.Time
if maxDurationTimeout != nil {
maxDurationTimeoutCh = maxDurationTimeout.C()
}

go func() {
defer func() {
defer close(done)
if p := recover(); p != nil {
done <- p
}
}()

h.handler.ServeHTTP(tw, r.WithContext(ctx))
}()

Expand All @@ -119,6 +140,12 @@ func (h *timeoutHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
return
}
idleTimeout.Reset(timeToNextTimeout)
case <-maxDurationTimeoutCh:
timedOut := tw.maxDurationTimeoutAndWriteError(h.body)
if timedOut {
maxDurationTimeoutDrained = true
return
}
}
}
}
Expand All @@ -134,9 +161,10 @@ type timeoutWriter struct {
w http.ResponseWriter
clock clock.PassiveClock

mu sync.Mutex
timedOut bool
lastWriteTime time.Time
mu sync.Mutex
timedOut bool
lastWriteTime time.Time
requestStartTime time.Time
}

var _ http.Flusher = (*timeoutWriter)(nil)
Expand Down Expand Up @@ -222,6 +250,13 @@ func (tw *timeoutWriter) tryIdleTimeoutAndWriteError(curTime time.Time, idleTime
return false, idleTimeout - timeSinceLastWrite
}

func (tw *timeoutWriter) maxDurationTimeoutAndWriteError(msg string) bool {
tw.mu.Lock()
defer tw.mu.Unlock()
tw.timeoutAndWriteError(msg)
return true
}

func (tw *timeoutWriter) timeoutAndWriteError(msg string) {
tw.w.WriteHeader(http.StatusGatewayTimeout)
io.WriteString(tw.w, msg)
Expand Down
Loading