Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Context pooling is causing server panics #74

Closed
wants to merge 2 commits into from

Conversation

vektah
Copy link

@vektah vektah commented Aug 17, 2016

We deployed go 1.7 + chi 2 yesterday and noticed occasionally a machine would go out of service with:

panic: context: internal error: missing cancel error

goroutine 156 [running]:
panic(0x675540, 0xc42012f880)
    /opt/go-17/src/runtime/panic.go:500 +0x1a1
context.(*cancelCtx).cancel(0xc42001a600, 0x476900, 0x0, 0x0)
    /opt/go-17/src/context/context.go:339 +0x24b
context.(*timerCtx).cancel(0xc42001a600, 0x0, 0x0, 0x0)
    /opt/go-17/src/context/context.go:413 +0x4a
context.propagateCancel.func1(0x803660, 0xc42001a540, 0x800f20, 0xc42001a600)
    /opt/go-17/src/context/context.go:264 +0x128
created by context.propagateCancel
    /opt/go-17/src/context/context.go:267 +0x1dc

It appears chi is pooling custom context objects, but these objects can live longer than a request due to this snippet in context.go

func propagateCancel(parent Context, child canceler) {
    if parent.Done() == nil {
        return // parent is never canceled
    }
    if p, ok := parentCancelCtx(parent); ok {
        p.mu.Lock()
        if p.err != nil {
            // parent has already been canceled
            child.cancel(false, p.err)
        } else {
            if p.children == nil {
                p.children = make(map[canceler]bool)
            }
            p.children[child] = true
        }
        p.mu.Unlock()
    } else {
        go func() {   /// <<<<<<< This may live longer than a request, and the parent.Err() may have been reset between Done() and Err().
            select {
            case <-parent.Done():
                child.cancel(false, parent.Err())
            case <-child.Done():
            }
        }()
    }
}

This PR includes a failing test and the removal of the pooling.

It appears that this wouldn't be a problem if chi didn't create its own type of context object and just used Values, perhaps that would be a better approach? It would then fall into the parentCancelCtx part of propagateCancel and save a goroutine.

@pkieltyka
Copy link
Member

pkieltyka commented Aug 17, 2016

@vektah thanks for the report and test case! I'll give this a bit more thought to consider the best way approach to solve this in chi. If you have other ideas please let me know too.

@pkieltyka
Copy link
Member

I'll see if we can maintain RouteContext pooling, and instead storing it as a value in the context instead of as a context.Context type, like you suggested.

@lwc
Copy link

lwc commented Aug 17, 2016

The context API has been designed to discourage mutating contexts, which seems to contradict any kind of context reuse.

@pkieltyka
Copy link
Member

@lwc for sure, the current code doesn't reuse request context objects, it just reused the routing object, but clearly the way its adding itself to the context value chain is being problematic. I could get rid of the pool entirely, and I might.. I'm just trying a few things right now but I've been at a conference all day. Will get this in soon.

@pkieltyka
Copy link
Member

Let me know what you guys think of #75 -- this does the job, and it maintains pooling of the route context, which isn't really a context object, its just the routing details. The benefit is the benchmarks are better than removing the pool altogether.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants