-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nomad exec part 1: plumbing and docker driver #5632
Conversation
This helper returns the token as well as the ACL policy, to be used in a later commit for logging the token info associated with nomad exec invocation.
Test helper that allows registration of jobs when ACL is activated.
Extract command parsing and execution mocking into a separate struct. Also, allow mocking of different fs_isolation for testing.
This adds `alloc-exec` capability to allow operator to execute command into a running task. Furthermore, it adds `alloc-node-exec` capability, required when the alloc task is raw_exec or a driver with no FSIsolation.
In this commit, we add two driver interfaces for supporting `nomad exec` invocation: * A high level `ExecTaskStreamingDriver`, that operates on io reader/writers. Drivers should prefer using this interface * A low level `ExecTaskStreamingRawDriver` that operates on the raw stream of input structs; useful when a driver delegates handling to driver backend (e.g. across RPC/grpc). The interfaces are optional for a driver, as `nomad exec` support is opt-in. Existing drivers continue to compile without exec support, until their maintainer add such support. Furthermore, we create protobuf structures to represent exec stream entities: `ExecTaskStreamingRequest` and `ExecTaskStreamingResponse`. We aim to reuse the protobuf generated code as much as possible, without translation to avoid conversion overhead. `ExecTaskStream` abstract fetching and sending stream entities. It's influenced by the grpc bi-directional stream interface, to avoid needing any adapter. I considered using channels, but the asynchronisity and concurrency makes buffer reuse too complicated, which would put more pressure on GC and slows exec operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review is still a WIP. Just dumping a first batch of comments.
api/allocations.go
Outdated
return -1, ctx.Err() | ||
case frame, ok := <-output: | ||
if !ok { | ||
return -1, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is closing the output chan an expected way to signal an exit? If not we should probably return an error so it doesn't appear the -1 exit signal came from the remote side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope - the only legitimate expected way to exit is by receiving a frame with exit results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to use -2 here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it return an error then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point - i'll make it return an error
api/allocations.go
Outdated
case frame.Exited && frame.Result != nil: | ||
return frame.Result.ExitCode, nil | ||
default: | ||
// unexpected event, TODO: log it?! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log or error. Logging seems more reasonable in this case as it allows us to tuck more metadata into the frames in the future without completely breaking old clients.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unclear how to log in the cli side of things. Do we have a pattern of cli logging? Also, logging can be very disruptive, consider the case where you are running in vim/cli-with-rich-interface, having log messages emited may cause display to go weird. That was my concern with all the other places where I would have logged if we were in the agent side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're right! I forgot we can't really log in the api
package. Let's update the comment.
// unexpected event, TODO: log it?! | |
// Unexpected event! Can't log as the user may be in an interactive session. | |
// Silently drop unexpected frames to provide graceful degradation for future protocol changes. |
client/alloc_endpoint_test.go
Outdated
} | ||
} | ||
|
||
// TestAlloc_ExecStreaming_ACL_WithIsolation_Image assets that token only needs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TestAlloc_ExecStreaming_ACL_WithIsolation_Image assets that token only needs | |
// TestAlloc_ExecStreaming_ACL_WithIsolation_Image asserts that token only needs |
client/alloc_endpoint_test.go
Outdated
} | ||
} | ||
|
||
// TestAlloc_ExecStreaming_ACL_WithIsolation_Chroot assets that token only needs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TestAlloc_ExecStreaming_ACL_WithIsolation_Chroot assets that token only needs | |
// TestAlloc_ExecStreaming_ACL_WithIsolation_Chroot asserts that token only needs |
client/alloc_endpoint_test.go
Outdated
} | ||
} | ||
|
||
// TestAlloc_ExecStreaming_ACL_WithIsolation_None assets that token needs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TestAlloc_ExecStreaming_ACL_WithIsolation_None assets that token needs | |
// TestAlloc_ExecStreaming_ACL_WithIsolation_None asserts that token needs |
client/alloc_endpoint_test.go
Outdated
} | ||
|
||
var frame drivers.ExecTaskStreamingResponseMsg | ||
json.Unmarshal(msg.Payload, &frame) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass complete! Overall structure looks great, just worried about some of the trickier multiple-goroutines-handling-errors cases.
return err | ||
} | ||
|
||
stream.Send(drivers.NewExecStreamingResponseExit(result.ExitCode)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle error or comment why we don't need to
Width: int(msg.TtySize.Width), | ||
}: | ||
case <-ctx.Done(): | ||
// process terminated before resize is processed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you missing a return here?
msg, err := stream.Recv() | ||
if err == io.EOF { | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to handle non EOF errors here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! A couple minor nitpicks and questions remain, but none are blockers.
Also add a helper that converts the adapts the high level interface to the low-level interface of nomad exec interfaces.
Add a client streaming RPC endpoint for processing nomad exec tasks, by invoking the relevant task handler for execution.
This adds a websocket endpoint for handling `nomad exec`. The endpoint is a websocket interface, as we require a bi-directional streaming (to handle both input and output), which is not very appropriate for plain HTTP 1.0. Using websocket makes implementing the web ui a bit simpler. I considered using golang http hijack capability to treat http request as a plain connection, but the web interface would be too complicated potentially. Furthermore, the API endpoint operates against the raw core nomad exec streaming datastructures, defined in protobuf, with json serializer. Our APIs use json interfaces in general, and protobuf generates json friendly golang structs. Reusing the structs here simplify interface and reduce conversion overhead.
Add consolidated testing package to serve as conformance tests for all drivers.
Adds nomad exec support in our API, by hitting the websocket endpoint. We introduce API structs that correspond to the drivers streaming exec structs. For creating the websocket connection, we reuse the transport setting from api http client.
bffe3d2
to
980e4f5
Compare
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Context
This PR is the first of PRs that aim to introduce
nomad exec
: a command allowing operators to execute commands into running tasks that run anywhere in cluster in any driver that supports exec. Thinkdocker exec
but that can run against task running remotely and with exec/java tasks as well.The PRs are relatively very large, and best reviewed by examining individual commits along with the commit messages that dig deeper into implementation choices.
This PR: Plumbing and initial driver support
This PR introduces an API endpoint for executing arbitrary commands on running tasks, that forwards the API call to the nomad client running the task.
The endpoint uses websockets for processing user inputs and streaming back process output. The agent handler would forward the stream entities all the way to client. Some care is given to reuse the same serialization bits all the way to drivers, and only unpack the stream at the very last moment (driver or executor); this allows us to reduce transfer/conversion overhead and in some cases allow reuse of buffers.
When ACLs are enabled, an operator must have "alloc-exec" capability enabled to exec into any command. If the task is a
raw_exec
or using a driver that has no isolation, "alloc-node-exec" capability.In this PR, we add support for docker as it's the most trivial implementation. Follow up PRs would add support for other drivers.
High level walk through and key ideas:
(please review the commit messages for details)
The PR starts with few helper changes that are relevant for subsequent significant changes - wanted to get them out of the way before getting into the meaty commits:
6d711d0:
introduces the core structs and interfaces involved. It introduces two (one high-level and one low-level) interfaces for drivers intending to support nomad exec. Drivers support is opt-in, and existing drivers can compile without any modification.
Also introduces the websocket streaming entities (e.g.
ExecTaskStreamingRequest
andExecTaskStreamingResponse
) that will get reused in the API as well in e78f7ce#diff-bafd99d510459580cfc433bada3fde5cR535 .4ade58e and 781b255:
58a4c71
e78f7ce
Allocations.Exec
API convenience method for invoking the API.d144553
a8e460a: