Nomad should pin tasks to CPUs underneath the hood #2303
Hey, why do you need to assign the task to individual CPUs? This is likely not a feature we will support/expose to the user, as multiple jobs running on the same node could see detrimental performance effects if they have the same CPU pinning.
Hi, we don't have many nodes, but they have a lot of cores (64). We want to deploy several system services on each host. Our measurements show that Linux fair scheduling performs badly in this setup. We are working in a high-performance, low-latency area and have to be careful about how resources are used.
There is a lot of HPC software that is CPU-aware and works better with CPU affinity enabled. This is one area where we often don't want the core sharing that comes with Nomad's MHz-based CPU allocation. It would be nice if we could specify the number of cores (physical or logical) required by a task and let Nomad manage the pinning.
@Fleischloser @dvusboy Makes sense. Did my point about why this should be an internal decision by Nomad, and not set by the individual job, make sense though? You can imagine two jobs submitted by two different teams pinning to the same cores. In that case you will get worse performance than if you let Nomad decide the pinning itself.
I get that, and that's why I don't think the job spec should have an explicit cpu-set; rather, let Nomad manage those cpu-set related parameters. It should be part of the scheduling constraints. But it may also mean, down the road, allowing Nomad to relocate tasks (if that's spec'd as OK) in order to accommodate more tasks with CPU affinity on a host.
And here's an example of messages from Gromacs that imply that, with CPU/thread affinity, the application will perform significantly better:
Cool, glad we are on the same page. Going to rename the title.
For most (micro)service applications this isn't necessary and the MHz-based scheduling works just fine. But many MPI/OpenMP scientific applications are more finicky and require affinity for good performance. I would say Nomad should support pinning tasks to CPUs underneath the hood when a task asks for it. But most importantly, don't let the user specify the cpu-set explicitly in the job specification. Thanks.
Nomad has managed to fill a fairly poorly serviced niche in the distributed scheduler space with its ability to run both fullvirt and raw exec tasks; however, there is still a large focus on using the application as a tool for scheduling tasks that aren't necessarily latency-sensitive. Teaching the Nomad scheduler resource (CPU, NUMA, PCI, shm) pinning would make it extremely valuable in the HPC, research, and financial spaces, where processing throughput is potentially less valuable than processing latency. I appreciate that the underlying implementation doesn't necessarily factor this in right now, but I cannot +1 this request hard enough. This can all be achieved with the cgroup integration (to some extent, at least; the PCI and SHM pinning isn't really supported with this mechanism). Even better to have:
Another use case is being able to deploy apps using Intel's DPDK libraries for fast packet processing, which would likely require CPU pinning for optimal performance.
We have JVM instances that require CPU (read: socket) pinning in order to perform because of memory locality ... so it's definitely a thing :)
Being able to set a constraint allowing only one allocation per CPU core would be a big deal for game server hosting. There are significant performance improvements if a game server can be the only thing running on its assigned core.
Kubernetes has recently announced support for exactly this: https://kubernetes.io/blog/2018/07/24/feature-highlight-cpu-manager/. With Nomad/raw Docker, specifying cpuset params is the only way to do this right now, and it doesn't scale well due to potential clashes between tasks. For bonus points, take into account the kernel "isolcpus" flag, which is another commonly used tweak in this space.
This would be a really great feature to have. We'd like to be able to partition out a multicore machine and give tasks on that machine integer-value CPU resources. That is, if a machine has 8 vCPUs, we'd like to be able to run 7 tasks on that machine (reserving 1 CPU for system processes), with each container assigned to 1 vCPU (or 2, or 4, etc.). I don't care which CPU it's assigned to, but I don't want to specify arbitrary MHz and I don't want the tasks to share CPUs with other tasks.
@james-masson how do you set the cpuset parameters using Nomad/raw Docker?
Typo I think, @analytically; probably meant "Nomad exec/raw Docker", both of which allow you to use cgroups/cpuset/taskset, etc.
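For context, a minimal sketch of what that manual approach can look like with the raw_exec driver wrapping taskset (the job name, binary path, and core list below are all placeholders, not anything from this thread). Nomad itself stays unaware of the pinning, so avoiding clashes between jobs is still on the operator:

```hcl
job "pinned-app" {
  datacenters = ["dc1"]

  group "app" {
    task "server" {
      # raw_exec simply runs the command on the host, so taskset can
      # restrict the child process to an operator-chosen core list.
      driver = "raw_exec"

      config {
        command = "taskset"
        args    = ["--cpu-list", "2-3", "/usr/local/bin/my-server"]
      }

      resources {
        cpu    = 4000 # MHz; roughly two cores on a ~2 GHz machine
        memory = 256
      }
    }
  }
}
```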
The ability to set "cpuset-cpus" is also really handy on some big.LITTLE ARM systems.
This is a super important feature for almost any networked application, so that there are no cross-core interrupts or memory accesses for incoming packets (i.e. the NIC pushes them onto the RX queue for the corresponding core, they get run up through the network stack on that core, and are then delivered to the waiting socket on that same core). So it would be amazing if Nomad pinned the process to whichever core it decides to schedule it on, and allowed specifying pinning by either logical core or physical core (one process per logical core delivers higher throughput, but per physical core lower latencies, as you might want for a database). It would also be amazing to have a concept of "reserved" cores and "shared" cores: maybe you want to pin application servers to a reserved core each, and then have your 10 periodic cron jobs all share one shared core. But if we currently request one logical core's worth of CPU from Nomad for a process, and then pin to a core from within the process, won't Nomad avoid scheduling other processes onto that core unless more CPU is requested than the machine has? So can we already do this? And if we requested 2 logical cores for a process, is there any guarantee we'd get 1 physical core?
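With the core reservation Nomad later shipped (see the 1.1.0-beta comment further down), a "reserved plus shared" split along these lines could look roughly like the sketch below. The group, task, and image names are placeholders, and the sharing behavior described in the comments is my reading of the feature rather than anything stated in this thread:

```hcl
group "mixed" {
  # Latency-sensitive server: reserve a whole core; Nomad picks the
  # core and removes it from the pool shared by MHz-based tasks.
  task "app" {
    driver = "docker"
    config {
      image = "example/app:latest" # placeholder
    }
    resources {
      cores  = 1
      memory = 512
    }
  }

  # Background helper: plain MHz request, runs on the remaining
  # shared cores alongside other non-reserved tasks.
  task "cron-helper" {
    driver = "docker"
    config {
      image = "example/cron:latest" # placeholder
    }
    resources {
      cpu    = 200
      memory = 128
    }
  }
}
```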
@notnoop, would this be relevant to
Thanks @ketzacoatl - re-opening this ticket, as PR #8291 only addresses the Docker driver and doesn't address the full story of having Nomad manage and pin CPUs.
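For reference, a sketch of the Docker-only path from PR #8291 as exposed through the driver's cpuset_cpus option (the image name and core range are placeholders). Nomad passes the value straight through to Docker and does not deconflict cores across allocations:

```hcl
task "dpdk-app" {
  driver = "docker"

  config {
    image = "example/dpdk-app:latest" # placeholder image

    # Forwarded to Docker's --cpuset-cpus; core selection and
    # avoiding overlaps with other jobs remain manual.
    cpuset_cpus = "4-7"
  }

  resources {
    cpu    = 8000
    memory = 1024
  }
}
```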
Our implementation shipped in Nomad 1.1.0-beta. For any remaining gaps like how we'd deal with
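As a minimal sketch of the shipped feature as I understand it (Nomad 1.1+): a task can request whole cores instead of MHz and let Nomad choose and pin the specific cores. The task and image names below are placeholders:

```hcl
task "latency-sensitive" {
  driver = "docker"

  config {
    image = "example/service:latest" # placeholder image
  }

  resources {
    # `cores` replaces the MHz-based `cpu` value for this task;
    # Nomad selects two cores on the client and pins the task to them.
    cores  = 2
    memory = 512
  }
}
```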
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. |
Hi,
For our real-time streaming cluster setup it is necessary to set docker --cpuset-cpus.
I found a HostConfig.CPUSetCPUs field in go-dockerclient/container.go. Is it possible to set this via the job config?
Example job config:
```hcl
# ...
task "ABC" {
  driver = "docker"

  config {
    image = "image"
    port_map {
      http = 8080
    }
  }

  resources {
    cpusets = "0-15"
    memory  = 32000

    network {
      mbits = 1000
      port "http" {}
    }
  }
}
```
In driver/docker.go (around line 755) I only found the possibility to set "CPUShares: int64(task.Resources.CPU)".
Is it possible to add something like this:
```go
hostConfig := &docker.HostConfig{
	// Convert MB to bytes. This is an absolute value.
	Memory:     memLimit,
	MemorySwap: memLimit, // MemorySwap is memory + swap.

	// Binds are used to mount a host volume into the container. We mount a
	// local directory for storage and a shared alloc directory that can be
	// used to share data between different tasks in the same task group.
	Binds: binds,
}

if len(task.Resources.CPUSETS) != 0 {
	// Pin the container to the requested cpuset string (e.g. "0-15").
	// CPUSETS would be a new field on the task's Resources struct.
	hostConfig.CPUSetCPUs = task.Resources.CPUSETS
} else {
	// Convert MHz to shares. This is a relative value.
	hostConfig.CPUShares = int64(task.Resources.CPU)
}
```
and run the Docker container with the configured CPUSetCPUs?
Greetz