Nomad should pin tasks to CPUs underneath the hood #2303
Hey, why do you need to assign the task to individual CPUs? This is likely not a feature we will support/expose to the user, as multiple jobs running on the same node could see detrimental performance effects if they have the same CPU pinning.
Hi, we don't have many nodes, but they have a lot of cores (64). We want to deploy several system services on each host. Our measurements show that Linux fair scheduling performs badly in this setup. We are working in a high-performance, low-latency area and have to be careful about how resources are used.
There is a lot of HPC software that is CPU-aware and works better with CPU affinity enabled. This is one area where we often don't want the core sharing that comes with Nomad's MHz-based CPU allocation. It would be nice if we could specify the number of cores (physical or logical) required by a task and let Nomad manage the pinning.
@Fleischloser @dvusboy Makes sense. Did my point about why this should be an internal decision by Nomad, and not set by the individual job, make sense though? You can imagine two jobs submitted by two different teams pinning to the same cores. In that case you will get worse performance than if you let Nomad decide the pinning itself.
I get that, and that's why I don't think the job spec should have an explicit cpu-set; rather, let Nomad manage those cpu-set related parameters. It should be part of the scheduling constraints. But it may also mean, down the road, allowing Nomad to relocate tasks (if that's spec'd as OK) in order to accommodate more tasks with CPU affinity on a host.
And here's an example of messages from Gromacs that imply that, with CPU/thread affinity, the application will perform significantly better:
Cool, glad we are on the same page. Going to rename the title.
For most (micro)service applications this isn't necessary and the MHz-based scheduling works just fine. But many MPI/OpenMP scientific applications are more finicky and require affinity for good performance. I would say Nomad should support pinning tasks to CPUs underneath the hood when a task asks for it. But most importantly, don't let the user specify the cpu-set explicitly in the job specification. Thanks.
Nomad has managed to fill a fairly poorly serviced niche in the distributed scheduler space with its ability to run both fullvirt and raw exec tasks; however, there is still a large focus on using the application as a tool for scheduling tasks that aren't necessarily latency-sensitive. Teaching the Nomad scheduler resource (CPU, NUMA, PCI, shm) pinning would make it extremely valuable in the HPC, research, and financial spaces, where processing throughput is potentially less valuable than processing latency. I appreciate that the underlying implementation doesn't necessarily factor this in right now, but I cannot +1 this request hard enough. This can all be achieved with the cgroup integration (to some extent, at least; the PCI and SHM pinning isn't really supported with this mechanism). Even better to have:
Another use case is being able to deploy apps using Intel's DPDK libraries for fast packet processing, which would likely require CPU pinning for optimal performance.
We have JVM instances that require CPU (read: socket) pinning in order to perform because of memory locality ... so it's definitely a thing :)
Being able to set a constraint allowing only one allocation per CPU core would be a big deal for game server hosting. There are significant performance improvements if a game server can be the only thing running on its assigned core.
Kubernetes has recently announced support for exactly this: https://kubernetes.io/blog/2018/07/24/feature-highlight-cpu-manager/. With Nomad/raw Docker, specifying cpuset params is the only way to do this right now, and it doesn't scale well due to potential clashes between tasks. For bonus points, take into account the kernel "isolcpus" flag, which is another commonly used tweak in this space.
This would be a really great feature to have. We'd like to be able to partition out a multicore machine and give tasks on that machine integer-value CPU resources. That is, if a machine has 8 vCPUs, we'd like to be able to run 7 tasks on that machine (reserving 1 CPU for system processes), with each container assigned to 1 vCPU (or 2, or 4, etc.). I don't care which CPU it's assigned to, but I don't want to specify arbitrary MHz and I don't want the tasks to share CPUs with other tasks.
@james-masson how do you set the cpuset parameters using Nomad/raw Docker?
Typo I think, @analytically; probably meant "Nomad exec/raw Docker", both of which allow you to use cgroups/cpuset/taskset, etc.
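For context, a minimal sketch of what that manual approach can look like with the raw_exec driver wrapping taskset (the job name, binary path, and core list below are all placeholders, not anything from this thread). Nomad itself stays unaware of the pinning, so avoiding clashes between jobs is still on the operator:

```hcl
job "pinned-app" {
  datacenters = ["dc1"]

  group "app" {
    task "server" {
      # raw_exec simply runs the command on the host, so taskset can
      # restrict the child process to an operator-chosen core list.
      driver = "raw_exec"

      config {
        command = "taskset"
        args    = ["--cpu-list", "2-3", "/usr/local/bin/my-server"]
      }

      resources {
        cpu    = 4000 # MHz; roughly two cores on a ~2 GHz machine
        memory = 256
      }
    }
  }
}
```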
The ability to set "cpuset-cpus" is also really handy on some big.LITTLE ARM systems.
This is a super important feature for almost any networked application, so that there are no cross-core interrupts or memory accesses for incoming packets (i.e. the NIC pushes them onto the RX queue for the corresponding core, they get run up through the network stack on that core, and are then delivered to the waiting socket on that same core). So it would be amazing if Nomad pinned the process to whichever core it decides to schedule it on, and allowed specifying pinning by either logical core or physical core (one process per logical core delivers higher throughput, but per physical core lower latencies, as you might want for a database). It would also be amazing to have a concept of "reserved" cores and "shared" cores: maybe you want to pin application servers to a reserved core each, and then have your 10 periodic cron jobs all share one shared core. But if we currently request one logical core's worth of CPU from Nomad for a process, and then pin to a core from within the process, won't Nomad avoid scheduling other processes onto that core unless more CPU is requested than the machine has? So can we already do this? And if we requested 2 logical cores for a process, is there any guarantee we'd get 1 physical core?
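With the core reservation Nomad later shipped (see the 1.1.0-beta comment further down), a "reserved plus shared" split along these lines could look roughly like the sketch below. The group, task, and image names are placeholders, and the sharing behavior described in the comments is my reading of the feature rather than anything stated in this thread:

```hcl
group "mixed" {
  # Latency-sensitive server: reserve a whole core; Nomad picks the
  # core and removes it from the pool shared by MHz-based tasks.
  task "app" {
    driver = "docker"
    config {
      image = "example/app:latest" # placeholder
    }
    resources {
      cores  = 1
      memory = 512
    }
  }

  # Background helper: plain MHz request, runs on the remaining
  # shared cores alongside other non-reserved tasks.
  task "cron-helper" {
    driver = "docker"
    config {
      image = "example/cron:latest" # placeholder
    }
    resources {
      cpu    = 200
      memory = 128
    }
  }
}
```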
@notnoop, would this be relevant to
Thanks @ketzacoatl - re-opening this ticket, as PR #8291 only addresses the Docker driver and doesn't address the full story of having Nomad manage and pin CPUs.
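For reference, a sketch of the Docker-only path from PR #8291 as exposed through the driver's cpuset_cpus option (the image name and core range are placeholders). Nomad passes the value straight through to Docker and does not deconflict cores across allocations:

```hcl
task "dpdk-app" {
  driver = "docker"

  config {
    image = "example/dpdk-app:latest" # placeholder image

    # Forwarded to Docker's --cpuset-cpus; core selection and
    # avoiding overlaps with other jobs remain manual.
    cpuset_cpus = "4-7"
  }

  resources {
    cpu    = 8000
    memory = 1024
  }
}
```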
Our implementation shipped in Nomad 1.1.0-beta. For any remaining gaps like how we'd deal with
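As a minimal sketch of the shipped feature as I understand it (Nomad 1.1+): a task can request whole cores instead of MHz and let Nomad choose and pin the specific cores. The task and image names below are placeholders:

```hcl
task "latency-sensitive" {
  driver = "docker"

  config {
    image = "example/service:latest" # placeholder image
  }

  resources {
    # `cores` replaces the MHz-based `cpu` value for this task;
    # Nomad selects two cores on the client and pins the task to them.
    cores  = 2
    memory = 512
  }
}
```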
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. |
Hi,
For our real-time streaming cluster setup it is necessary to set docker --cpuset-cpus.
I found a HostConfig.CPUSetCPUs field in go-dockerclient/container.go. Is it possible to set this via the job config?
Example job config:
```hcl
# ...
task "ABC" {
  driver = "docker"

  config {
    image = "image"
    port_map {
      http = 8080
    }
  }

  resources {
    cpusets = "0-15"
    memory  = 32000

    network {
      mbits = 1000
      port "http" {}
    }
  }
}
```
In driver/docker.go (around line 755) I only found the possibility to set "CPUShares: int64(task.Resources.CPU)".
Is it possible to add something like this:
```go
hostConfig := &docker.HostConfig{
	// Convert MB to bytes. This is an absolute value.
	Memory:     memLimit,
	MemorySwap: memLimit, // MemorySwap is memory + swap.

	// Binds are used to mount a host volume into the container. We mount a
	// local directory for storage and a shared alloc directory that can be
	// used to share data between different tasks in the same task group.
	Binds: binds,
}

if len(task.Resources.CPUSETS) != 0 {
	// Pin the container to the requested cpuset string (e.g. "0-15").
	// CPUSETS would be a new field on the task's Resources struct.
	hostConfig.CPUSetCPUs = task.Resources.CPUSETS
} else {
	// Convert MHz to shares. This is a relative value.
	hostConfig.CPUShares = int64(task.Resources.CPU)
}
```
and run the Docker container with the configured CPUSetCPUs?
Greetz