Nomad should pin tasks to CPUs underneath the hood #2303

Closed
Fleischloser opened this issue Feb 10, 2017 · 22 comments · Fixed by #8291

Comments

@Fleischloser

Fleischloser commented Feb 10, 2017

Hi,
for our Realtime-Streaming-Cluster Setup it is necessary to set docker --cpuset-cpus="".

I found a HostConfig.CPUSetCPUs field in go-dockerclient/container.go. Is it possible to set it via the JSON config?

Example JSON:

    ....
    task "ABC" {
      driver = "docker"
      config {
        image = "image"
        port_map { http = 8080 }
      }
      resources {
        cpusets = "0-15"
        memory  = 32000
        network {
          mbits = 1000
          port "http" {}
        }
      }
    }

In driver/docker.go at line 755, I only found the option to set "CPUShares: int64(task.Resources.CPU)".

Is it possible to add something like this:
```go
hostConfig := &docker.HostConfig{
	// Convert MB to bytes. This is an absolute value.
	Memory:     memLimit,
	MemorySwap: memLimit, // MemorySwap is memory + swap.
	// Binds are used to mount a host volume into the container. We mount a
	// local directory for storage and a shared alloc directory that can be
	// used to share data between different tasks in the same task group.
	Binds: binds,
}

if len(task.Resources.CPUSETS) != 0 {
	// Pin the container to an explicit CPU list such as "0-15".
	// CPUSetCPUs is a string, so the proposed CPUSETS field is passed as-is.
	hostConfig.CPUSetCPUs = task.Resources.CPUSETS
} else {
	// Convert MHz to shares. This is a relative value.
	hostConfig.CPUShares = int64(task.Resources.CPU)
}
```

and run the Docker-Container with the set CPUSetCPUs?

Greetz

@dadgar
Contributor

dadgar commented Feb 14, 2017

Hey,

Why do you need to assign the task to individual CPUs? This is likely not a feature we will support or expose to the user, as multiple jobs running on the same node could suffer detrimental performance effects if they have the same CPU pinning.

@Fleischloser
Author

Hi,

We don't have many nodes, but each has a lot of cores (64).

We want to deploy several system services on each host. Our measurements show that Linux fair scheduling performs badly in this setup.
By adjusting the CPU sets we improved performance by roughly 100%.

We work in a high-performance, low-latency area and have to be careful about how we use resources.

@dvusboy

dvusboy commented Feb 15, 2017

There is a lot of HPC software that is CPU-aware and works better with CPU affinity enabled. This is one area where we often don't want the sharing of cores that comes with Nomad's MHz-based CPU allocation. It would be nice if we could specify the number of cores (physical or logical) required by a task and let Nomad manage the --cpuset-* parameters passed to the Docker driver.

@dadgar
Contributor

dadgar commented Feb 15, 2017

@Fleischloser @dvusboy Makes sense. Did my point about why this should be an internal decision by Nomad and not set by the individual job make sense though? You can imagine you have two jobs submitted by two different teams pinning on to the same cores. In that case you will have worse performance than allowing Nomad to decide pinning itself.

@dvusboy

dvusboy commented Feb 15, 2017

I get that, and that's why I don't think the job spec should have an explicit cpu-set, but rather let Nomad manage those cpu-set related parameters. It should be part of the scheduling constraints. But it may also mean, down the road, allowing Nomad to relocate tasks (if that's spec'd as OK) in order to accommodate more tasks with CPU affinity on a host.

@dvusboy

dvusboy commented Feb 17, 2017

And here's an example of messages from Gromacs implying that, with CPU/thread affinity, the application will perform significantly better:

Using 1 MPI thread
Using 8 OpenMP threads


NOTE: The number of threads is not equal to the number of (logical) cores
      and the -pin option is set to auto: will not pin thread to cores.
      This can lead to significant performance degradation.
      Consider using -pin on (and -pinoffset in case you run multiple jobs).


NOTE: Thread affinity setting failed. This can cause performance degradation.
      If you think your settings are correct, ask on the gmx-users list.

@dadgar
Contributor

dadgar commented Feb 17, 2017

Cool glad we are on the same page. Going to rename the title

@dadgar changed the title from "[question] Nomad Docker CPUSetCPUs" to "Nomad should pin tasks to CPUs underneath the hood" on Feb 17, 2017
@dvusboy

dvusboy commented Feb 17, 2017

For most (micro)service applications, this isn't necessary and the MHz-based scheduling works just fine. But many MPI/OpenMP scientific applications are more finicky and require affinity for good performance. I would say Nomad should support pinning tasks to CPUs underneath the hood when a Task asks for it. But most importantly, don't let the user specify the cpu-set explicitly in the job specification. Thanks.

@gmarkey

gmarkey commented Feb 27, 2017

Nomad has managed to fill a fairly poorly serviced niche in the distributed scheduler space with its ability to run both fullvirt and raw exec tasks. However, there is still a large focus on using the application as a tool for scheduling tasks that aren't necessarily latency sensitive.

Teaching the Nomad scheduler resource (CPU, NUMA, PCI, shm) pinning would make it extremely valuable in the HPC, research and financial space where processing throughput is potentially less valuable than processing latency. I appreciate that the underlying implementation doesn't necessarily factor this in right now, but I cannot +1 this request hard enough.

This can all be achieved with the cgroup integration (to some extent, at least; the PCI and SHM pinning isn't really supported with this mechanism).

Even better to have:

  • Scheduling support for "cpuset.cpu_exclusive" to avoid noisy neighbour.
  • Referencing other tasks in Nomad for affinity, e.g. we would like to ensure Task-B has the same NUMA affinity as Task-A.

@c4milo
Contributor

c4milo commented Mar 15, 2017

Another use case is being able to deploy apps using intel's DPDK libraries for fast packet processing which would likely require CPU pinning for optimal performance.

@henrikjohansen

We have JVM instances that require CPU (read: socket) pinning in order to perform because of memory locality ... so it's definitely a thing :)

@joshuaclausen

Being able to set a constraint of only one allocation per CPU core would be a big deal for game server hosting. There are significant performance improvements if a game server can be the only thing running on its assigned core.

@james-masson

Kubernetes has recently announced support for exactly this - https://kubernetes.io/blog/2018/07/24/feature-highlight-cpu-manager/

With Nomad/raw Docker, specifying cpuset params is the only way to do this right now, and it doesn't scale well, due to potential clashes between tasks.

For bonus points, take into account the kernel "isolcpus" flag, which is another commonly used tweak in this space.

@wyattanderson

This would be a really great feature to have. We'd like to be able to partition out a multicore machine and give tasks on that machine integer-value CPU resources. That is, if a machine has 8 vCPUs, we'd like to be able to run 7 tasks on that machine (reserving 1 CPU for system processes), with each container assigned to 1 vCPU (or 2, or 4, etc.). I don't care which CPU it's assigned to, but I don't want to specify arbitrary MHz and I don't want the tasks to share CPUs with other tasks.

@analytically

@james-masson how do you set the cpuset parameters using Nomad/raw Docker?

@james-masson

Typo I think, @analytically. Probably meant "Nomad exec/raw docker", both of which allow you to use cgroup/cpuset/taskset etc.

@sstent

sstent commented May 20, 2020

The ability to set "cpuset-cpus" is also really handy on some big-little arm systems.

@victorstewart

victorstewart commented Nov 6, 2020

this is a super important feature for almost any networked application, such that there are no cross core interrupts or memory accesses with incoming packets. (aka that the NIC pushes them onto the RX queue for the corresponding core, then gets run up through the network stack on that core, and then delivered to the waiting socket on that same core).

so it would be amazing if Nomad pinned the process to whichever core it decides to schedule it to, and if we could specify pinning by either logical core or physical core. (as processes per logical core deliver higher throughput, but by physical core lower latencies... as you might want for a database).

it would also be amazing to have a concept of "reserved" cores and "shared" cores. maybe you want to pin application servers to a reserved core each, and then have your 10 periodic cron jobs all share 1 shared core.

but if we currently request 1 logical core from Nomad for a process, and then pin to it from within the process... won't Nomad not schedule any other processes on that core unless you began requesting more cpu resources than your machine had? So we can already do this?

But if we requested 2 logical cores for a process, is there any guarantee we'd get 1 physical core?

@ketzacoatl
Contributor

@notnoop, would this be relevant to exec or raw_exec as well? Does it make any sense to open a new ticket for that?

@notnoop notnoop reopened this Nov 12, 2020
@notnoop
Contributor

notnoop commented Nov 12, 2020

Thanks @ketzacoatl - re-opening this ticket as PR #8291 only addresses docker and doesn't address the full story of having Nomad manage and pin CPUs.

@tgross
Member

tgross commented May 7, 2021

Our implementation shipped in Nomad 1.1.0-beta. For any remaining gaps like how we'd deal with raw_exec, we'll open new issues as needed.

@tgross tgross closed this as completed May 7, 2021
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 20, 2022