GPU Support #849

amrragab8080 · 2019-01-14T18:06:04Z

Public facing api doesnt seem to have support yet for passthrough pci devices, namely gpu is this technically feasible?

goswamig · 2019-01-30T01:49:14Z

That would be super helpful to have support for GPU.

acatangiu · 2019-02-18T13:29:09Z

GPU support in Firecracker is very hard/tricky at the moment. With current GPU hardware, there's two major problems:

To do device pass-through implies pinning physical memory which would remove our memory oversubscription capabilities.
We can only run 1 customer workload securely per physical GPU, and switching between customer workloads takes a long enough time to make it impractical.

As a result there is no known path to supporting GPUs in Firecracker.

raduweiss · 2019-07-15T13:03:05Z

@amrragab8080, we'll be looking at this as part of #1179.

normtown · 2019-12-10T21:08:39Z

To do device pass-through implies pinning physical memory which would remove our memory oversubscription capabilities.

Why do you want to maintain the ability to oversubscribe memory?

raduweiss · 2019-12-10T21:34:13Z

Oversubscription is a core part of makes Firecracker a great way to isolate serverless workloads; that's why we took on a tenet around it [1].

[1] https://github.com/firecracker-microvm/firecracker/blob/master/CHARTER.md

normtown · 2019-12-10T21:41:26Z

Why is over-subscription a great way to isolate serverless workloads? I genuinely don't know, so the reasoning that led to the existence of the tenet is not self-evident to me.

raduweiss · 2019-12-10T22:15:52Z

Like all services, serverless compute providers want to keep their servers busy and to improve their overall utilization. Ideally, every CPU cycle on the service provider's servers is running user code, and every byte of RAM is filled with user data. If servers are sitting idle, that’s inefficient.

A part of solving this optimization problem is having the ability to oversubscribe a given server's hardware capacity with workloads who's hardware resource usage is statistically uncorrelated, or, even better, with workloads selected specifically to pack well together.

normtown · 2019-12-10T23:31:53Z

What you appear to be saying is that resource over-subscription helps the hosting service (e.g. AWS Lambda or Fargate) to lower their hardware costs. (Which, in turn, passes savings on to customers...presumably.)

That is not the same as being great for isolating workloads. It seems to be the opposite. Particularly, in the case that all workloads attempt to utilize their full resource reservations at the same time. It sounds like the design here is to bet on the workloads not calling in all their debts.

How ingrained into the Firecracker implementation is this resource-over-subscription tenet? Like, would it be remotely feasible to add a feature flag that turns over-subscription off?

P.S. As an aside, the Firecracker tenets don't seem to align with the Fargate project. Specifically the tenet that calls out favoring transient or stateless workloads over long-running or persistent workloads. The Fargate docs do not place similar restrictions on its workloads (AFAICT).

raduweiss · 2019-12-10T23:51:39Z

That is not the same as being great for isolating workloads. It seems to be the opposite. Particularly, in the case that all workloads attempt to utilize their full resource reservations at the same time. It sounds like the design here is to bet on the workloads not calling in all their debts.

Great for isolating serverless workloads, which are bursty and pay-only-when-running. Take a look at https://www.youtube.com/watch?v=QdzV04T_kec , there some more detail there around how Lambda multiplexes workloads.

How ingrained into the Firecracker implementation is this resource-over-subscription tenet? Like, would it be remotely feasible to add a feature flag that turns over-subscription off?

Well, it's a tenet so we stick to it unless there's a very good reason to change it.

P.S. As an aside, the Firecracker tenets don't seem to align with the Fargate project. Specifically the tenet that calls out favoring transient or stateless workloads over long-running or persistent workloads. The Fargate docs do not place similar restrictions on its workloads (AFAICT).

You're quite right here :) This tenet started out as a powerful simplifying assumption, but as you pointed out, it doesn't quite apply to all the serverless container workloads; we might let go of the "transient and stateless" part.

DemiMarie · 2022-11-04T01:20:44Z

We can only run 1 customer workload securely per physical GPU, and switching between customer workloads takes a long enough time to make it impractical.

What is the reason for this? Is the attack surface of e.g virtio-gpu or Venus excessive?

DemiMarie · 2023-05-27T19:45:03Z

GPU support in Firecracker is very hard/tricky at the moment. With current GPU hardware, there's two major problems:

To do device pass-through implies pinning physical memory which would remove our memory oversubscription capabilities.

In theory it is possible to do better by dynamically manipulating guest IOMMU mappings.

We can only run 1 customer workload securely per physical GPU, and switching between customer workloads takes a long enough time to make it impractical.

Does this also apply to SR-IOV capable GPUs? What about e.g. attacks in which the guest overwrites the GPU’s vBIOS?

Talador12 · 2024-08-20T16:41:11Z

GPU support in Firecracker is very hard/tricky at the moment. With current GPU hardware, there's two major problems:

To do device pass-through implies pinning physical memory which would remove our memory oversubscription capabilities.

We can only run 1 customer workload securely per physical GPU, and switching between customer workloads takes a long enough time to make it impractical.

As a result there is no known path to supporting GPUs in Firecracker.

This is becoming increasingly more important to support. It may be difficult, but we need to find a way to do this.

This can also be managed with Nvidia MIG (cutting the physical GPU into slices) and exposing a specific slice to the VM. This does provide capacity limitations in the current state, but only on GPU capacity, which is widely accepted at the moment.

Also - this issue is not "Closed". It is not implemented and users are still asking for this. We should move this conversation to #1179 since it is still open

raduweiss added Type: Question Indicates that an issue, pull request, or discussion needs more information Roadmap: New Request labels Jan 16, 2019

alexandruag added the Priority: High Indicates than an issue or pull request should be resolved ahead of issues or pull requests labelled label Feb 15, 2019

acatangiu self-assigned this Feb 18, 2019

acatangiu closed this as completed Feb 18, 2019

raduweiss mentioned this issue Jul 15, 2019

[Devices] Offer support for hardware-accelerated inference in Firecracker #1179

Open

raduweiss mentioned this issue Jul 15, 2019

[RFC] 2020 Roadmap #1104

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GPU Support #849

GPU Support #849

amrragab8080 commented Jan 14, 2019

goswamig commented Jan 30, 2019

acatangiu commented Feb 18, 2019

raduweiss commented Jul 15, 2019

normtown commented Dec 10, 2019

raduweiss commented Dec 10, 2019

normtown commented Dec 10, 2019

raduweiss commented Dec 10, 2019

normtown commented Dec 10, 2019

raduweiss commented Dec 10, 2019 •

edited

Loading

DemiMarie commented Nov 4, 2022

DemiMarie commented May 27, 2023

Talador12 commented Aug 20, 2024 •

edited

Loading

GPU Support #849

GPU Support #849

Comments

amrragab8080 commented Jan 14, 2019

goswamig commented Jan 30, 2019

acatangiu commented Feb 18, 2019

raduweiss commented Jul 15, 2019

normtown commented Dec 10, 2019

raduweiss commented Dec 10, 2019

normtown commented Dec 10, 2019

raduweiss commented Dec 10, 2019

normtown commented Dec 10, 2019

raduweiss commented Dec 10, 2019 • edited Loading

DemiMarie commented Nov 4, 2022

DemiMarie commented May 27, 2023

Talador12 commented Aug 20, 2024 • edited Loading

raduweiss commented Dec 10, 2019 •

edited

Loading

Talador12 commented Aug 20, 2024 •

edited

Loading