Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression on M1 GPUs #1482

Closed
FleetAdmiralButter opened this issue Nov 25, 2021 · 23 comments
Closed

Regression on M1 GPUs #1482

FleetAdmiralButter opened this issue Nov 25, 2021 · 23 comments
Assignees
Labels
Bug Completed Issue has been fixed, or enhancement implemented.

Comments

@FleetAdmiralButter
Copy link

Hi there,

There's been an issue with the 1.1.5 and 1.1.6 releases of MoltenVK on Apple GPUs - namely a "GPU Lost" error causes apps using these versions of the library to crash. This problem is not present on versions 1.1.4 and below.

Here are some logs of the errors that get generated (they're quite long so I've put it in a Pastebin) - https://pastebin.com/8R78V4FY.

I pulled the MoltenVK repositories locally and managed to narrow down the commits that caused this regression:
8c7db31
8e6731f

Reverting these commits from the v1.1.6 tag and recompiling resulted in the errors above going away completely.

A diff is available here as reference: Kegworks-App#1.

@billhollings
Copy link
Contributor

As mentioned in the commit comments, those two commits were important in fixing how synchronization works between Vulkan submits on MoltenVK. Specifically, using MTLEvent instead of MTLFence for VkSemaphores.

There were a large number of CTS tests failing until that change, because MTLFence is not an appropriate sync mechanism between Vulkan submits (which means between separate MTLCommandBuffers).

Is there an understanding of why these MTLCommandBuffer failures were happening in your app? Is the app doing anything with VKSemaphores other than using them to sync between two separate queue submits?

@Gcenx
Copy link

Gcenx commented Nov 26, 2021

@billhollings this happens with using DXVK-macOS and it affects different titles differently. The Witcher3 for example will lockup when your hit showing the message “GPU Lost”

FF14 crashes shortly after launching of using MoltenVK-v1.1.5/v1.1.6, downgrading back to MoltenVK-v1.1.4 or reverting the mention commits the title works again.

If you can provide any environment exports or debug options to use to help figure this out I’ll provide the output. But lightly the beat person for this would be @cdavis5e as there much more familiar with DXVK used with MoltenVK (hopefully CX21.1 lands shortly so there free)

@billhollings
Copy link
Contributor

billhollings commented Nov 26, 2021

If you can provide any environment exports or debug options to use to help figure this out I’ll provide the output.

The type of Metal sync (MTLEvent vs MTLFence) used for Vulkan semaphores (VkSemaphore) is controlled by the MoltenVK environment variables/build settings MVK_ALLOW_METAL_EVENTS and MVK_ALLOW_METAL_FENCES respectively.

  • By default, MVK_ALLOW_METAL_EVENTS is enabled and MVK_ALLOW_METAL_FENCES is disabled.
  • If both are enabled, MVK_ALLOW_METAL_EVENTS takes priority.
  • If neither is enabled, MoltenVK uses CPU sync emulation for VkSemaphores.
  • In the special case of VK_SEMAPHORE_TYPE_TIMELINE semaphores, MoltenVK will always use MTLSharedEvent if it is available on the platform, regardless of the values of MVK_ALLOW_METAL_EVENTS and MVK_ALLOW_METAL_FENCES.
  • On NVIDIA GPUs, MVK_ALLOW_METAL_EVENTS is ignored and MTLEvent is never used for VkSemaphore. On NVIDIA, MVK_ALLOW_METAL_FENCES is respected, so the choice is between CPU emulation (default) or using MTLFence.

A little bit more documentation on these settings can be found in vk_mvk_moltenvk.h.

@rcaridade145
Copy link

@Gcenx
Copy link

Gcenx commented Nov 26, 2021

The type of Metal sync (MTLEvent vs MTLFence) used for Vulkan semaphores (VkSemaphore) is controlled by the MoltenVK environment variables/build settings MVK_ALLOW_METAL_EVENTS and MVK_ALLOW_METAL_FENCES respectively.

  • By default, MVK_ALLOW_METAL_EVENTS is enabled and MVK_ALLOW_METAL_FENCES is disabled.
  • If both are enabled, MVK_ALLOW_METAL_EVENTS takes priority.
  • If neither is enabled, MoltenVK uses CPU sync emulation for VkSemaphores.
  • In the special case of VK_SEMAPHORE_TYPE_TIMELINE semaphores, MoltenVK will always use MTLSharedEvent if it is available on the platform, regardless of the values of MVK_ALLOW_METAL_EVENTS and MVK_ALLOW_METAL_FENCES.
  • On NVIDIA GPUs, MVK_ALLOW_METAL_EVENTS is ignored and MTLEvent is never used for VkSemaphore. On NVIDIA, MVK_ALLOW_METAL_FENCES is respected, so the choice is between CPU emulation (default) or using MTLFence.

A little bit more documentation on these settings can be found in vk_mvk_moltenvk.h.

My question more pertained to obtaining useful output to help someone figure this out, but I’m leaning towards forcing what seems to be required behavior for only Apple Silicon systems like how NVidia GPUs are handled.


The async patch? https://github.com/Sporif/dxvk-async ?

There are reports of this on nvidia also

https://forums.developer.nvidia.com/t/vk-error-device-lost-in-many-game-titles/164513/2 doitsujin/dxvk#1791

This post in particular seems important. https://forums.developer.nvidia.com/t/vk-error-device-lost-in-many-game-titles/164513/12

Workaround on dxvk

doitsujin/dxvk@9fe16b6

Having async patch added doesn’t resolve the issue only downgrading back to MoltenVK-v1.1.4 (for FF14) or MoltenVK-v1.1.3 (for Witcher 3).

CrossOver-21/PortingKit(Wineskin) are all using DXVK-macOS-1.7, there’s a more recent DXVK-macOS-1.9.2 and a separate async patches version.

Not sure I’d want to venture down the rabbit hole of how terrible the Linux NVidia GPU drivers are as the issue my seem similar but might not be related.

@rcaridade145
Copy link

rcaridade145 commented Nov 26, 2021

This https://developer.apple.com/forums/thread/685187 references MoltenVK.

We've finally been able to reproduce the issue and submitted a feedback request (9356792) and a TSI (773748730).

There were a large number of CTS tests failing until that change, because MTLFence is not an appropriate sync mechanism between Vulkan submits (which means between separate MTLCommandBuffers).

Reading some Apple documentation and discussions on forums left me a bit confused about that.

MTLFence detailed behavior? - https://developer.apple.com/forums/thread/108416?answerId=334883022#334883022

@FleetAdmiralButter
Copy link
Author

Did a bit more experimentation with the environment variables listed above... setting the following in my Crossover environment lets stock MVK 1.1.6 (without my patches) start properly on M1 without the "GPU Lost" errors:

"MVK_ALLOW_METAL_EVENTS" = "0"
"MVK_ALLOW_METAL_FENCES" = "1"

@Gcenx
Copy link

Gcenx commented Nov 27, 2021

I don’t see any Rosetta2 handling within MoltenVK and from the only mention of this the stance was Apple needs to handle these things.

So the question now is how to ensure that _metalFeatures.events = true doesn’t get set for when running via Rosetta2

@Gcenx
Copy link

Gcenx commented Nov 28, 2021

Here's a clean solution that works for my use case;

From c2eb3d520956e9680172f5b57e597917a02f26c5 Mon Sep 17 00:00:00 2001
From: Dean M Greer <gcenx83@gmail.com>
Date: Sat, 27 Nov 2021 22:21:02 -0500
Subject: [PATCH] Apple GPUs prefer emulation for VkSemaphore

---
 MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm b/MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm
index ef1c1307..a3231acd 100644
--- a/MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm
+++ b/MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm
@@ -3971,7 +3971,7 @@ static uint32_t mvkGetEntryProperty(io_registry_entry_t entry, CFStringRef prope
 	// Prefer MTLEvent, because MTLEvent handles sync across MTLCommandBuffers and MTLCommandQueues.
 	// However, do not allow use of MTLEvents on NVIDIA GPUs, which have demonstrated trouble with MTLEvents.
 	// Since MTLFence config is disabled by default, emulation will be used on NVIDIA unless MTLFence is enabled.
-	bool canUseMTLEventForSem4 = _pMetalFeatures->events && mvkConfig().semaphoreUseMTLEvent && (_pProperties->vendorID != kNVVendorId);
+	bool canUseMTLEventForSem4 = _pMetalFeatures->events && mvkConfig().semaphoreUseMTLEvent && (_pProperties->vendorID != kNVVendorId) && (_pProperties->vendorID != kAppleVendorId);
 	bool canUseMTLFenceForSem4 = _pMetalFeatures->fences && mvkConfig().semaphoreUseMTLFence;
 	_vkSemaphoreStyle = canUseMTLEventForSem4 ? VkSemaphoreStyleUseMTLEvent : (canUseMTLFenceForSem4 ? VkSemaphoreStyleUseMTLFence : VkSemaphoreStyleUseEmulation);
 	switch (_vkSemaphoreStyle) {
-- 
2.30.1 (Apple Git-130)

@billhollings
Copy link
Contributor

@FleetAdmiralButter

"MVK_ALLOW_METAL_EVENTS" = "0"
"MVK_ALLOW_METAL_FENCES" = "1"

Can you repeat this experiment with both settings disabled, and report the results, please:

"MVK_ALLOW_METAL_EVENTS" = "0"
"MVK_ALLOW_METAL_FENCES" = "0"

This will cause MoltenVK to use CPU emulation, rather than MTLFence for VkSemaphores. The reason for doing this is that MTLFences do not bridge across MTLCommandBuffers, which makes them unsuitable for use in VkSemaphores that are being used to sync two Vulkan command submits (eg. vkQueueSubmit() or vkQueuePresentKHR()). Using CPU emulation should be more appropriate.

@Gcenx
Copy link

Gcenx commented Nov 28, 2021

@billhollings I’ve tested both of the mentioned scenarios and cpu emulation caused some screen tearing, where MTLFences didn’t.

@billhollings
Copy link
Contributor

  • bool canUseMTLEventForSem4 = _pMetalFeatures->events && mvkConfig().semaphoreUseMTLEvent && (_pProperties->vendorID != kNVVendorId) && (_pProperties->vendorID != kAppleVendorId);

Thanks for the experiment. It's a valid step...but if this is a Rosetta issue, we will only want to select this if we are running on Rosetta. That is, MoltenVK built for x86, but running on an M1, by including some kind of !MVK_APPLE_SILICON build test as well.

If we are running an iOS build, or an M1 build natively (not Rosetta), we will want to allow canUseMTLEventForSem4 to be true.

As I mentioned above. the changes in 8c7db31 and 8e6731f dramatically improved CTS sync test results on all GPU's, including M1, so we need to permit native M1 and any iOS SoC's to use MTLEvents for VkSemaphores.

Once @FleetAdmiralButter responds to the results of my request above, I'll create a MoltenVK PR that deals with these various combinations.

@billhollings
Copy link
Contributor

@Gcenx

@billhollings I’ve tested both of the mentioned scenarios and cpu emulation caused some screen tearing, where MTLFences didn’t.

Uggh. Is that even with VkSwapchainCreateInfoKHR::minImageCount set to 3 instead of 2?

@Gcenx
Copy link

Gcenx commented Nov 29, 2021

@billhollings

Thanks for the experiment. It's a valid step...but if this is a Rosetta issue, we will only want to select this if we are running on Rosetta. That is, MoltenVK built for x86, but running on an M1, by including some kind of !MVK_APPLE_SILICON build test as well.

I agree the patch I've applied really isn't correct due to other devices using Apple GPUs but for the short-term it can work my usage case (wine64 + DXVK) as it's a very minimal chance making it better for quickly rebasing against master.

I was more thinking of adding a check within

#if MVK_MACOS
_metalFeatures.mslVersionEnum = MTLLanguageVersion1_1;
_metalFeatures.maxPerStageTextureCount = 128;
_metalFeatures.mtlBufferAlignment = 256;
_metalFeatures.mtlCopyBufferAlignment = 4;
_metalFeatures.baseVertexInstanceDrawing = true;
_metalFeatures.layeredRendering = true;
_metalFeatures.maxTextureDimension = (16 * KIBI);
_metalFeatures.depthSampleCompare = true;
_metalFeatures.samplerMirrorClampToEdge = true;
if (supportsMTLFeatureSet(macOS_GPUFamily1_v2)) {
_metalFeatures.mslVersionEnum = MTLLanguageVersion1_2;
_metalFeatures.indirectDrawing = true;
_metalFeatures.indirectTessellationDrawing = true;
_metalFeatures.dynamicMTLBufferSize = (4 * KIBI);
_metalFeatures.shaderSpecialization = true;
_metalFeatures.stencilViews = true;
_metalFeatures.samplerClampToBorder = true;
_metalFeatures.combinedStoreResolveAction = true;
_metalFeatures.deferredStoreActions = true;
_metalFeatures.maxMTLBufferSize = (1 * GIBI);
_metalFeatures.maxPerStageDynamicMTLBufferCount = 14;
}
if (supportsMTLFeatureSet(macOS_GPUFamily1_v3)) {
_metalFeatures.mslVersionEnum = MTLLanguageVersion2_0;
_metalFeatures.texelBuffers = true;
_metalFeatures.arrayOfTextures = true;
_metalFeatures.arrayOfSamplers = true;
_metalFeatures.presentModeImmediate = true;
_metalFeatures.fences = true;
_metalFeatures.nonUniformThreadgroups = true;
_metalFeatures.argumentBuffers = true;
}
if (supportsMTLFeatureSet(macOS_GPUFamily1_v4)) {
_metalFeatures.mslVersionEnum = MTLLanguageVersion2_1;
_metalFeatures.multisampleArrayTextures = true;
_metalFeatures.events = true;
_metalFeatures.textureBuffers = true;
}
if (supportsMTLFeatureSet(macOS_GPUFamily2_v1)) {
_metalFeatures.multisampleLayeredRendering = _metalFeatures.layeredRendering;
_metalFeatures.stencilFeedback = true;
_metalFeatures.depthResolve = true;
_metalFeatures.stencilResolve = true;
_metalFeatures.simdPermute = true;
_metalFeatures.quadPermute = true;
_metalFeatures.simdReduction = true;
}
if ( mvkOSVersionIsAtLeast(10.15) ) {
_metalFeatures.mslVersionEnum = MTLLanguageVersion2_2;
_metalFeatures.maxQueryBufferSize = (256 * KIBI);
_metalFeatures.native3DCompressedTextures = true;
if ( mvkOSVersionIsAtLeast(mvkMakeOSVersion(10, 15, 6)) ) {
_metalFeatures.sharedLinearTextures = true;
}
if (supportsMTLGPUFamily(Mac2)) {
_metalFeatures.nativeTextureSwizzle = true;
_metalFeatures.placementHeaps = mvkConfig().useMTLHeap;
_metalFeatures.renderWithoutAttachments = true;
}
}

As MoltenVK currently lacks a way to check if running under Rosetta2 I'd decided to go with the minimal change possible to make rebasing simpler.

Uggh. Is that even with VkSwapchainCreateInfoKHR::minImageCount set to 3 instead of 2?

When set to 3 there's slightly less tearing.

@FleetAdmiralButter
Copy link
Author

@FleetAdmiralButter

"MVK_ALLOW_METAL_EVENTS" = "0"
"MVK_ALLOW_METAL_FENCES" = "1"

Can you repeat this experiment with both settings disabled, and report the results, please:

"MVK_ALLOW_METAL_EVENTS" = "0"
"MVK_ALLOW_METAL_FENCES" = "0"

This will cause MoltenVK to use CPU emulation, rather than MTLFence for VkSemaphores. The reason for doing this is that MTLFences do not bridge across MTLCommandBuffers, which makes them unsuitable for use in VkSemaphores that are being used to sync two Vulkan command submits (eg. vkQueueSubmit() or vkQueuePresentKHR()). Using CPU emulation should be more appropriate.

@billhollings, what are the possible implications of falling back to CPU emulation so I know what to look out for? Would it be lower performance or graphical artefacts? I believe Gcenx mentioned experiencing screen tearing in their post.

I'm currently at work so I'll try this out later in the evening.

@FleetAdmiralButter
Copy link
Author

@billhollings, just tried running the game again with CPU sync emulation.

I did not notice any graphical artefacts, but there was a massive drop in performance. Scenes that usually ran close to 60FPS dropped down to as low as 30-40FPS.

Setting MVK_ALLOW_METAL_FENCES back to 1 and restarting the application resolved the performance drop.

Here are some screenshots - I'm standing around a hub area in Gridania with a number of players on-screen:

CPU sync (averaging 39 FPS):
Screen Shot 2021-11-29 at 10 16 55 pm

MVK_ALLOW_METAL_FENCES enabled (averaging 55FPS):
Screen Shot 2021-11-29 at 10 19 31 pm

@billhollings
Copy link
Contributor

@FleetAdmiralButter

there was a massive drop in performance. Scenes that usually ran close to 60FPS dropped down to as low as 30-40FPS.

Is that even with VkSwapchainCreateInfoKHR::minImageCount set to 3 instead of 2?

CPU sync requires a round trip back to the CPU to signal a VkSemaphore, so we definitely expect it to run slower. But that delay can be mitigated by triple buffering instead of double buffering.

@billhollings
Copy link
Contributor

I've just completed some extensive testing using the CTS synchronization suite of tests, on M1, AMD & Intel GPU's, using each of the MTLEvent, MTLFence, and CPU callback mechanisms for handling VkSemaphores in MoltenVK.

On all GPU's, MTLEvent and CPU callbacks perform without any sync failures. MTLFence has random sync failures on all GPUs, particularly on M1, where a test suite run can encounter hundreds of test failures due to sync problems. These MTLFence failures are more or less consistent in frequency, but random in outcome, in that different tests fail on different runs of the same configuration. This is likely due to MTLFence not syncing between Vulkan submissions.

The CTS tests are aimed at consistency and accuracy, and not performance, so they won't cover things like the screen tearing and FPS.

Based on all this, given the issues above with Rosetta2 implementations, I think the best path forward is to modify MoltenVK so that we use MTLEvent by default, except on platforms that demonstrate issues with MTLEvents. On those platforms, we will fallback to CPU callbacks, under the premise that accuracy and consistency is important.

Platforms that have demonstrated issues with MTLEvent now include Rosetta2 on M1, and NVIDIA.

On these platforms, apps that prefer to prioritize performance and screen tearing over accuracy and consistency, will be able to opt to use MTLFence by using the MVK_ALLOW_METAL_FENCES=1 environment variable or build setting.

@Gcenx
Copy link

Gcenx commented Dec 1, 2021

@billhollings thank you for genuinely looking into this.

For upstream that’s defiantly the beat course of action, downstream projects can always change from the default behavior as needed.

@FleetAdmiralButter
Copy link
Author

Hey @billhollings @Gcenx,

Apologies - I haven't had the chance to look at this in a couple days.

Thanks for taking the time to test this out further - since this looks like a driver issue when running under Rosetta, would it be worth reporting to Apple?

@billhollings
Copy link
Contributor

PR #1425 fixes this as described above.

Please retest and indicate whether this fixes your issue.

@billhollings billhollings added the Completed Issue has been fixed, or enhancement implemented. label Dec 1, 2021
@Gcenx
Copy link

Gcenx commented Dec 2, 2021

@billhollings the proposed fix does indeed fix the issue.

@billhollings
Copy link
Contributor

the proposed fix does indeed fix the issue.

Thanks. Closing this now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Completed Issue has been fixed, or enhancement implemented.
Projects
None yet
Development

No branches or pull requests

4 participants