Regression on M1 GPUs #1482

FleetAdmiralButter · 2021-11-25T10:09:51Z

Hi there,

There's been an issue with the 1.1.5 and 1.1.6 releases of MoltenVK on Apple GPUs - namely a "GPU Lost" error causes apps using these versions of the library to crash. This problem is not present on versions 1.1.4 and below.

Here are some logs of the errors that get generated (they're quite long so I've put it in a Pastebin) - https://pastebin.com/8R78V4FY.

I pulled the MoltenVK repositories locally and managed to narrow down the commits that caused this regression:
8c7db31
8e6731f

Reverting these commits from the v1.1.6 tag and recompiling resulted in the errors above going away completely.

A diff is available here as reference: Kegworks-App#1.

The text was updated successfully, but these errors were encountered:

billhollings · 2021-11-25T23:20:09Z

As mentioned in the commit comments, those two commits were important in fixing how synchronization works between Vulkan submits on MoltenVK. Specifically, using MTLEvent instead of MTLFence for VkSemaphores.

There were a large number of CTS tests failing until that change, because MTLFence is not an appropriate sync mechanism between Vulkan submits (which means between separate MTLCommandBuffers).

Is there an understanding of why these MTLCommandBuffer failures were happening in your app? Is the app doing anything with VKSemaphores other than using them to sync between two separate queue submits?

Gcenx · 2021-11-26T02:23:28Z

@billhollings this happens with using DXVK-macOS and it affects different titles differently. The Witcher3 for example will lockup when your hit showing the message “GPU Lost”

FF14 crashes shortly after launching of using MoltenVK-v1.1.5/v1.1.6, downgrading back to MoltenVK-v1.1.4 or reverting the mention commits the title works again.

If you can provide any environment exports or debug options to use to help figure this out I’ll provide the output. But lightly the beat person for this would be @cdavis5e as there much more familiar with DXVK used with MoltenVK (hopefully CX21.1 lands shortly so there free)

billhollings · 2021-11-26T09:52:52Z

If you can provide any environment exports or debug options to use to help figure this out I’ll provide the output.

The type of Metal sync (MTLEvent vs MTLFence) used for Vulkan semaphores (VkSemaphore) is controlled by the MoltenVK environment variables/build settings MVK_ALLOW_METAL_EVENTS and MVK_ALLOW_METAL_FENCES respectively.

By default, MVK_ALLOW_METAL_EVENTS is enabled and MVK_ALLOW_METAL_FENCES is disabled.
If both are enabled, MVK_ALLOW_METAL_EVENTS takes priority.
If neither is enabled, MoltenVK uses CPU sync emulation for VkSemaphores.
In the special case of VK_SEMAPHORE_TYPE_TIMELINE semaphores, MoltenVK will always use MTLSharedEvent if it is available on the platform, regardless of the values of MVK_ALLOW_METAL_EVENTS and MVK_ALLOW_METAL_FENCES.
On NVIDIA GPUs, MVK_ALLOW_METAL_EVENTS is ignored and MTLEvent is never used for VkSemaphore. On NVIDIA, MVK_ALLOW_METAL_FENCES is respected, so the choice is between CPU emulation (default) or using MTLFence.

A little bit more documentation on these settings can be found in vk_mvk_moltenvk.h.

rcaridade145 · 2021-11-26T09:54:35Z

@billhollings this happens with using DXVK-macOS

The async patch? https://github.com/Sporif/dxvk-async ?

There are reports of this on nvidia also

https://forums.developer.nvidia.com/t/vk-error-device-lost-in-many-game-titles/164513/2
doitsujin/dxvk#1791

This post in particular seems important.
https://forums.developer.nvidia.com/t/vk-error-device-lost-in-many-game-titles/164513/12

Workaround on dxvk

doitsujin/dxvk@9fe16b6

Gcenx · 2021-11-26T15:44:06Z

The type of Metal sync (MTLEvent vs MTLFence) used for Vulkan semaphores (VkSemaphore) is controlled by the MoltenVK environment variables/build settings MVK_ALLOW_METAL_EVENTS and MVK_ALLOW_METAL_FENCES respectively.

By default, MVK_ALLOW_METAL_EVENTS is enabled and MVK_ALLOW_METAL_FENCES is disabled.

If both are enabled, MVK_ALLOW_METAL_EVENTS takes priority.

If neither is enabled, MoltenVK uses CPU sync emulation for VkSemaphores.

In the special case of VK_SEMAPHORE_TYPE_TIMELINE semaphores, MoltenVK will always use MTLSharedEvent if it is available on the platform, regardless of the values of MVK_ALLOW_METAL_EVENTS and MVK_ALLOW_METAL_FENCES.

On NVIDIA GPUs, MVK_ALLOW_METAL_EVENTS is ignored and MTLEvent is never used for VkSemaphore. On NVIDIA, MVK_ALLOW_METAL_FENCES is respected, so the choice is between CPU emulation (default) or using MTLFence.

A little bit more documentation on these settings can be found in vk_mvk_moltenvk.h.

My question more pertained to obtaining useful output to help someone figure this out, but I’m leaning towards forcing what seems to be required behavior for only Apple Silicon systems like how NVidia GPUs are handled.

The async patch? https://github.com/Sporif/dxvk-async ?

There are reports of this on nvidia also

https://forums.developer.nvidia.com/t/vk-error-device-lost-in-many-game-titles/164513/2 doitsujin/dxvk#1791

This post in particular seems important. https://forums.developer.nvidia.com/t/vk-error-device-lost-in-many-game-titles/164513/12

Workaround on dxvk

doitsujin/dxvk@9fe16b6

Having async patch added doesn’t resolve the issue only downgrading back to MoltenVK-v1.1.4 (for FF14) or MoltenVK-v1.1.3 (for Witcher 3).

CrossOver-21/PortingKit(Wineskin) are all using DXVK-macOS-1.7, there’s a more recent DXVK-macOS-1.9.2 and a separate async patches version.

Not sure I’d want to venture down the rabbit hole of how terrible the Linux NVidia GPU drivers are as the issue my seem similar but might not be related.

rcaridade145 · 2021-11-26T17:41:49Z

This https://developer.apple.com/forums/thread/685187 references MoltenVK.

We've finally been able to reproduce the issue and submitted a feedback request (9356792) and a TSI (773748730).

There were a large number of CTS tests failing until that change, because MTLFence is not an appropriate sync mechanism between Vulkan submits (which means between separate MTLCommandBuffers).

Reading some Apple documentation and discussions on forums left me a bit confused about that.

MTLFence detailed behavior? - https://developer.apple.com/forums/thread/108416?answerId=334883022#334883022

FleetAdmiralButter · 2021-11-27T04:24:46Z

Did a bit more experimentation with the environment variables listed above... setting the following in my Crossover environment lets stock MVK 1.1.6 (without my patches) start properly on M1 without the "GPU Lost" errors:

"MVK_ALLOW_METAL_EVENTS" = "0"
"MVK_ALLOW_METAL_FENCES" = "1"

Gcenx · 2021-11-27T14:36:25Z

I don’t see any Rosetta2 handling within MoltenVK and from the only mention of this the stance was Apple needs to handle these things.

So the question now is how to ensure that _metalFeatures.events = true doesn’t get set for when running via Rosetta2

Gcenx · 2021-11-28T16:30:31Z

Here's a clean solution that works for my use case;

From c2eb3d520956e9680172f5b57e597917a02f26c5 Mon Sep 17 00:00:00 2001
From: Dean M Greer <gcenx83@gmail.com>
Date: Sat, 27 Nov 2021 22:21:02 -0500
Subject: [PATCH] Apple GPUs prefer emulation for VkSemaphore

---
 MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm b/MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm
index ef1c1307..a3231acd 100644
--- a/MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm
+++ b/MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm
@@ -3971,7 +3971,7 @@ static uint32_t mvkGetEntryProperty(io_registry_entry_t entry, CFStringRef prope
 	// Prefer MTLEvent, because MTLEvent handles sync across MTLCommandBuffers and MTLCommandQueues.
 	// However, do not allow use of MTLEvents on NVIDIA GPUs, which have demonstrated trouble with MTLEvents.
 	// Since MTLFence config is disabled by default, emulation will be used on NVIDIA unless MTLFence is enabled.
-	bool canUseMTLEventForSem4 = _pMetalFeatures->events && mvkConfig().semaphoreUseMTLEvent && (_pProperties->vendorID != kNVVendorId);
+	bool canUseMTLEventForSem4 = _pMetalFeatures->events && mvkConfig().semaphoreUseMTLEvent && (_pProperties->vendorID != kNVVendorId) && (_pProperties->vendorID != kAppleVendorId);
 	bool canUseMTLFenceForSem4 = _pMetalFeatures->fences && mvkConfig().semaphoreUseMTLFence;
 	_vkSemaphoreStyle = canUseMTLEventForSem4 ? VkSemaphoreStyleUseMTLEvent : (canUseMTLFenceForSem4 ? VkSemaphoreStyleUseMTLFence : VkSemaphoreStyleUseEmulation);
 	switch (_vkSemaphoreStyle) {
-- 
2.30.1 (Apple Git-130)

billhollings · 2021-11-28T23:07:46Z

@FleetAdmiralButter

"MVK_ALLOW_METAL_EVENTS" = "0"
"MVK_ALLOW_METAL_FENCES" = "1"

Can you repeat this experiment with both settings disabled, and report the results, please:

"MVK_ALLOW_METAL_EVENTS" = "0"
"MVK_ALLOW_METAL_FENCES" = "0"

This will cause MoltenVK to use CPU emulation, rather than MTLFence for VkSemaphores. The reason for doing this is that MTLFences do not bridge across MTLCommandBuffers, which makes them unsuitable for use in VkSemaphores that are being used to sync two Vulkan command submits (eg. vkQueueSubmit() or vkQueuePresentKHR()). Using CPU emulation should be more appropriate.

Gcenx · 2021-11-28T23:21:01Z

@billhollings I’ve tested both of the mentioned scenarios and cpu emulation caused some screen tearing, where MTLFences didn’t.

billhollings · 2021-11-28T23:24:50Z

bool canUseMTLEventForSem4 = _pMetalFeatures->events && mvkConfig().semaphoreUseMTLEvent && (_pProperties->vendorID != kNVVendorId) && (_pProperties->vendorID != kAppleVendorId);

Thanks for the experiment. It's a valid step...but if this is a Rosetta issue, we will only want to select this if we are running on Rosetta. That is, MoltenVK built for x86, but running on an M1, by including some kind of !MVK_APPLE_SILICON build test as well.

If we are running an iOS build, or an M1 build natively (not Rosetta), we will want to allow canUseMTLEventForSem4 to be true.

As I mentioned above. the changes in 8c7db31 and 8e6731f dramatically improved CTS sync test results on all GPU's, including M1, so we need to permit native M1 and any iOS SoC's to use MTLEvents for VkSemaphores.

Once @FleetAdmiralButter responds to the results of my request above, I'll create a MoltenVK PR that deals with these various combinations.

billhollings · 2021-11-28T23:34:55Z

@Gcenx

@billhollings I’ve tested both of the mentioned scenarios and cpu emulation caused some screen tearing, where MTLFences didn’t.

Uggh. Is that even with VkSwapchainCreateInfoKHR::minImageCount set to 3 instead of 2?

Gcenx · 2021-11-29T00:09:25Z

@billhollings

Thanks for the experiment. It's a valid step...but if this is a Rosetta issue, we will only want to select this if we are running on Rosetta. That is, MoltenVK built for x86, but running on an M1, by including some kind of !MVK_APPLE_SILICON build test as well.

I agree the patch I've applied really isn't correct due to other devices using Apple GPUs but for the short-term it can work my usage case (wine64 + DXVK) as it's a very minimal chance making it better for quickly rebasing against master.

I was more thinking of adding a check within

MoltenVK/MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm

Lines 1383 to 1448 in c376273

    
           #if MVK_MACOS 
        
           	_metalFeatures.mslVersionEnum = MTLLanguageVersion1_1; 
        
               _metalFeatures.maxPerStageTextureCount = 128; 
        
               _metalFeatures.mtlBufferAlignment = 256; 
        
           	_metalFeatures.mtlCopyBufferAlignment = 4; 
        
           	_metalFeatures.baseVertexInstanceDrawing = true; 
        
           	_metalFeatures.layeredRendering = true; 
        
           	_metalFeatures.maxTextureDimension = (16 * KIBI); 
        
           	_metalFeatures.depthSampleCompare = true; 
        
           	_metalFeatures.samplerMirrorClampToEdge = true; 
        
               if (supportsMTLFeatureSet(macOS_GPUFamily1_v2)) { 
        
           		_metalFeatures.mslVersionEnum = MTLLanguageVersion1_2; 
        
           		_metalFeatures.indirectDrawing = true; 
        
           		_metalFeatures.indirectTessellationDrawing = true; 
        
                   _metalFeatures.dynamicMTLBufferSize = (4 * KIBI); 
        
                   _metalFeatures.shaderSpecialization = true; 
        
                   _metalFeatures.stencilViews = true; 
        
                   _metalFeatures.samplerClampToBorder = true; 
        
                   _metalFeatures.combinedStoreResolveAction = true; 
        
           		_metalFeatures.deferredStoreActions = true; 
        
                   _metalFeatures.maxMTLBufferSize = (1 * GIBI); 
        
                   _metalFeatures.maxPerStageDynamicMTLBufferCount = 14; 
        
               } 
        
               if (supportsMTLFeatureSet(macOS_GPUFamily1_v3)) { 
        
           		_metalFeatures.mslVersionEnum = MTLLanguageVersion2_0; 
        
                   _metalFeatures.texelBuffers = true; 
        
           		_metalFeatures.arrayOfTextures = true; 
        
           		_metalFeatures.arrayOfSamplers = true; 
        
           		_metalFeatures.presentModeImmediate = true; 
        
           		_metalFeatures.fences = true; 
        
           		_metalFeatures.nonUniformThreadgroups = true; 
        
           		_metalFeatures.argumentBuffers = true; 
        
               } 
        
               if (supportsMTLFeatureSet(macOS_GPUFamily1_v4)) { 
        
                   _metalFeatures.mslVersionEnum = MTLLanguageVersion2_1; 
        
                   _metalFeatures.multisampleArrayTextures = true; 
        
           		_metalFeatures.events = true; 
        
                   _metalFeatures.textureBuffers = true; 
        
               } 
        
           	if (supportsMTLFeatureSet(macOS_GPUFamily2_v1)) { 
        
           		_metalFeatures.multisampleLayeredRendering = _metalFeatures.layeredRendering; 
        
           		_metalFeatures.stencilFeedback = true; 
        
           		_metalFeatures.depthResolve = true; 
        
           		_metalFeatures.stencilResolve = true; 
        
           		_metalFeatures.simdPermute = true; 
        
           		_metalFeatures.quadPermute = true; 
        
           		_metalFeatures.simdReduction = true; 
        
           	} 
        
           	if ( mvkOSVersionIsAtLeast(10.15) ) { 
        
           		_metalFeatures.mslVersionEnum = MTLLanguageVersion2_2; 
        
           		_metalFeatures.maxQueryBufferSize = (256 * KIBI); 
        
           		_metalFeatures.native3DCompressedTextures = true; 
        
                   if ( mvkOSVersionIsAtLeast(mvkMakeOSVersion(10, 15, 6)) ) { 
        
                       _metalFeatures.sharedLinearTextures = true; 
        
                   } 
        
           		if (supportsMTLGPUFamily(Mac2)) { 
        
           			_metalFeatures.nativeTextureSwizzle = true; 
        
           			_metalFeatures.placementHeaps = mvkConfig().useMTLHeap; 
        
           			_metalFeatures.renderWithoutAttachments = true; 
        
           		} 
        
           	}

As MoltenVK currently lacks a way to check if running under Rosetta2 I'd decided to go with the minimal change possible to make rebasing simpler.

Uggh. Is that even with VkSwapchainCreateInfoKHR::minImageCount set to 3 instead of 2?

When set to 3 there's slightly less tearing.

FleetAdmiralButter · 2021-11-29T01:24:47Z

@FleetAdmiralButter
"MVK_ALLOW_METAL_EVENTS" = "0"
"MVK_ALLOW_METAL_FENCES" = "1"
Can you repeat this experiment with both settings disabled, and report the results, please:
"MVK_ALLOW_METAL_EVENTS" = "0"
"MVK_ALLOW_METAL_FENCES" = "0"
This will cause MoltenVK to use CPU emulation, rather than MTLFence for VkSemaphores. The reason for doing this is that MTLFences do not bridge across MTLCommandBuffers, which makes them unsuitable for use in VkSemaphores that are being used to sync two Vulkan command submits (eg. vkQueueSubmit() or vkQueuePresentKHR()). Using CPU emulation should be more appropriate.

@billhollings, what are the possible implications of falling back to CPU emulation so I know what to look out for? Would it be lower performance or graphical artefacts? I believe Gcenx mentioned experiencing screen tearing in their post.

I'm currently at work so I'll try this out later in the evening.

FleetAdmiralButter · 2021-11-29T11:28:53Z

@billhollings, just tried running the game again with CPU sync emulation.

I did not notice any graphical artefacts, but there was a massive drop in performance. Scenes that usually ran close to 60FPS dropped down to as low as 30-40FPS.

Setting MVK_ALLOW_METAL_FENCES back to 1 and restarting the application resolved the performance drop.

Here are some screenshots - I'm standing around a hub area in Gridania with a number of players on-screen:

CPU sync (averaging 39 FPS):

MVK_ALLOW_METAL_FENCES enabled (averaging 55FPS):

billhollings · 2021-11-29T18:21:37Z

@FleetAdmiralButter

there was a massive drop in performance. Scenes that usually ran close to 60FPS dropped down to as low as 30-40FPS.

Is that even with VkSwapchainCreateInfoKHR::minImageCount set to 3 instead of 2?

CPU sync requires a round trip back to the CPU to signal a VkSemaphore, so we definitely expect it to run slower. But that delay can be mitigated by triple buffering instead of double buffering.

billhollings · 2021-12-01T17:13:56Z

I've just completed some extensive testing using the CTS synchronization suite of tests, on M1, AMD & Intel GPU's, using each of the MTLEvent, MTLFence, and CPU callback mechanisms for handling VkSemaphores in MoltenVK.

On all GPU's, MTLEvent and CPU callbacks perform without any sync failures. MTLFence has random sync failures on all GPUs, particularly on M1, where a test suite run can encounter hundreds of test failures due to sync problems. These MTLFence failures are more or less consistent in frequency, but random in outcome, in that different tests fail on different runs of the same configuration. This is likely due to MTLFence not syncing between Vulkan submissions.

The CTS tests are aimed at consistency and accuracy, and not performance, so they won't cover things like the screen tearing and FPS.

Based on all this, given the issues above with Rosetta2 implementations, I think the best path forward is to modify MoltenVK so that we use MTLEvent by default, except on platforms that demonstrate issues with MTLEvents. On those platforms, we will fallback to CPU callbacks, under the premise that accuracy and consistency is important.

Platforms that have demonstrated issues with MTLEvent now include Rosetta2 on M1, and NVIDIA.

On these platforms, apps that prefer to prioritize performance and screen tearing over accuracy and consistency, will be able to opt to use MTLFence by using the MVK_ALLOW_METAL_FENCES=1 environment variable or build setting.

Gcenx · 2021-12-01T17:32:20Z

@billhollings thank you for genuinely looking into this.

For upstream that’s defiantly the beat course of action, downstream projects can always change from the default behavior as needed.

FleetAdmiralButter · 2021-12-01T22:21:30Z

Hey @billhollings @Gcenx,

Apologies - I haven't had the chance to look at this in a couple days.

Thanks for taking the time to test this out further - since this looks like a driver issue when running under Rosetta, would it be worth reporting to Apple?

billhollings · 2021-12-01T23:29:15Z

PR #1425 fixes this as described above.

Please retest and indicate whether this fixes your issue.

Gcenx · 2021-12-02T01:53:38Z

@billhollings the proposed fix does indeed fix the issue.

billhollings · 2021-12-02T21:41:56Z

the proposed fix does indeed fix the issue.

Thanks. Closing this now.

billhollings added the Requested more info label Nov 25, 2021

billhollings added the Bug label Dec 1, 2021

billhollings self-assigned this Dec 1, 2021

billhollings removed the Requested more info label Dec 1, 2021

billhollings mentioned this issue Dec 1, 2021

Updates to better support Rosetta2 runtimes, and MSL 2.4 and 2.3 versions. #1485

Merged

billhollings added the Completed Issue has been fixed, or enhancement implemented. label Dec 1, 2021

billhollings closed this as completed Dec 2, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Regression on M1 GPUs #1482

Regression on M1 GPUs #1482

FleetAdmiralButter commented Nov 25, 2021

billhollings commented Nov 25, 2021

Gcenx commented Nov 26, 2021

billhollings commented Nov 26, 2021 •

edited

Loading

rcaridade145 commented Nov 26, 2021

Gcenx commented Nov 26, 2021

rcaridade145 commented Nov 26, 2021 •

edited

Loading

FleetAdmiralButter commented Nov 27, 2021

Gcenx commented Nov 27, 2021

Gcenx commented Nov 28, 2021

billhollings commented Nov 28, 2021

Gcenx commented Nov 28, 2021

billhollings commented Nov 28, 2021

billhollings commented Nov 28, 2021

Gcenx commented Nov 29, 2021

FleetAdmiralButter commented Nov 29, 2021

FleetAdmiralButter commented Nov 29, 2021

billhollings commented Nov 29, 2021

billhollings commented Dec 1, 2021

Gcenx commented Dec 1, 2021

FleetAdmiralButter commented Dec 1, 2021

billhollings commented Dec 1, 2021

Gcenx commented Dec 2, 2021

billhollings commented Dec 2, 2021

Regression on M1 GPUs #1482

Regression on M1 GPUs #1482

Comments

FleetAdmiralButter commented Nov 25, 2021

billhollings commented Nov 25, 2021

Gcenx commented Nov 26, 2021

billhollings commented Nov 26, 2021 • edited Loading

rcaridade145 commented Nov 26, 2021

Gcenx commented Nov 26, 2021

rcaridade145 commented Nov 26, 2021 • edited Loading

FleetAdmiralButter commented Nov 27, 2021

Gcenx commented Nov 27, 2021

Gcenx commented Nov 28, 2021

billhollings commented Nov 28, 2021

Gcenx commented Nov 28, 2021

billhollings commented Nov 28, 2021

billhollings commented Nov 28, 2021

Gcenx commented Nov 29, 2021

FleetAdmiralButter commented Nov 29, 2021

FleetAdmiralButter commented Nov 29, 2021

billhollings commented Nov 29, 2021

billhollings commented Dec 1, 2021

Gcenx commented Dec 1, 2021

FleetAdmiralButter commented Dec 1, 2021

billhollings commented Dec 1, 2021

Gcenx commented Dec 2, 2021

billhollings commented Dec 2, 2021

billhollings commented Nov 26, 2021 •

edited

Loading

rcaridade145 commented Nov 26, 2021 •

edited

Loading