shader_debugprintf: support new VVL-DEBUG-PRINTF message and fix VVL version check for API selection #1187

Merged
merged 18 commits into from
Feb 25, 2025


SRSaunders
Contributor

@SRSaunders SRSaunders commented Oct 9, 2024

Description

Fixes two issues that arose with Vulkan SDK 1.3.296:

  1. Supports new VVL-DEBUG-PRINTF callback message. Previous SDKs used WARNING-DEBUG-PRINTF or UNKNOWN-DEBUG-PRINTF. Without this fix the debug data is not available in the UI Overlay.
  2. Fixes my incorrect assumption that the Vulkan instance version matched the SDK version for all platforms - true on macOS but not true for Windows and Linux. This version is used to set the API level for the sample, which is important for performance and to avoid a previous defect in the Vulkan Validation layer. I have replaced the instance version check with a Validation Layer version check which is portable across all platforms: Win, Linux, macOS. Without this fix, performance is poor on Windows and Linux when using Vulkan SDK 1.3.296.
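A minimal sketch of the message-name check behind item 1. It assumes the debug callback inspects `VkDebugUtilsMessengerCallbackDataEXT::pMessageIdName` and that the layer reports the IDs exactly as named above. The helper `is_debug_printf_message` is hypothetical and is not the sample's actual code.

```cpp
#include <array>
#include <string_view>

// Hypothetical helper: returns true if a debug-utils message ID name identifies
// a shader debug printf message. SDK 1.3.296 uses "VVL-DEBUG-PRINTF"; earlier
// SDKs used "WARNING-DEBUG-PRINTF" or "UNKNOWN-DEBUG-PRINTF".
bool is_debug_printf_message(const char *message_id_name)
{
	if (message_id_name == nullptr)
	{
		return false;        // pMessageIdName is allowed to be null
	}

	constexpr std::array<std::string_view, 3> printf_ids = {
	    "VVL-DEBUG-PRINTF", "WARNING-DEBUG-PRINTF", "UNKNOWN-DEBUG-PRINTF"};

	const std::string_view id{message_id_name};
	for (auto candidate : printf_ids)
	{
		if (id == candidate)        // exact match on the whole ID name
		{
			return true;
		}
	}
	return false;
}
```

In a debug-utils messenger callback, `pCallbackData->pMessage` would only be forwarded to the UI overlay when `is_debug_printf_message(pCallbackData->pMessageIdName)` returns true. Note that the warnings logged later in this thread show newer validation layers also using `WARNING-DEBUG-PRINTF` for internal notices (for example "Forcing shaderInt64 to VK_TRUE"), so matching that legacy ID may route a few non-printf messages to the overlay.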

Fixes #1184.

Tested on Windows 10, Manjaro Linux, and macOS Ventura using Vulkan SDKs 1.3.290 and 1.3.296.
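The validation-layer version check in item 2 comes down to comparing packed Vulkan version numbers. The sketch below mirrors the bit layout of `VK_MAKE_API_VERSION` from `vulkan_core.h` without depending on the Vulkan headers. The helper names are hypothetical, and it assumes the layer version is read from the `specVersion` field of the `VkLayerProperties` entry named `VK_LAYER_KHRONOS_validation` returned by `vkEnumerateInstanceLayerProperties`. The threshold the sample actually uses, and the API level it then selects, are not reproduced here.

```cpp
#include <cstdint>

// Same packing as VK_MAKE_API_VERSION: variant in bits 31-29, major in 28-22,
// minor in 21-12, patch in 11-0.
constexpr std::uint32_t make_api_version(std::uint32_t variant, std::uint32_t major,
                                         std::uint32_t minor, std::uint32_t patch)
{
	return (variant << 29U) | (major << 22U) | (minor << 12U) | patch;
}

// Field extraction, equivalent to VK_API_VERSION_MAJOR/MINOR/PATCH.
constexpr std::uint32_t api_version_major(std::uint32_t v) { return (v >> 22U) & 0x7FU; }
constexpr std::uint32_t api_version_minor(std::uint32_t v) { return (v >> 12U) & 0x3FFU; }
constexpr std::uint32_t api_version_patch(std::uint32_t v) { return v & 0xFFFU; }

// Hypothetical helper: true if the layer's packed version is at least
// major.minor.patch. The variant bits are masked off before comparing.
constexpr bool layer_version_at_least(std::uint32_t layer_spec_version, std::uint32_t major,
                                      std::uint32_t minor, std::uint32_t patch)
{
	return (layer_spec_version & 0x1FFFFFFFU) >= make_api_version(0, major, minor, patch);
}
```

Because major, minor and patch occupy progressively lower bits, a plain integer comparison of the masked values orders versions correctly, so 1.3.290 compares below 1.3.296.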

I hope this is the last time I have to fix this. It seems that VVL changes can easily break this sample.

General Checklist:

Please ensure the following points are checked:

  • My code follows the coding style
  • I have reviewed file licenses
  • I have commented any added functions (in line with Doxygen)
  • I have commented any code that could be hard to understand
  • My changes do not add any new compiler warnings
  • My changes do not add any new validation layer errors or warnings
  • I have used existing framework/helper functions where possible
  • My changes do not add any regressions
  • I have tested every sample to ensure everything runs correctly
  • This PR describes the scope and expected impact of the changes I am making

Note: The Samples CI runs a number of checks including:

  • I have updated the header Copyright to reflect the current year (CI build will fail if Copyright is out of date)
  • My changes build on Windows, Linux, macOS and Android. Otherwise I have documented any exceptions

If this PR contains framework changes:

  • I did a full batch run using the batch command line argument to make sure all samples still work properly

Sample Checklist

If your PR contains a new or modified sample, these further checks must be carried out in addition to the General Checklist:

  • I have tested the sample on at least one compliant Vulkan implementation
  • If the sample is vendor-specific, I have tagged it appropriately
  • I have stated on what implementation the sample has been tested so that others can test on different implementations and platforms
  • Any dependent assets have been merged and published in downstream modules
  • For new samples, I have added a paragraph with a summary to the appropriate chapter in the readme of the folder that the sample belongs to e.g. api samples readme
  • For new samples, I have added a tutorial README.md file to guide users through what they need to know to implement code using this feature. For example, see conditional_rendering
  • For new samples, I have added a link to the Antora navigation so that the sample will be listed at the Vulkan documentation site

Collaborator

@SaschaWillems SaschaWillems left a comment


Thank you very much for this PR. I do have some remarks though, mostly related to comments and code structure. I think it's important that people can easily follow and understand the changes ;)

@SRSaunders
Contributor Author

SRSaunders commented Oct 18, 2024

Thanks @SaschaWillems for the feedback. I am away on vacation this week, but will make the requested changes when I am back.

UPDATE: Back now and changes submitted in 0dc4963.

asuessenbach
asuessenbach previously approved these changes Oct 22, 2024
@SaschaWillems
Collaborator

No idea why, but with this PR and the latest SDK (1.3.296) on Windows, this sample is now again running at less than 1 fps. Forcing it to use VK 1.2 is somehow even slower (0 or inf fps).

If I force VK 1.0 performance is fine, but I don't get any debug output.

Not sure what is happening here and why this sample is so problematic. The debug printf sample from my own samples repo works just fine no matter the API version :/

@SRSaunders
Contributor Author

No idea why, but with this PR and the latest SDK (1.3.296) on Windows, this sample is now again running at less than 1 fps. Forcing it to use VK 1.2 is somehow even slower (0 or inf fps).

Very strange. Can I ask you to recheck before and after this PR, but being careful with your SDK version selection and project gen/build? I did a lot of testing with old and new SDKs on Windows 10, Linux and macOS before submitting originally. I will go back and test again to see if I can somehow duplicate what you are seeing.

If I force VK 1.0 performance is fine, but I don't get any debug output.

Debug PrintF requires Vulkan 1.1 or later. So no surprise that you are not getting debug output with API 1.0.

The debug printf sample from my own samples repo works just fine no matter the api version

I suspect your repo's sample relies on the intrinsic Debug PrintF capability at the shader level on Windows. However, this is not portable across platforms, whereas the Vulkan-Samples version always uses the VVL implementation of the feature. Perhaps that is why you are seeing a difference, at least on Windows. Again, I will go back and see if I can verify this.

@SaschaWillems
Collaborator

It also happens with the old code (before this PR). I only have SDK 1.3.296 installed.

So probably a regression in the validation layers?

@SRSaunders
Contributor Author

SRSaunders commented Oct 23, 2024

Ok, I have rechecked this PR on Windows 10, and even fast-forwarded my local branch to current main HEAD just to make sure. I am using Vulkan SDK 1.3.296.0 with my Radeon RX6600XT GPU. My Vulkan Configurator has been reset to default settings.

Before this PR I get:
[Screenshot: main only]

After this PR I get:
[Screenshot: shader_debugprintf FF]

Is it possible that your Vulkan Configurator has a custom setting that is interfering with the sample? Or possibly a difference between AMD and nVidia GPUs? Just grasping at straws since I cannot duplicate your issue and the 1.3.296 VVL seems to be working correctly using API 1.1 for debug printf.

Contributor

@asuessenbach asuessenbach left a comment


With this change, I have to distinguish two cases:

  1. VulkanConfigurator is running
    VK_EXT_LAYER_SETTINGS_EXTENSION_NAME is available
    instance creation is done by VulkanSample::create_instance (line 469)
    render speed is high
    debug_utils_message_callback is never called, thus no debugprintf output
  2. VulkanConfigurator is not running
    VK_EXT_LAYER_SETTINGS_EXTENSION_NAME is not available
    instance creation is done locally (line 523)
    render speed is extremely low
    debug_utils_message_callback is called, with higher rate than the frame rate

Note that in case 2 you're using VkValidationFeaturesEXT, which is part of VK_EXT_VALIDATION_FEATURES_EXTENSION_NAME. But you don't request that extension in the ShaderDebugPrintf constructor (or anywhere else). And in fact, that extension is not supported on my machine. Strangely, the VVL doesn't complain about that.

@SaschaWillems
Collaborator

That would explain why it's so slow for me. I never ran that sample with the VulkanConfigurator running. That's case 2.

@SRSaunders
Contributor Author

SRSaunders commented Oct 24, 2024

Thanks @asuessenbach for pointing out the missing VK_EXT_validation_features extension. I have made a few changes that might make a difference as follows:

  1. Moved layer settings out of the constructor, and into ShaderDebugPrintf::create_instance(). Now it will run only if the VK_EXT_layer_settings extension is available. This part is for encapsulation only and will not change behaviour.
  2. Added and enabled the VK_EXT_validation_features extension when the VK_EXT_layer_settings extension is not available at runtime. This might change behaviour, but I am concerned about @asuessenbach's comment that the extension is not available on his machine. I'm not sure how that is possible.
  3. Fixed an incorrect string comparison operation for VK_EXT_layer_settings in [HPP]Instance::[HPP]Instance(). This was my mistake from an earlier PR. This could have prevented proper specification of the validation layer feature settings when VK_EXT_layer_settings is active. Again, this could change behaviour.
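The PR does not show the faulty comparison from item 3. As an illustration only (an assumption, not necessarily the bug that was fixed here), a frequent mistake when searching lists of extension names is comparing `const char *` values with `==`, which compares pointer addresses rather than string contents. The hypothetical helper below compares contents with `std::strcmp` instead:

```cpp
#include <cstring>
#include <vector>

// Hypothetical helper: returns true if `wanted` appears in `names`.
bool contains_extension(const std::vector<const char *> &names, const char *wanted)
{
	for (const char *name : names)
	{
		// std::strcmp compares characters; `name == wanted` would only compare
		// addresses and can fail for identical strings stored in different buffers.
		if (name != nullptr && std::strcmp(name, wanted) == 0)
		{
			return true;
		}
	}
	return false;
}
```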

These changes may not be the final solution as I have observed the following when testing:

  1. Linux (Manjaro) using Vulkan 1.3.295 (from pkg mgr) and VVL 1.3.290 (from pkg mgr): this PR works properly (good frame rate, debug data available) when running with vkconfig and without. VK_EXT_layer_settings is only available when vkconfig is active. In this case the debug data is available both in the UI and in the stdout console. No performance issues are visible in either case.
  2. macOS (Ventura) using Vulkan SDK 1.3.296: this PR works properly (good frame rate, debug data available) when running with vkconfig and without. VK_EXT_layer_settings is available both when vkconfig is inactive and active - this is a difference vs Linux. In the latter case (vkconfig active) the debug data is available both in the UI and in the stdout console. No performance issues are present. Also tested with Vulkan SDK 1.3.290 and the results are the same - no performance problems. The only issue is that vkconfig does not appear to recognize the repeated message limit for the new VVL-* messages (vs. the previous INFO-* or WARNING-* messages, etc). A minor issue but likely a bug.
  3. Windows 10 using Vulkan SDK 1.3.296 with my AMD 6600XT GPU: this PR works properly (good frame rate, debug data available) when running without vkconfig only. When vkconfig is active, the sample will not start and complains about an unsupported extension during vkCreateInstance(). However, VK_EXT_layer_settings is available during enumeration when vkconfig is active. Something very strange is going on here - either a bug on the Windows side or something I do not understand. I am not sure how VK_EXT_layer_settings can be enumerated but not supported. See my console output in this case:

[Screenshot of console output: nolayerext]

In summary:

  1. Linux: works properly using VVL 1.3.290 with and without vkconfig. Can't test VVL 1.3.296 since it is not yet available as a package for my Manjaro distro.
  2. macOS: works properly using VVL 1.3.290 and 1.3.296 with and without vkconfig.
  3. Windows 10 on AMD 6600XT GPU: works properly using VVL 1.3.296 without vkconfig only.

Lastly, I thought VK_EXT_layer_settings was meant to replace and deprecate VK_EXT_validation_features. I don't understand why VK_EXT_layer_settings is available all the time on macOS, but for Windows and Linux seems to be enabled only when vkconfig is running. This seems incorrect to me. Can you explain this?

@SRSaunders
Contributor Author

SRSaunders commented Oct 24, 2024

Ok, I think I have finally figured it out. It appears that you don't need to actually enable the VK_EXT_layer_settings extension in order to use it. I’m not sure if this is a feature or a bug. In any case, I have updated the sample and [HPP]Instance::[HPP]Instance() to check for availability of the extension vs. enablement. This approach works across all platforms and behaviours appear to be consistent now:

  1. Sample is tolerant of Vulkan SDK versions: tested against VVL 1.3.290 (Win, Linux, macOS) and 1.3.296 (Win, macOS)
  2. Sample is tolerant of vkconfig running or not running. The only thing to be careful of when running vkconfig is to make sure "Limit Duplicated Messages" is turned off - otherwise debug callback messages will be suppressed and the debug output UI will be blank.

@asuessenbach
Contributor

AFAIK, those two extensions (VK_EXT_layer_settings and VK_EXT_validation_features) are not supported by any NVIDIA GPU, but are provided by a layer injected by for example the VulkanConfigurator. That might explain why it's that slow.

Besides that, just to make sure it has been noted: as VK_EXT_validation_features is deprecated in favour of VK_EXT_layer_settings, using VK_EXT_validation_features would just be a fallback solution. I don't know if it's worth keeping that. And you should bail out in a friendly way if neither of those extensions is available, maybe with a hint to the VulkanConfigurator.
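One way to express the preference order suggested here (VK_EXT_layer_settings first, the deprecated VK_EXT_validation_features as a fallback, otherwise a friendly exit) is a small selection function over the extension names reported as available, in line with the availability-vs-enablement approach described earlier in the thread. This is a sketch with hypothetical names. It assumes the caller has already collected the names, for example with `vkEnumerateInstanceExtensionProperties`, possibly passing `"VK_LAYER_KHRONOS_validation"` as `pLayerName`, since the thread suggests these extensions may be provided by a layer rather than by the driver.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// How debug printf would be switched on in the validation layer.
enum class PrintfConfigPath
{
	LayerSettings,         // VK_EXT_layer_settings (preferred)
	ValidationFeatures,    // VK_EXT_validation_features (deprecated fallback)
	Unavailable            // neither is available: bail out with a friendly message
};

// Hypothetical helper: chooses a path from the instance extension names that
// were reported as available (not necessarily enabled).
PrintfConfigPath choose_printf_config_path(const std::vector<std::string> &available)
{
	auto has = [&available](const char *name) {
		return std::find(available.begin(), available.end(), name) != available.end();
	};

	if (has("VK_EXT_layer_settings"))
	{
		return PrintfConfigPath::LayerSettings;
	}
	if (has("VK_EXT_validation_features"))
	{
		return PrintfConfigPath::ValidationFeatures;
	}
	return PrintfConfigPath::Unavailable;
}
```

On `PrintfConfigPath::Unavailable`, the sample could log an error suggesting that the user launch the VulkanConfigurator, instead of silently creating the instance without debug printf.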

@SaschaWillems
Collaborator

Welp, still sub 1 fps for me with latest SDK and vkconfig NOT running.

Just let me know when it's in a state where I should test.

If we can't get this to work, we may simply go back to the initial version and maybe remove the debug output and tell people to attach a graphics debugger.

@SRSaunders
Contributor Author

Thanks @asuessenbach for the info re nVidia GPUs. I have an AMD card and I guess this is the difference here.

@SaschaWillems would you please test using this PR with vkconfig running and let me know the result? I presume you are using an nVidia GPU - please confirm.

If this works, and as @asuessenbach suggests, I will try to detect this condition and offer a message to nVidia users.

@SaschaWillems
Collaborator

If this works, and as @asuessenbach suggests, I will try to detect this condition and offer a message to nVidia users.

If we get to a point where we have to show a message to users of a certain vendor under certain conditions, we're not heading where I'd like our samples to head. I'd rather remove the debug output then.

@SRSaunders
Contributor Author

@SaschaWillems I understand. However I’d still like to track this down if possible and you testing on Nvidia with vkconfig active would give more information. I can’t do this test myself. Thx.

@SaschaWillems
Collaborator

SaschaWillems commented Oct 24, 2024

Windows 11 23H2, nvidia RTX 4070, latest Vulkan developer driver, SDK 1.3.296.

And I get <1 fps even with vkconfig up and running:

[Screenshot]

I'm pretty sure that the sample ran fine when I initially wrote it, but not sure why it no longer does.

Can't rule out a configuration issue on my side 100%, but not sure where to start looking.

@SRSaunders
Contributor Author

I just added a minor hygiene change to use vk::ExtensionProperties vs. VkExtensionProperties in HPPInstance(). Also updated some comments and decided to explicitly request required GPU features for debugPrintfEXT as per docs.

More importantly, I was able to find an nVidia GPU to test this. I have narrowed down what causes the slowdown and am now convinced it is a VVL debugPrintfEXT defect on that GPU platform. Simply by disabling the following debugPrintfEXT feature enablement lines I can restore FPS performance on nVidia machines for both vkconfig running and not running cases. Unfortunately this drops the debug info, but hopefully this is a temporary thing until this issue can be addressed.

...
	//add_layer_setting(layerSetting);
...
	instance_create_info.pNext = nullptr; //&validation_features;
...

I will respond on the other thread to @spencer-lunarg to see if he can help.

@spencer-lunarg
Contributor

@SRSaunders before, we had the slowdown on Vulkan 1.1 while 1.2/1.3 were good... is that still the case, or is it now for all versions?

@SRSaunders
Contributor Author

SRSaunders commented Oct 25, 2024

When using an nVidia GPU with SDK 1.3.296, it slows down for all API versions. When using SDK 1.3.290 with the same setup (nVidia GPU), the sample works properly when using API 1.2 - as expected per previous discussion.

For AMD GPUs (and Apple Silicon on macOS) with SDK 1.3.296, everything works properly when using API 1.1.

@spencer-lunarg
Contributor

OK, so the problem has been isolated to NVIDIA GPUs (I was testing on Intel and found no issues)... Later tonight I will be back at my desk and can try again on my NVIDIA machine.

@CLAassistant

CLAassistant commented Feb 7, 2025

CLA assistant check
All committers have signed the CLA.

SaschaWillems
SaschaWillems previously approved these changes Feb 9, 2025
asuessenbach
asuessenbach previously approved these changes Feb 10, 2025
Contributor

@gary-sweet gary-sweet left a comment


I'm also seeing a couple of validation errors:

[warning] 507847663 - DEBUG-PRINTF-FORMATTING: Validation Warning: [ DEBUG-PRINTF-FORMATTING ] | MessageID = 0x1e4523ef | OpString "Position = %v4f" contains a 4-wide vector modifier "%v4f", but the argument (SPIR-V Id 157) is a 3-wide vector (values might be truncated or padded)
[warning] 507847663 - DEBUG-PRINTF-FORMATTING: Validation Warning: [ DEBUG-PRINTF-FORMATTING ] | MessageID = 0x1e4523ef | OpString "Position = %v4f" contains a 4-wide vector modifier "%v4f", but the argument (SPIR-V Id 157) is a 3-wide vector (values might be truncated or padded)

These are easily fixed by changing the last line in scene.vert to say debugPrintfEXT("Position = %v3f", outPos)

@SRSaunders
Contributor Author

Unfortunately the recent framework merge of da8ee80 broke this PR, so that's why it's sitting in conflict resolution mode now. That's the danger of having a PR sit for 4+ months awaiting merge - something's going to break in between!

@asuessenbach before I fast-forward this branch, can you tell me why you added:
add_instance_layer("VK_LAYER_KHRONOS_shader_object");

Shouldn't it be this instead?
add_instance_layer("VK_LAYER_KHRONOS_validation");

@asuessenbach
Contributor

asuessenbach commented Feb 25, 2025

@SRSaunders You're right, it should be
add_instance_layer("VK_LAYER_KHRONOS_validation");

Don't know how that could have been scrambled.

@SRSaunders
Contributor Author

@asuessenbach thanks for your quick answer. I will update that when I fast forward the branch.

Conflicts:
	samples/extensions/shader_debugprintf/shader_debugprintf.cpp
gary-sweet
gary-sweet previously approved these changes Feb 25, 2025
Contributor

@gary-sweet gary-sweet left a comment


Works fine for me now, thanks.

@SaschaWillems SaschaWillems self-requested a review February 25, 2025 15:44
SaschaWillems
SaschaWillems previously approved these changes Feb 25, 2025
@SaschaWillems
Collaborator

This might need a rebase, as the Android build fails. Most probably due to the recent update of HWCPipe?

@SRSaunders
Contributor Author

@SaschaWillems yes you are right. I will fix in a few hours - have to step away for a bit. Thanks for your help.

@asuessenbach
Contributor

Runs fine on Win10 with NVIDIA GPU. I just get a couple of VVL warnings (not errors!):

[warning] 2132353751 - VALIDATION-SETTINGS: Validation Warning: [ VALIDATION-SETTINGS ] | MessageID = 0x7f1922d7 | DebugPrintf logs to the Information message severity, enabling Information level logging otherwise the message will not be seen.
[warning] 2132353751 - VALIDATION-SETTINGS: Validation Warning: [ VALIDATION-SETTINGS ] | MessageID = 0x7f1922d7 | DebugPrintf logs can possibly print many times, but duplicate_message_limit is set to 0, setting enable_message_limit to false so all logs are printed.
[warning] 1985515673 - WARNING-DEBUG-PRINTF: Validation Warning: [ WARNING-DEBUG-PRINTF ] | MessageID = 0x76589099 | vkCreateDevice(): Internal Warning: Forcing shaderInt64 to VK_TRUE
[warning] 1985515673 - WARNING-DEBUG-PRINTF: Validation Warning: [ WARNING-DEBUG-PRINTF ] | MessageID = 0x76589099 | vkCreateDevice(): Internal Warning: Adding a VkPhysicalDeviceTimelineSemaphoreFeatures to pNext with timelineSemaphore set to VK_TRUE
[warning] 1985515673 - WARNING-DEBUG-PRINTF: Validation Warning: [ WARNING-DEBUG-PRINTF ] | MessageID = 0x76589099 | vkCreateDevice(): Internal Warning: Adding a VkPhysicalDeviceVulkanMemoryModelFeatures to pNext with vulkanMemoryModel and vulkanMemoryModelDeviceScope set to VK_TRUE
[warning] 1985515673 - WARNING-DEBUG-PRINTF: Validation Warning: [ WARNING-DEBUG-PRINTF ] | MessageID = 0x76589099 | vkCreateDevice(): Internal Warning: Adding a VkPhysicalDeviceBufferDeviceAddressFeatures to pNext with bufferDeviceAddress set to VK_TRUE

Don't know if we want them to be resolved.

@spencer-lunarg
Contributor

Don't know if we want them to be resolved.

No, these are not errors. This is just the Validation Layers being a good layer and letting you know it is adjusting settings and turning features on/off for you, so you don't get confused about what is happening underneath.

@SRSaunders SRSaunders dismissed stale reviews from SaschaWillems and gary-sweet via 839291f February 25, 2025 19:03
@SRSaunders
Contributor Author

SRSaunders commented Feb 25, 2025

@marty-johnson59 I would like to ask for you to merge this ahead of #1281. Otherwise we will have to go through another round of fixups and approvals. There has been enough framework churn underneath this PR and it's time to close it off.

I have kindly asked @asuessenbach if he could please adjust #1281 once this is merged. Thanks.

Approvers are likely: @SaschaWillems, @asuessenbach, and @gary-sweet.

@marty-johnson59
Contributor

OK, I'll go ahead and merge this now - LMK if it causes problems and we can revert

@marty-johnson59 marty-johnson59 merged commit 43c23e5 into KhronosGroup:main Feb 25, 2025
19 checks passed
@SRSaunders
Contributor Author

Thanks @marty-johnson59. Much appreciated.

Successfully merging this pull request may close these issues.

shader_debugprintf problems with new VulkanSDK 1.3.296
7 participants