Set a fixed number of worker threads in debug mode to make race conditions more reproducible #72577
Conversation
I'm not sure this is the right way to cause "multi-threading stress." The idea may have merit, but enabling something like that by default, even if dev-only, can also hinder debugging when one is interested in the default behavior, which is the case most of the time. I'd suggest, in case there's consensus among core & rendering developers (so maybe a proposal about this should be opened), that there's a new build setting that enables a define (say, …).
Yeah, I agree. There's a similar change in https://github.com/godotengine/godot/pull/72346/files#diff-f34c21fb879df1303b79da30fb0cbb9801e17cb0dbbb9e429249603e50d60603R102 which is also only intended for stress testing of the implementation, and shouldn't force all contributors to pay that price when working on unrelated areas. Usually for this kind of thing we have custom defines in the file which can be enabled manually (e.g. …). If that's too cumbersome or not applicable for the purpose here (automated benchmarking and regression testing?), then we could indeed have another SCons compile option that enables one define common to all these automated testing code branches.
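The manual-define pattern described above could look roughly like the following sketch. All identifiers here (`DEBUG_STRESS_WORKER_THREADS`, `get_worker_thread_count`) are hypothetical illustrations, not actual Godot code; the point is that the stress-testing override compiles out entirely unless enabled by hand or via a build flag:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical define: uncomment, or pass -DDEBUG_STRESS_WORKER_THREADS
// via a build option, to enable the stress-testing behavior.
// #define DEBUG_STRESS_WORKER_THREADS

static uint32_t get_worker_thread_count(uint32_t hardware_threads) {
#ifdef DEBUG_STRESS_WORKER_THREADS
	// Force a fixed, high thread count so scheduling varies less between
	// machines and race conditions surface more often.
	return 16;
#else
	// Default behavior: match the hardware, with a floor of one thread.
	return std::max<uint32_t>(1, hardware_threads);
#endif
}
```

With the define disabled, contributors working on unrelated areas see the default hardware-matched behavior and pay no cost.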
Additionally, this is just trading one issue for another: this code would make it so that only the multithreaded codepath gets tested, while the single-threaded codepath would almost never run.
For context, the following code block from `RendererSceneCull`:

```cpp
void RendererSceneCull::_scene_cull_threaded(uint32_t p_thread, CullData *cull_data) {
	uint32_t cull_total = cull_data->scenario->instance_data.size();
	uint32_t total_threads = WorkerThreadPool::get_singleton()->get_thread_count();
	uint32_t cull_from = p_thread * cull_total / total_threads;
	uint32_t cull_to = (p_thread + 1 == total_threads) ? cull_total : ((p_thread + 1) * cull_total / total_threads);

	_scene_cull(*cull_data, scene_cull_result_threads[p_thread], cull_from, cull_to);
}

...

if (cull_to > thread_cull_threshold) {
	//multiple threads
	for (InstanceCullResult &thread : scene_cull_result_threads) {
		thread.clear();
	}
	WorkerThreadPool::GroupID group_task = WorkerThreadPool::get_singleton()->add_template_group_task(this, &RendererSceneCull::_scene_cull_threaded, &cull_data, scene_cull_result_threads.size(), -1, true, SNAME("RenderCullInstances"));
	WorkerThreadPool::get_singleton()->wait_for_group_task_completion(group_task);
	for (InstanceCullResult &thread : scene_cull_result_threads) {
		scene_cull_result.append_from(thread);
	}
} else {
	//single threaded
	_scene_cull(cull_data, scene_cull_result, cull_from, cull_to);
}
```

Essentially, a heuristic determines whether to just use the single-threaded codepath, or to use … Right now, …
In my testing, the overhead of context switching due to thread oversubscription appears to be minimal. I'm seeing a 5.3% slowdown from running 64 threads on my 2c4t computer compared to 4 threads; see the average frame times in milliseconds below:

```diff
--- /dev/fd/63	2023-02-02 19:45:13.387513929 -0500
+++ /dev/fd/62	2023-02-02 19:45:13.390847272 -0500
@@ -79,13 +79,13 @@
 "category": "Rendering > Lights And Meshes",
 "name": "Omni 100",
 "results": {
 "idle": 0,
 "physics": 0,
-"render_cpu": 107.37,
-"render_gpu": 17.14,
+"render_cpu": 113.12,
+"render_gpu": 17.67,
 }
 },
 {
 "category": "Rendering > Lights And Meshes",
 "name": "Speed Fast",
```

Given the reproducibility benefits of having this enabled by default for everyone, I'd argue that 5.3% is a very small price to pay.
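The oversubscription scenario being benchmarked, launching far more threads than there are hardware cores and letting the OS time-slice them, can be sketched as below. This is illustrative only (`run_oversubscribed` is a hypothetical helper; the 5.3% figure comes from the benchmark diff above, not from this snippet):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Spawn `num_threads` threads regardless of the hardware core count
// (oversubscription), run a trivial stand-in task on each, and return
// the number of tasks that completed. The OS scheduler time-slices the
// excess threads across the available cores.
int run_oversubscribed(int num_threads) {
	std::atomic<int> done{0};
	std::vector<std::thread> pool;
	pool.reserve(num_threads);
	for (int i = 0; i < num_threads; i++) {
		pool.emplace_back([&done] {
			// Trivial stand-in for a per-thread cull task.
			done.fetch_add(1, std::memory_order_relaxed);
		});
	}
	for (std::thread &t : pool) {
		t.join();
	}
	return done.load();
}
```

All 64 threads complete correctly even on a 4-thread machine; oversubscription costs some context-switching overhead but is functionally safe.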
I've changed the implementation so that …
@RandomShaper See #72689
After some more testing and deliberation, I've reduced the default number of worker threads from 64 to 16 to be more in line with typical systems, according to Steam's Hardware Survey (80% of users fall between 4c8t and 8c16t). The theory is that if a performance drop is observed in a piece of multithreaded code, it probably has limited scalability, and 16 is a reasonable standard to use for testing that.
Makes multithreading behavior more consistent, see here for an example of an issue that was made more difficult to reproduce due to differing behavior between machines with different core counts.
Having a higher number of worker threads by default also makes race condition bugs easier for ordinary users to detect, at almost no overhead and without having to compile a custom build with ThreadSanitizer enabled. This is especially helpful on low-core-count systems, where such bugs would otherwise come up much less often, since race conditions manifest more frequently as the number of threads increases.
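To illustrate why more threads surface races sooner: many threads hammering shared state widen the window for lost updates. The sketch below (hypothetical demo code, not Godot's) uses `std::atomic` so the count is correct; replacing the atomic with a plain `long long` would intermittently lose increments, and it would do so far more often with 16 threads than with 2:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Increment a shared counter from `num_threads` threads, `iterations`
// times each. With std::atomic the result is always exact; with an
// unsynchronized plain integer, concurrent read-modify-write cycles
// would race and the final count would fall short nondeterministically.
long long hammer_counter(int num_threads, int iterations) {
	std::atomic<long long> counter{0};
	std::vector<std::thread> pool;
	pool.reserve(num_threads);
	for (int i = 0; i < num_threads; i++) {
		pool.emplace_back([&counter, iterations] {
			for (int j = 0; j < iterations; j++) {
				counter.fetch_add(1, std::memory_order_relaxed);
			}
		});
	}
	for (std::thread &t : pool) {
		t.join();
	}
	return counter.load();
}
```

This is the same effect ThreadSanitizer catches analytically; raising the default thread count just makes the buggy interleavings statistically likelier in ordinary use.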
Note that this does not affect any decisions on when to take the single- vs. multithreaded paths, and that the number of worker threads can still be overridden with `threading/worker_pool/max_threads` when necessary.

Also note that this PR has been edited to incorporate feedback and address concerns, so most points raised below are outdated.