Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On Windows, fix request_redraw() related panics #1461

Merged
merged 8 commits into from
Mar 7, 2020

Conversation

filnet
Copy link
Contributor

@filnet filnet commented Feb 13, 2020

Simplifies the message loops and event handling considerably
Strongly inspired from the X11 platform

Partially reverts 6a330a2

These panics were introduced by 6a330a2

Fixes #1391
Fixes #1400
Fixes #1466
Fixes #1469
Probably fixes other related issues

The commit mentioned above introduced a few critical regressions.
I did quite a bit of forensic (looking at 0.19.1, 0.20.0-alpha1) to understand what was going on around the handling of request_redraw(). I also looked at how other platforms handle it.

I tried really hard to fix the issues using the current approach of posting WM_PAINT messages but got nowhere. Too many corner cases.
In the end I decided to go for the X11 approach of using a HashSet to keep track of windows requiring a redraw and things got much simpler.

This PR does not fundamentally change how request_redraw() is handled.
It still calls RedrawWindow which sends a WM_PAINT message. Which means that:

  • multiple WM_PAINT messages are coalesced (for example when calling request_redraw while resizing or calling request_redraw multiple times),
  • RedrawRequested events are always sent when handling a WM_PAINT message.

I tried to be very thorough in testing:

  • tested all control flow modes (Poll, Wait and WaitUntill) in both the non-modal and modal loops
  • made sure that redraw requests are properly coalesced (in particular when resizing)
  • tested multi window and fullscreen

There are a few untested corner cases. See #1429 for more details.

And always kept the specs in a corner of my mind:

@filnet filnet force-pushed the windows_request_redraw branch from 521c2d0 to 1617ad7 Compare February 13, 2020 11:09
@filnet filnet force-pushed the windows_request_redraw branch from 1617ad7 to de54072 Compare February 13, 2020 15:47
@goddessfreya goddessfreya requested a review from Osspial February 13, 2020 19:46
@goddessfreya goddessfreya added DS - windows C - waiting on maintainer A maintainer must review this code labels Feb 13, 2020
@filnet filnet force-pushed the windows_request_redraw branch 2 times, most recently from 3bb98f3 to b16ff93 Compare February 14, 2020 17:12
@filnet filnet force-pushed the windows_request_redraw branch from b16ff93 to dc354c2 Compare February 25, 2020 13:19
@filnet filnet changed the title on Windows, use a pending redraws set to keep track of request_redraw On Windows, fix request_redraw() related panics Feb 25, 2020
@filnet filnet force-pushed the windows_request_redraw branch 4 times, most recently from 37141ef to 6659f8f Compare February 26, 2020 13:17
@filnet
Copy link
Contributor Author

filnet commented Feb 26, 2020

I think I am done for now. All reported panics are not reproducible anymore.

@Imberflur
Copy link
Contributor

I'm guessing it probably also fixes #1469

@Imberflur
Copy link
Contributor

Imberflur commented Feb 27, 2020

Windows user experienced some panics.

PanicInfo: panicked at 'non-redraw event in redraw phase: Some(WindowEvent { window_id: WindowId(WindowId(0x1c009e)), event: Focused(false) })', <::std::macros::panic macros>:5:6

   8: std::panicking::begin_panic_fmt
             at /rustc/58b834344fc7b9185e7a50db1ff24e5eb07dae5e\/src\libstd\panicking.rs:332
   9: winit::platform_impl::platform::event_loop::runner::EventLoopRunner<()>::process_event<()>
             at <::std::macros::panic macros>:5
  10: winit::platform_impl::platform::event_loop::runner::ELRShared<()>::send_event_unbuffered
             at src\platform_impl\windows\event_loop\runner.rs:121
  11: winit::platform_impl::platform::event_loop::runner::ELRShared<()>::send_event<()>
             at src\platform_impl\windows\event_loop\runner.rs:106
  12: winit::platform_impl::platform::event_loop::public_window_callback<()>
             at src\platform_impl\windows\event_loop.rs:0
  13: DefSubclassProc
  14: DefSubclassProc
  15: CallWindowProcW
  16: CallWindowProcW
  17: glPushClientAttrib
  18: CallWindowProcW
  19: DispatchMessageW
  20: IsWindowVisible
  21: KiUserCallbackDispatcher
  22: NtUserPeekMessage
  23: PeekMessageW
  24: PeekMessageW
  25: winit::platform_impl::platform::event_loop::EventLoop<()>::run_return<(),closure-1>
             at src\platform_impl\windows\event_loop.rs:250
  26: winit::platform_impl::platform::event_loop::EventLoop<()>::run<(),closure-1>
             at src\platform_impl\windows\event_loop.rs:192
  27: winit::event_loop::EventLoop<()>::run<(),closure-1>
             at src\event_loop.rs:148
  28: veloren_voxygen::run::run
             at voxygen\src\run.rs:18
  29: veloren_voxygen::main
             at voxygen\src\main.rs:257

and

PanicInfo: panicked at 'non-redraw event in redraw phase: Some(WindowEvent { window_id: WindowId(WindowId(0x60bd2)), event: Moved(PhysicalPosition { x: 4294967289, y: 0 }) })', <::std::macros::panic macros>:5:6

   8: std::panicking::begin_panic_fmt
             at /rustc/58b834344fc7b9185e7a50db1ff24e5eb07dae5e\/src\libstd\panicking.rs:332
   9: winit::platform_impl::platform::event_loop::runner::EventLoopRunner<()>::process_event<()>
             at <::std::macros::panic macros>:5
  10: winit::platform_impl::platform::event_loop::runner::ELRShared<()>::send_event_unbuffered
             at src\platform_impl\windows\event_loop\runner.rs:121
  11: winit::platform_impl::platform::event_loop::runner::ELRShared<()>::send_event<()>
             at src\platform_impl\windows\event_loop\runner.rs:106
  12: winit::platform_impl::platform::event_loop::SubclassInput<()>::send_event
             at src\platform_impl\windows\event_loop.rs:104
  13: winit::platform_impl::platform::event_loop::public_window_callback<()>
             at src\platform_impl\windows\event_loop.rs:728
  14: DefSubclassProc
  15: DefSubclassProc
  16: CallWindowProcW
  17: CallWindowProcW
  18: glPushClientAttrib
  19: CallWindowProcW
  20: DispatchMessageW
  21: MBToWCSEx
  22: KiUserCallbackDispatcher
  23: NtUserPeekMessage
  24: PeekMessageW
  25: PeekMessageW
  26: winit::platform_impl::platform::event_loop::EventLoop<()>::run_return<(),closure-1>
             at src\platform_impl\windows\event_loop.rs:250
  27: winit::platform_impl::platform::event_loop::EventLoop<()>::run<(),closure-1>
             at src\platform_impl\windows\event_loop.rs:192
  28: winit::event_loop::EventLoop<()>::run<(),closure-1>
             at src\event_loop.rs:148
  29: veloren_voxygen::run::run
             at voxygen\src\run.rs:18
  30: veloren_voxygen::main
             at voxygen\src\main.rs:257

edit: updated with debug info versions

@filnet
Copy link
Contributor Author

filnet commented Feb 27, 2020

@Imberflur what was the user doing and which Windows version was used ?

I think I understand the issue but can't reproduce it. See #1429 (comment).

And where can I find the source code to take a look ?

Edited: Might be triggered by something done while handling MainEventsCleared in the user event loop.

@Imberflur
Copy link
Contributor

Imberflur commented Feb 27, 2020

Here is the event loop code: https://gitlab.com/veloren/veloren/-/blob/zesterer/winit-20/voxygen/src/run.rs
Maybe it is easy to trigger because it lingers in the MainEventsCleared handler?
@filnet
Edit: The first one occurs when switching to a different window. I think the other one randomly happens.

@filnet
Copy link
Contributor Author

filnet commented Feb 27, 2020

I was able to get a panic by sleeping when handling MainEventsClear.

@Imberflur your first panic has this strange line:

12: winit::platform_impl::platform::event_loop::public_window_callback<()>
at src\platform_impl\windows\event_loop.rs:0

Line 0 ???

@filnet
Copy link
Contributor Author

filnet commented Feb 27, 2020

@Imberflur not sure it is going to address your issues but in your game loop, you should:
1/ call window.request_redraw() from MainEventsCleared if a redraw is needed
2/ move your rendering code to RedrawRequested.
This is the 'correct' way to write a winit event loop and will change the place/time where you sleep (might workaround the issue).

@Imberflur
Copy link
Contributor

@filnet I avoided using the "correct" method for redrawing because it kept the code simpler/closer to the original structure and I don't have the need currently to respond to external redraw requests. Nevertheless, I doubt it would circumvent the issue since time would still be spent in MainEventsCleared doing update logic.

@filnet filnet force-pushed the windows_request_redraw branch from 19e2307 to 801f11d Compare February 28, 2020 09:49
@filnet
Copy link
Contributor Author

filnet commented Feb 28, 2020

I just pushed a small fix that should address some of the panics.

@filnet
Copy link
Contributor Author

filnet commented Mar 4, 2020

I am more or less done now. But there are two aspects that I am not too happy about.

1/ I added a few panic calls to catch some undefined use cases.
Those involve calling window functions that change the window state (like set_inner_size) when handling MainEventsCleared, RedrawRequested or RedrawEventsCleared. Are those allowed and, if yes, how should they behave ?

Edit: likewise, calling window.request_redraw() when handling events other than MainEventsClear is not well defined. The doc does not say much. And the part about callling window.request_redraw() when handling RedrawRequested is confusing. On Windows, calling window.request_redraw() when handling RedrawRequested will be swallowed.

2/ There is a handful of calls to winuser::SetWindowPos that were all immediately followed by calls to winuser::RedrawWindow. I replaced all winuser::RedrawWindow calls by winuser::InvalidateRgn.
I think the calls to winuser::InvalidateRgn can be removed.
Does anyone know why those winuser::RedrawWindow calls were added in the first place? Reading the winuser::SetWindowPos documentation leads me to beleive that this method triggers redraws when appropriate.

@aloucks
Copy link
Contributor

aloucks commented Mar 5, 2020

I added a few panic calls to catch some undefined use cases. Those involve calling window functions that change the window state (like set_inner_size) when handling MainEventsCleared, RedrawRequested or RedrawEventsCleared. Are those allowed and, if yes, how should they behave ?

I believe the desired usage is that you enter the eventloop and it takes over as your main. There's an event loop extension trait that let's you break out, but I don't think this is the preferred way.

With that said, I'd imagine that the expectation for state change operations like set_inner_size is that they should not panic and should be honored.

@filnet
Copy link
Contributor Author

filnet commented Mar 5, 2020

With that said, I'd imagine that the expectation for state change operations like set_inner_size is that they should not panic and should be honored.

Yep and that's the whole point. Because of how winit does things on Windows handling set_inner_size calls (and other similar winit::Windows functions) that originate from user code while handling MainEventsCleared is broken and hard to fix.

There is a sort of catch 22 situation. Calling set_inner_size will immediately emit a WM_SIZE and WM_PAINT message. The WM_SIZE message will generate an Event::Resized event that will get buffered. It is buffered because you can't dispatch an event while another one is being dispatched (in our case we are currently dispatching MainEventsCleared). On the other hand, for WM_PAINT, you can't buffer the RedrawRequested event because apparently, on Windows, you should (must?) redraw while handling the WM_PAINT message.
A fix would be to defer the "whole" set_inner_size call to after dispatching the RedrawEventsCleared (i.e. when the current loop iteration is finished) but I don't know how to do that. Edit: that fix is questionable...

Anyways, if I change the panic call that is done when detecting this situation to warn calls then things sort of work. The call is honored (i.e. the window is resized) but the user sees strange sequences of events.

@filnet
Copy link
Contributor Author

filnet commented Mar 5, 2020

To summarize the situation:

winit forwards native UI events and also injects a few synthetic events (NewEvents, MainEventsCleared, RedrawEventsCleared).

The synthetic events act as some sort of checkpoints.
For instance MainEventsCleared tells the user code that all current events have been dispatched and that it should get ready to redraw. If the user code calls code that triggers new events then we are in a bad situation. Likewise, changing the window state while rendering in not a good thing.

Unfortunately, winit cannot forbid user code from changing the window state from MainEventsCleared or RedrawRequested. The question, then, is should we fail badly when it happens or should we just emit warnings and break softly.

The doc should be updated to mention that some use cases are discouraged and have undefined behaviors.

Edit: https://docs.rs/winit/0.21.0/winit/event/enum.Event.html

@Osspial
Copy link
Contributor

Osspial commented Mar 6, 2020

A fix would be to defer the "whole" set_inner_size call to after dispatching the RedrawEventsCleared (i.e. when the current loop iteration is finished) but I don't know how to do that. Edit: that fix is questionable...

We could change set_inner_size to post a custom message to the event queue, then perform the resize logic in that event handler. We'd have to update the documentation to reflect that, though.

EDIT: the UpdateWindow->InvalidateRgn change doesn't fix that issue?

@filnet
Copy link
Contributor Author

filnet commented Mar 6, 2020

We could change set_inner_size to post a custom message to the event queue, then perform the resize logic in that event handler. We'd have to update the documentation to reflect that, though.

EDIT: the UpdateWindow->InvalidateRgn change doesn't fix that issue?

set_inner_size is just one of the many state changing function exposed by Window and no the UpdateWindow->InvalidateRgn does not fix the issue.

There is no proper fix in my opinion. The doc tells you what you're supposed to do when handling MainEventsCleared.

MainEventsCleared
[−]

Emitted when all of the event loop's input events have been processed and redraw processing is about to begin.

This event is useful as a place to put your code that should be run after all state-changing events have been handled and you want to do stuff (updating state, performing calculations, etc) that happens as the "main body" of your event loop. If your program draws graphics, it's usually better to do it in response to Event::RedrawRequested, which gets emitted immediately after this event.

To me, it is implied that doing anything else will result in undefined behavior. And the "updating state" refers to your "world state", not the window state.

The problem is that the user code has full access to the window. Ideally it should have access to a mutable window when handling regular events and to a non mutable events when handling MainEventsCleared and RedrawRequested. But that is currently not possible.

An extreme example: In RedrawRequested a user could change the window size in the middle of rendering a complex scene. No one would do that because it is crazy but, yet, it is currently possible to do that.

This quite long video mentions this issue (in another but similar context) : https://www.youtube.com/watch?v=4YTfxresvS8

@filnet
Copy link
Contributor Author

filnet commented Mar 6, 2020

I think what is allowed to be done in MainEventsCleared is not specific to Windows. We should discuss it with the maintainers of the other platforms.

Also note that all winit's example call state changing function (set_fullscreen, etc) when handling keyboard events. They never do it from MainEventsCleared.
And a user would expect the set_inner_size() call done from MainEventsCleared to be effective immediately in the following RedrawRequested. But how do we send the Resized event. We just can't because we are in MainEventsCleared state...

Now the real question, since we can't prevent it, is how to handle it : crash, warn or fumble.
And, again, how do the other platforms handle it ?

Copy link
Contributor

@Osspial Osspial left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple code quality concerns I have with this, but functionality-wise this looks good. Thank you for investigating these problems!

@Osspial
Copy link
Contributor

Osspial commented Mar 6, 2020

@filnet My stance is that calling any window state-changing function during MainEventsCleared should be allowable behavior - if the documentation doesn't say that a particular usage pattern is explicitly wrong, we should do our best to handle it gracefully. I think it's perfectly reasonable for us to add a clause to the size-changing functions that specifies that the size change doesn't take effect until the Resized event is received.

@filnet
Copy link
Contributor Author

filnet commented Mar 6, 2020

@Osspial I'll try to put this PR in perspective.

I came to winit as I decided to learn Rust and port a pet project (a Voxel engine in Monogame) to it.
I quickly found winit and started to use it to write a game loop. I started with version 0.20.0 and then 0.21.0.

I then fell into the rabbit hole of trying to fix the issues introduced in this major release.
I tried a few "new" approaches and went full circle to in the end just tighten your code.
I believe your current architecture can be made to work and, I hope, this PR proves it.
I might have over tightened the code in a few places (i.e. made more obscure) ;)

AFAIK, the two major issues remaining, in 0.21.0 and this PR, are:
1/ changing window state in MainEventsCleared, etc... as discussed.
2/ the event loop can be frozen from the title bar (right clicking the title bar, clicking the window icon, ...)

filnet added 8 commits March 6, 2020 23:04
This avoids directly sending a WM_PAINT message,
which might cause buffering of RedrawRequested events.

We don't want to buffer RedrawRequested events because:
- we wan't to handle RedrawRequested during processing of WM_PAINT messages
- state transitionning is broken when handling buffered RedrawRequested events

Fixes rust-windowing#1469
This fixes a panic.
Note that the WM_PAINT event is now sent to the modal_redraw_method
which is more correct and avoids an unecessary redraw of the window.

Relates to but does does not fix rust-windowing#1484
This seems to prevent PeekMessage from dispatching unrelated sent messages
This makes the code less panicky...

And actually, winit's Windoww callbacks should not panic
because the panic will unwind into Windows code.

It is currently undefined behavior to unwind from Rust code into foreign code.
See https://doc.rust-lang.org/std/panic/fn.catch_unwind.html
@filnet filnet force-pushed the windows_request_redraw branch from e145f74 to f346b43 Compare March 6, 2020 22:44
@Osspial
Copy link
Contributor

Osspial commented Mar 7, 2020

Alright, this looks good. Thanks for the work on this! I'll go ahead and merge this now, since it seems to be a bit more stable than #1496 is at present.

That being said, I still intend to merge #1496 eventually, once the issues with it have been addressed. There are a good number of concrete technical reasons to go to the additional lengths that PR does, but it'll be better to explain those in that particular thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment