Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop Crystal::FiberChannel #14949

Closed

Conversation

HertzDevil
Copy link
Contributor

@HertzDevil HertzDevil commented Aug 28, 2024

Fixes #14222. This doesn't affect IO.pipe's behavior on Windows.

@HertzDevil HertzDevil added topic:stdlib:concurrency kind:refactor topic:multithreading kind:bug A bug in the code. Does not apply to documentation, specs, etc. labels Aug 28, 2024
Copy link
Contributor

@ysbaddaden ysbaddaden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fiber.yield enqueues a 0 second timer, so the next event loop wait won't block. Don't we enter a busy loop when there's nothing to do but wait on the event loop?

@HertzDevil
Copy link
Contributor Author

HertzDevil commented Aug 30, 2024

I almost managed to replace it with Fiber.suspend / reschedule on Windows, but then the same code no longer works on WSL. I think this is because fiber_channel.receive implicitly tells Libevent to monitor the pipe, without which Crystal::LibEvent::EventLoop#run immediately returns, so the worker threads start busy-waiting.

I believe the use of IO.pipe is actually a Libevent-exclusive implementation detail, and it is fine if Windows does something else entirely.

EDIT: Nevermind, it looks like Crystal::EventLoop#interrupt works?

@ysbaddaden
Copy link
Contributor

Handling the parked/sleeping state is very tricky, especially in MT.

The IO pipe enables a double trick: it allows the run loop to enter the parked/sleeping state, while making sure that there's always something in the event loop.

For example libevent won't block if there are no registered event/timer (nothing to wait for) so with this change the thread will enter a busy-loop again when it happens —and won't our IOCP evloop do the same?

Note: the IO pipe would also work for Windows/IOCP if reading from the reader was nonblocking.

Maybe we can find an alternative to FiberChannel on Windows, and keep it otherwise?

@HertzDevil
Copy link
Contributor Author

HertzDevil commented Sep 2, 2024

In practice, the fiber stack pool reaper should ensure every thread's scheduler always has something to await since #14100, something not true when we started using IO.pipe years ago. If not then maybe we can pretend there is a sleep with an extremely long duration?

@HertzDevil
Copy link
Contributor Author

HertzDevil commented Sep 4, 2024

Actually the point about IOCP is already somewhat true even without -Dpreview_mt; if you manually comment out the interrupt handler loop and the stack pool reaper, things like Process.wait will no longer work, because the async implementation in #13908 never enqueues anything to the scheduler, not even a placeholder event. (Actually this happens with interpreted code on Windows as well.) I think it makes sense to do it?

@HertzDevil
Copy link
Contributor Author

It looks like overlapped I/O operations don't always use events either; if they have no timeouts, then Crystal::IOCP::OverlappedOperation#wait_for_completion calls Fiber.suspend, in which case Crystal::IOCP::EventLoop#@queue only ever takes care of definite sleeps and timeout select actions.

@straight-shoota
Copy link
Member

I'm not sure what's the status of this. Do we want to merge this change or leave it as is due to the upcoming changes with execution contexts?

@ysbaddaden
Copy link
Contributor

Today: removing FiberChannel with the current schedulers means we assume a timer will always be enqueued. This happens to be true thanks to the stack-pool-collector fibers that we happen to create on each thread (and thus on each event loop) and repeat a timer.

While the assumption is true, it's a bit fragile 🫤

If we drop the stack-pool-collector fiber, then the FiberChannel becomes mandatory on UNIX again.

As stated by @HertzDevil: IOCP is broken because it only keeps track of timers and won't try to check for completion otherwise. It's only working thanks to the stack-pool-collector fiber.

Near future: the ExecutionContext schedulers don't need FiberChannel but they need the evloop to properly report whether it has pending operations, otherwise the schedulers assume the evloop is empty and will park the thread(s). Oops: infinite hang.

Proposal: Quinton's idea of the IOCP evloop assuming a +infinite timeout is a good idea, it would avoid having to keep track of operations. The Polling evloops already do that when blocking is true, and we could instruct libevent to do just the same (EVLOOP_NO_EXIT_ON_EMPTY) so all evloops would exhibit the same behavior 🤔

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Nov 25, 2024

As for this PR I'm not sure. I'd keep FiberChannel if :unix and :preview_mt, and on Windows I would call EventLoop#interrupt on win32 (always).

BTW: PostQueuedCompletionStatus is probably a more natural solution for interrupting GetQueuedCompletionStatusEx than sending an UserAPC to the thread (no need to track the thread, ...).

@ysbaddaden
Copy link
Contributor

Alternative: tell libevent to not exit even where there are no events.

That could become a requirement of evloop implementations. The polling eventloop already blocks forever, and so does the refactored IOCP evloop (#15238)

We wouldn't need FiberChannel anymore and could just call Crystal::EventLoop#interrupt when the scheduler is sleeping (once). Since we must grab the lock anyway, it's simpler to just push the fiber to the runnables Deque than passing the pointer through a pipe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. kind:refactor topic:multithreading topic:stdlib:concurrency
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

spec/std/channel_spec.cr gets stuck on Windows with -Dpreview_mt
3 participants