Fiber context & Stack pool #6990
Conversation
The fiber context holds the current stack top pointer (for a saved stack), as well as a resumable flag that tells whether the fiber can be resumed (its context was fully saved), is currently running, or is dead (its proc returned). The resumable flag is required to prevent a fiber from being resumed while it's still running. This should usually not happen with the monothreaded scheduler, unless it tries to resume a terminated fiber, but it will be very frequent with a multithreaded scheduler, where a thread B could try to resume a fiber that has just been enqueued by thread A but hasn't fully stored its context yet. The resumable flag is also used to mark a fiber as dead, which means it can't be resumed anymore; in that case the scheduler should raise an exception.
Keeps the pool of free stacks to recycle outside of Fiber itself, and makes it thread-safe using simple thread mutexes. A better algorithm (e.g. nonblocking or flat-combining) could eventually be used to speed up spawning fibers in parallel.
@ysbaddaden I got a question, it's a bit unrelated, but very important to consider for MT. Some libraries that are crucial for game and application development (like OpenGL) store thread-local information that is used during their execution, forcing you to know which thread you are currently running on. Will there be a mechanism to lock execution of exactly one fiber to a specific thread (which would allow proper usage of those libraries)? This problem already happened in golang. You can read about their solution here and here.
I'm a simple man, I see
Do you know how big the impact is within the same thread? Also, when is that spin wait actually meaningful? Why is a running (and still runnable) fiber being rescheduled onto another thread instead of being kept running on the same thread? I'm probably missing details of the big picture that you have in mind for the MT scheduler, and I'd like to understand more about it.
I'll do more testing, and I can replace the spin wait with a simpler resumable / dead check, which would prove an erroneous synchronization primitive, such as a wrong assumption in Channel or Mutex, or a manual resume somewhere. It's better to detect this than crash :) Resumable and the spin lock in the scheduler are because of MT with stealing schedulers, where thread A enqueues the current fiber and thread B steals it and resumes it... before thread A saved its context. Oops. Segfault.
Even with a dispatch scheduler that would push enqueued fibers to any thread, there is a time window between the moment we enqueue the current fiber and the moment its context is saved, so there is a risk that another thread tries to resume the fiber before its context is fully saved.
I see, but I wonder: if the current fiber is about to be resumed in another thread, then it's runnable (not waiting for IO, a timer, or channel data), so why reschedule it on a different thread in the first place instead of keeping it running on the current one? Mm.. maybe there is a
Yes, furious yields with more schedulers than available fibers is what led me to crashes, until I understood the race condition and fixed it with a resumable flag. But still: when creating an IO event in thread A, maybe the IO is immediately ready and the event can be resumed or enqueued quickly from thread B, but thread A got suspended for some reason and didn't have enough time to swap. The probability of such an event is very low and it should never happen... but it may happen sporadically, and lead to weird crashes. Working with threads all summer made me aware that if a race can happen, even with the slightest chance, it will happen :D Anyway, I'll run some benchmarks when I can to measure the impact on context switches. I still had incredible performance in muco, so I'm not too worried. Yet, if it's noticeably slower, I'll put this patch as the first of an MT branch!
We could use a compile-time flag to apply the proposed lock only when MT is enabled. This would still leave the fiber context refactor and the assembly code setting it, but that shouldn't really matter performance-wise.
I'll use this branch as a basis for MT schedulers. TBH this resumable flag and the spin lock checks have been insignificant in muco. They always succeed on single thread, and provide incredible safety otherwise. |
Here are my last refactors to Fiber (before MT experiments):

- a `resumable` property on the fiber context that tells whether the fiber is resumable (its context is saved), running, or dead.

Note: these changes shall impact performance, but hopefully not that much. With a single thread, mutexes should never be locked, and most checks will always succeed unless something's wrong.