
Fiber context & Stack pool #6990

Closed

Conversation

ysbaddaden
Contributor

Here are my last refactors to Fiber (before MT experiments):

  1. Extract a Fiber::StackPool to hold & collect stacks that can be recycled, and make it thread-safe;
  2. Create a Fiber::Context struct that is passed to the arch-specific ASM;
  3. Add a resumable property to the fiber context that tells whether the fiber is resumable (its context is saved), running, or dead;
  4. Since a fiber can be enqueued before its context is saved (which should never happen with a single thread, unless a concurrency primitive is wrong), the scheduler waits for the fiber to become resumable, and raises if the fiber is dead.

Note: these changes will impact performance, but hopefully not by much. With a single thread, mutexes should never be contended, and most checks will always succeed unless something is wrong.

The fiber context holds the current stack top pointer (for a saved stack), as well as a resumable flag that tells whether the fiber can be resumed (its context was fully saved), is currently running, or is dead (its proc returned).

The resumable flag is required to prevent a fiber from being resumed while it is still running. This should normally not happen with the single-threaded scheduler, unless it tries to resume a terminated fiber. It will be very frequent with a multithreaded scheduler, where a thread B could try to resume a fiber that has just been enqueued by thread A but hasn't fully stored its context yet.

The resumable flag is also used to mark a fiber as dead, which means
that it can't be resumed anymore. In that case the scheduler should
raise an exception.
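
For illustration only, a minimal sketch of what such a context struct could look like (field names and the integer encoding of the flag are assumptions, not necessarily the exact code in this PR):

```crystal
class Fiber
  # Sketch: the arch-specific ASM only needs somewhere to store the
  # stack top and a machine-word flag it can flip once all registers
  # have been saved.
  struct Context
    property stack_top : Void*
    property resumable : LibC::Long # e.g. 1 = context saved, 0 = still running

    def initialize(@stack_top = Pointer(Void).null)
      @resumable = 0
    end
  end
end
```

How the dead state is encoded (another value of the same field, or a separate flag on the fiber) is left open here; the commit message only says the flag is also used to mark the fiber as dead.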
Keeps the pool of free stacks to recycle outside of Fiber itself, and makes it thread-safe using simple thread mutexes. A better algorithm (e.g. nonblocking or flat-combining) could eventually be used to speed up spawning fibers in parallel.
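
As a rough sketch of the idea (method names and the use of a Deque are assumptions; allocating and freeing the actual stack memory are left out):

```crystal
class Fiber
  # Sketch of a thread-safe pool of recyclable stacks, guarded by a
  # plain thread mutex as described above.
  class StackPool
    def initialize
      @deque = Deque(Void*).new
      @mutex = Thread::Mutex.new
    end

    # Pops a previously released stack, or returns nil so the caller
    # allocates a fresh one (e.g. with mmap).
    def checkout? : Void*?
      @mutex.synchronize { @deque.pop? }
    end

    # Hands a finished fiber's stack back so a future fiber can reuse it.
    def release(stack : Void*) : Nil
      @mutex.synchronize { @deque.push(stack) }
    end
  end
end
```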
@Heaven31415
Contributor

@ysbaddaden I have a question; it's a bit unrelated, but very important to consider for MT. Some libraries that are crucial for game and application development (like OpenGL) store thread-local information that is used during their execution, thus forcing you to know which thread you are currently running on.

Will there be a mechanism to lock execution of exactly one fiber to a specific thread (which would allow proper usage of those libraries)?

This problem already happened in golang. You can read about their solution here and here.

@vladfaust
Contributor

I'm a simple man, I see thread safe, you got my thumb up

@waj
Member

waj commented Oct 26, 2018

Do you know how big the impact is within the same thread? Also, when is that spin wait actually meaningful? Why would a running (and still runnable) fiber be rescheduled onto another thread instead of being kept running on the same thread?

I’m probably missing details on the big picture that you have in mind for the MT scheduler and I’d like to understand more about it.

@ysbaddaden
Contributor Author

I'll do more testing, and I can replace the spin wait with a simpler resumable / dead check, which would reveal an erroneous synchronization primitive, such as a wrong assumption in a channel or mutex, or a manual resume somewhere. It's better to detect this than to crash :)

Resumable and the spin lock in the scheduler are there because of MT with stealing schedulers, where thread A enqueues the current fiber and thread B steals it and resumes it... before thread A has saved its context. Oops. Segfault.
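
Roughly, the guard in the resume path amounts to something like this (a sketch; `resumable?` and `dead?` stand for the predicates this PR introduces, exact names assumed):

```crystal
# Sketch: executed by the scheduler right before switching to `fiber`.
private def validate_resumable(fiber : Fiber) : Nil
  until fiber.resumable?
    if fiber.dead?
      raise "BUG: tried to resume a dead fiber: #{fiber.inspect}"
    end
    # Busy-wait: another thread enqueued this fiber but hasn't finished
    # saving its context yet. That window is a few instructions, so
    # spinning is fine; on a single thread this loop never iterates.
  end
end
```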

@ysbaddaden
Contributor Author

Even with a dispatch scheduler that would push enqueued fibers to any thread, there is a time window between the moment we enqueue the current fiber and the moment its context is saved, so there is a risk that another thread tries to resume the fiber before its context is fully saved.

@waj
Member

waj commented Oct 26, 2018

I see, but I wonder: if the current fiber is about to be resumed in another thread, then it's runnable (not waiting for IO, a timer or channel data), so why reschedule it on a different thread in the first place instead of keeping it running on the current one? Mm... maybe there is a Fiber.yield call. Is there any other case?

@ysbaddaden
Contributor Author

Yes, furious yields with more schedulers than available fibers is what led me to crashes, until I understood the race condition and fixed it with a resumable flag.

But still, when creating an IO event in thread A, maybe the IO is immediately ready and the event can be resumed or enqueued quickly from thread B, while thread A got suspended for some reason and didn't have enough time to swap contexts.

The probability of such an event is very low and it should never happen... but it may happen sporadically and lead to weird crashes. Working with threads all summer made me aware that if a race can happen, even with the slightest chance, it will happen :D

Anyway, I'll run some benchmarks when I can to measure the impact on context switches. I still had incredible performance in muco, so I'm not too worried. Still, if it's noticeably slower, I'll make this patch the first of an MT branch!
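
For reference, the kind of stress test that exercises this is just a handful of fibers doing nothing but yielding; under the single-threaded scheduler it is harmless, but with more scheduler threads than fibers it hammers the enqueue/resume window (sketch, counts are arbitrary):

```crystal
done = Channel(Nil).new

4.times do
  spawn do
    100_000.times { Fiber.yield }
    done.send(nil)
  end
end

4.times { done.receive }
```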

@bcardiff
Member

We could use a compile-time flag to apply the proposed lock only when MT is enabled. This would still leave the fiber context refactor and the assembly code setting it, but that shouldn't really matter performance-wise.
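
For example, reopening the StackPool sketched above, its release path could be gated on a compile-time flag (the `:mt` flag name is hypothetical, enabled with something like `crystal build -Dmt`):

```crystal
class Fiber
  class StackPool
    def release(stack : Void*) : Nil
      {% if flag?(:mt) %}
        # Only pay for the mutex when building with MT enabled.
        @mutex.synchronize { @deque.push(stack) }
      {% else %}
        @deque.push(stack)
      {% end %}
    end
  end
end
```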

@ysbaddaden
Contributor Author

I'll use this branch as a basis for MT schedulers.

TBH, this resumable flag and the spin lock checks have been insignificant in muco. They always succeed on a single thread, and provide incredible safety otherwise.
