
draft of multithreading blog post #408

Merged (22 commits, Jul 23, 2019)
Conversation

@JeffBezanson (Member) commented:

Drafty and a bit incomplete in places, but I felt I had enough that we should start the editing and feedback process. I'm particularly interested to know whether this leaves you with any major unanswered questions about what's going on.

From the very beginning --- prior even to the 0.1 release --- Julia has had the `Task`
type, providing symmetric coroutines and event-based I/O.
So we have always had a unit of *concurrency* in the language, it just wasn't *parallel*
(simultaneous streams of execution) yet.
@ChrisRackauckas (Member) commented Jul 14, 2019:

Does this change to "real parallelism" here have a practical effect in some computation-based IO stuff, like managing a GPU and then doing CPU operations? Or managing multiple GPUs?


```
$ JULIA_NUM_THREADS=4 ./julia
```
@ChrisRackauckas (Member) commented Jul 14, 2019:

Note that IDEs like Juno automatically detect the number of cores in a user's processor, giving Julia multithreading out of the box when used in these systems.

(I think it is important to mention this for less technical users)

Member commented:

Even if you automatically detect cores, you still need to be able to change the number - to give some cores to BLAS, or the OS or something else. So, ideally, Juno etc. would present the cores available, but have a way for you to change it in the IDE.

Member commented:

It does.

[screenshot attachment: "Capture"]

Member commented:

Would be cool to drop a screenshot (or that screenshot) here

@ChrisRackauckas (Member) left a comment:

Great work! I just listed the questions I had when reading it. Hopefully that helps.

As we often do, we tried to pick a method that would maximize throughput
and reliability.
We have a shared pool of stacks allocated by `mmap` (`VirtualAlloc` on
windows), defaulting to 4MiB each (2MiB on 32-bit systems).
Member commented:

Is this changeable?

Member commented:

Yes


## Acknowledgements

We would like to gratefully acknowledge funding support from Intel and relational.ai
Member commented:

Suggested change
We would like to gratefully acknowledge funding support from Intel and relational.ai
We would like to gratefully acknowledge funding support from Intel and relationalAI

Member commented:

Are they ok with capitalizing it as "RelationalAI"?

Today we are happy to announce a major new chapter in that story.
We are releasing an entirely new threading interface for Julia programs:
fully general task parallelism, inspired by parallel programming systems
like [Cilk][] and [Go][].
Member commented:

We should note in which version of Julia (commit or nightly) the new interface is available. Otherwise people may expect it in the latest release version.

Member (Author) commented:

That's easy --- it's not available :)

I think of it as somewhat analogous to garbage collection: with GC, you
can freely allocate objects without worrying about how it works or when and how they
are freed.
With task parallelism, you freely spawn tasks without worrying about where they run.
Member commented:

It would be cool if we could say that a very large number of tasks (is it tens of thousands, or millions?) can be spawned without worry. Some users coming from HPC may have a pthreads view of the world, and this can help make it clear.
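To make the point concrete, here is a small illustrative sketch (the function name `spawn_many` is ours, not from the post) showing that tasks are cheap enough to spawn by the tens of thousands, independent of how many OS threads are available:

```
using Base.Threads: @spawn

# spawn one task per item; tasks are multiplexed onto however many
# threads are available, so this works even with JULIA_NUM_THREADS=1
function spawn_many(n)
    tasks = [@spawn(i * i) for i in 1:n]
    return sum(fetch.(tasks))
end
```

Spawning 10,000 tasks this way completes in well under a second on a laptop, which is the contrast with a pthreads-style one-OS-thread-per-unit-of-work model.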

return fib(n - 1) + fetch(t)
end
```
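For readers skimming the thread: the snippet above is the tail of the post's parallel Fibonacci example. A complete, runnable sketch (assuming the `Threads.@spawn` macro introduced in Julia 1.3) looks like:

```
import Base.Threads.@spawn

# naive tree-recursive Fibonacci; each fib(n-2) branch is spawned as a
# task that may run on another thread, while fib(n-1) runs inline
function fib(n::Int)
    if n < 2
        return n
    end
    t = @spawn fib(n - 2)          # runs concurrently with the next call
    return fib(n - 1) + fetch(t)   # combine when the spawned task finishes
end
```

The result is deterministic regardless of the thread count; only the scheduling varies.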

This comment was marked as resolved.

@StefanKarpinski (Member) commented Jul 15, 2019:

Great blog post so far. This is such exciting stuff. Here are some comments.


Might be worth splitting into two blog posts:

  1. high level overview, usage examples, scaling experiments
  2. details, internals, design decisions (RNG, I/O, etc.)

There’s more impact in taking a sequential code and showing that the diff required to make it parallel is tiny. A more direct comparison should also show more scaling. Maybe modify the Base mergesort to be parallel instead? Or compare a simple sequential merge sort with a parallel one? Can additionally compare with the optimized built-in sort and show that it’s a bit faster, but only a bit, and the parallel one is still faster.

A little coda to the section where you pass temps through the psort! implementation would be good, just summarizing what was done and remarking on how it was pretty simple and maybe showing the improved performance.

The section on rand() seems a bit out of place. Maybe have a section on design decisions that includes that and some of the I/O stuff?

“integers are, fortunately, free” is a bit confusing—why is allocating that much virtual memory unconcerning? A lot of people won’t understand this.

“In practice, we have an alternate implementation of stack switching”: doesn’t indicate when, if ever, this is used. Maybe add a sentence about how to switch this (compile flag) and that we’ll continue to explore the design space for task stacks to get the best of all worlds as much as possible.

“This is a tricky synchronization problem, since some threads might be scheduling new work while other threads are deciding to block.” Is this meant to be “deciding to sleep”?

“My hands-down favorite”—there are multiple people on the by line, so using first person singular here is confusing.

@JeffBezanson (Member, Author) replied:

“My hands-down favorite”—there are multiple people on the by line, so using first person singular here is confusing.

Should I just switch to "we" everywhere?

@StefanKarpinski (Member) replied:

Given the multiple authors I think that’s the way to go.

```

This, of course, is the classic highly-inefficient tree recursive implementation of
the Fibonacci sequence --- but running on any number of processor cores!
Member commented:

Suggested change
the Fibonacci sequence --- but running on any number of processor cores!
the Fibonacci sequence--but running on any number of processor cores!

I think markdown will interpret this sequence (or you can use the character directly)

Software performance depends more and more on exploiting multiple processor cores.
The [free lunch][] is still over.
Well, we here in the Julia developer community have something of a reputation for
caring about performance, so we've known for years that we would need a good
@vtjnash (Member) commented Jul 20, 2019:

Suggested change
caring about performance, so we've known for years that we would need a good
caring about easy performance. We've already built a strong story around multi-process, distributed programming and GPUs. But we've also known that we needed fast and composable multi-threading.

EDIT: compostable → composable

Member commented:

Although I sort of like the REUSE symbol for this:

The [free lunch][] is still over.
Well, we here in the Julia developer community have something of a reputation for
caring about performance, so we've known for years that we would need a good
story for multi-threaded, multi-core execution.
Member commented:

Suggested change
story for multi-threaded, multi-core execution.

i = 6 on thread 2
```

Without further ado, let's try some nested parallelism.
Member commented:

Suggested change
Without further ado, let's try some nested parallelism.
A big differentiator of Julia's Task-based parallelism system is its automatic handling of nested parallelism. Each Task can act like a first-class future, running simultaneously with other tasks to utilize all CPU cores efficiently. So without further ado, let's try some nested parallelism.

for parallelism.
Here is the code:

```
Member commented:

Do we set Julia as the default syntax highlighter, or do we need to annotate the code blocks

half = @par psort!(v, lo, mid) # task to sort the lower half; will run
psort!(v, mid+1, hi) # in parallel with the current call sorting
# the upper half
wait(half) # wait for the lower half to finish
Member commented:

I think it'd be fun to use fetch here (implementing sort instead of sort!), as that seems harder to me (relative to what other languages provide and do), and we're already making a copy below

@vtjnash (Member) commented Jul 22, 2019:

```
julia> function psort(v, lo::Int=1, hi::Int=length(v))
           if lo > hi                       # 1 or 0 elements; nothing to do
               return similar(v, 0)
           elseif lo == hi
               out = similar(v, 1)
               out[1] = v[lo]
               return out
           end
           if hi - lo < 100000              # below some cutoff, run in serial
               return sort(view(v, lo:hi), alg = MergeSort)
           end

           mid = (lo+hi)>>>1                # find the midpoint

           half = @task psort(v, lo, mid)   # task to sort the lower half; will run
           half.sticky = false
           schedule(half)
           right = psort(v, mid+1, hi)      # in parallel with the current call sorting
                                            # the upper half
           left = fetch(half)               # wait for the lower half to finish
           out = similar(v, hi-lo+1)        # result
           @assert length(right) + length(left) == length(out)

           i, il, ir = 1, 1, 1              # merge the two sorted sub-arrays
           @inbounds while il <= length(left) && ir <= length(right)
               l, r = left[il], right[ir]
               if l < r
                   out[i] = l
                   il += 1
               else
                   out[i] = r
                   ir += 1
               end
               i += 1
           end
           @inbounds while il <= length(left)
               out[i] = left[il]
               il += 1
               i += 1
           end
           @inbounds while ir <= length(right)
               out[i] = right[ir]
               ir += 1
               i += 1
           end
           return out
       end

julia> using Random; Random.seed!(0); a = rand(20000000);

julia> @time sort(a);
  1.469319 seconds (6 allocations: 152.588 MiB)

julia> @time sort(a);
  1.540864 seconds (6 allocations: 152.588 MiB, 2.91% gc time)

julia> @time psort(a); # 1 thread
 20.526943 seconds (879.42 M allocations: 14.520 GiB, 9.55% gc time)

julia> @time psort(a); # 2 threads
 12.870170 seconds (879.42 M allocations: 14.520 GiB, 15.73% gc time)

julia> @time psort(a);
 10.782067 seconds (879.42 M allocations: 14.520 GiB, 18.62% gc time)

julia> @time psort(a); # 4 threads
  9.499449 seconds (879.42 M allocations: 14.520 GiB, 22.32% gc time)
```
[UnicodePlots chart: psort time in seconds (y-axis, 0–30) versus number of threads (x-axis, 1–4), decreasing from about 20 s at 1 thread toward about 9.5 s at 4 threads]

JeffBezanson and others added 8 commits July 20, 2019 12:52
Co-Authored-By: Jameson Nash <vtjnash@gmail.com>
Co-Authored-By: Jameson Nash <vtjnash@gmail.com>
Co-Authored-By: Jameson Nash <vtjnash@gmail.com>
Co-Authored-By: Jameson Nash <vtjnash@gmail.com>
Co-Authored-By: Jameson Nash <vtjnash@gmail.com>
Co-Authored-By: Jameson Nash <vtjnash@gmail.com>
Co-Authored-By: Jameson Nash <vtjnash@gmail.com>
Co-Authored-By: Jameson Nash <vtjnash@gmail.com>
Co-Authored-By: Stefan Karpinski <stefan@karpinski.org>
JeffBezanson and others added 2 commits July 20, 2019 15:46
Co-Authored-By: Kristoffer Carlsson <kcarlsson89@gmail.com>
@timholy (Member) left a comment:

Exciting times ahead! Congrats everyone.

Let's try a different machine with more CPU cores:

```
$ for n in 1 2 4 8 16; do JULIA_NUM_THREADS=$n ./julia psort.jl; done
Member commented:

Briefly summarize what psort.jl does. E.g., does this include compile time? I'm noting the times are longer than above.

Co-Authored-By: Tim Holy <tim.holy@gmail.com>
1.222777 seconds (3.78 k allocations: 686.935 MiB, 9.14% gc time)
0.958517 seconds (3.79 k allocations: 686.935 MiB, 18.21% gc time)
0.836891 seconds (3.78 k allocations: 686.935 MiB, 21.10% gc time)
```
Member commented:

can we graph this and/or show the normalized values?

Co-Authored-By: Tim Holy <tim.holy@gmail.com>
Co-Authored-By: Jameson Nash <vtjnash@gmail.com>

```
lock(cond::Threads.Condition)
while !ready
Member commented:

Suggested change
while !ready
lock(cond::Threads.Condition)
try
while !ready
wait(cond)
end
finally
unlock(cond)
end

As in previous versions, the standard lock to use to protect critical sections
is `ReentrantLock`, which is now thread-safe (it was previously only used for
synchronizing tasks).
`Threads.SpinLock` is also available, to be used in rare circumstances where
Member commented:

Suggested change
`Threads.SpinLock` is also available, to be used in rare circumstances where
There are some other types of locks defined internally for specific circumstances which usually should not be applicable to the typical user (these include `Threads.SpinLock`, `Threads.Mutex`, and a variety of libuv-based mutexes protecting various parts of the runtime). These are used in rare circumstances where (1) only threads and not tasks will be synchronized, and (2) you know the lock will only be held for a short time.

`Threads.SpinLock` is also available, to be used in rare circumstances where
(1) only threads and not tasks need to be synchronized, and (2) you expect to
hold the lock for a short time.
`Semaphore` and `Event` are also available, completing the standard set of
Member commented:

not really complete? there'd also be barrier, rwlocks, and a "once" I think in a "standard set"

Suggested change
`Semaphore` and `Event` are also available, completing the standard set of
The `Threads` module also provides `Semaphore` and `Event` types with their standard definition.
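As a concrete illustration of the `ReentrantLock` usage discussed above (the function name `threaded_count` is ours, for demonstration only):

```
using Base.Threads

# increment a shared counter under a ReentrantLock; correct for any
# thread count, though an atomic would be faster for this simple case
function threaded_count(n)
    lk = ReentrantLock()
    total = Ref(0)
    @threads for i in 1:n
        lock(lk) do              # lock/unlock handled by the do-block form
            total[] += 1
        end
    end
    return total[]
end
```

Without the lock, concurrent `total[] += 1` updates could be lost when more than one thread is active.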

is `ReentrantLock`, which is now thread-safe (it was previously only used for
synchronizing tasks).
`Threads.SpinLock` is also available, to be used in rare circumstances where
(1) only threads and not tasks need to be synchronized, and (2) you expect to
Member commented:

Suggested change
(1) only threads and not tasks need to be synchronized, and (2) you expect to

(1) only threads and not tasks need to be synchronized, and (2) you expect to
hold the lock for a short time.
`Semaphore` and `Event` are also available, completing the standard set of
synchronization primitives.
Member commented:

Suggested change
synchronization primitives.

argument value to allocate space automatically when the caller doesn't provide it:

```
function psort!(v, lo::Int=1, hi::Int=length(v), temps = [similar(v,cld(length(v),2)) for i = 1:Threads.nthreads()])
Member commented:

Suggested change
function psort!(v, lo::Int=1, hi::Int=length(v), temps = [similar(v,cld(length(v),2)) for i = 1:Threads.nthreads()])
function psort!(v, lo::Int=1, hi::Int=length(v), temps=[similar(v, cld(length(v), 2)) for i = 1:Threads.nthreads()])
Suggested change
function psort!(v, lo::Int=1, hi::Int=length(v), temps = [similar(v,cld(length(v),2)) for i = 1:Threads.nthreads()])
function psort!(v, lo::Int=1, hi::Int=length(v), temps=[similar(v, (length(v) + 1) ÷ 2) for i = 1:Threads.nthreads()])
Suggested change
function psort!(v, lo::Int=1, hi::Int=length(v), temps = [similar(v,cld(length(v),2)) for i = 1:Threads.nthreads()])
function psort!(v, lo::Int=1, hi::Int=length(v), temps=[similar(v, (hi - lo + 1) ÷ 2) for i = 1:Threads.nthreads()])
Suggested change
function psort!(v, lo::Int=1, hi::Int=length(v), temps = [similar(v,cld(length(v),2)) for i = 1:Threads.nthreads()])
function psort!(v, lo::Int=1, hi::Int=length(v), temps=[similar(v, 0) for i = 1:Threads.nthreads()]) # and add from Base `(length(t) < m-lo+1) && resize!(t, m-lo+1)`

But for high-performance code we recommend thread-local state.
Our `psort!` routine above can be improved in this way.
Here is a recipe.
First, we modify the function to accept pre-allocated buffers, using a default
Member commented:

Suggested change
First, we modify the function to accept pre-allocated buffers, using a default
First, we modify the function signature to accept pre-allocated buffers, using a default
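The thread-local-buffer recipe being discussed can be sketched in miniature like this (the names `sum_chunks` and `temps` are illustrative; only the pattern of `threadid()`-indexed pre-allocated buffers comes from the post):

```
using Base.Threads

# each thread reuses its own scratch buffer instead of allocating
# inside the hot loop; buffers are indexed by threadid()
function sum_chunks(chunks,
                    temps = [zeros(Float64, maximum(length, chunks)) for _ in 1:nthreads()])
    partial = zeros(Float64, length(chunks))
    @threads for k in eachindex(chunks)
        buf = temps[threadid()]           # this thread's pre-allocated space
        n = length(chunks[k])
        copyto!(buf, 1, chunks[k], 1, n)  # stand-in for real scratch work
        partial[k] = sum(view(buf, 1:n))
    end
    return sum(partial)
end
```

Passing `temps` as a defaulted argument, as the post does for `psort!`, lets callers supply their own buffers while keeping the single-argument call convenient.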

Definitely faster, but we do seem to have some work to do on the
scalability of the runtime system.

### Seeding the default random number generator
Member commented:

Move this after ### IO?

Member (Author) commented:

I put it here since I consider this something you might need to know to update code, while the IO section is more internal details.

Member commented:

Ah, that sounds like a good reason.

Suggested change
### Seeding the default random number generator
### Random number generation
The approach we've taken with Julia's default global random number generator (`rand()` and friends) is to make it thread-specific. On first use, each thread will create an independent instance of the default RNG type (currently MersenneTwister) seeded from current system entropy. All operations that affect the random number state (`rand`, `Random.seed!`, `randn`, etc.) then operate on only the current thread's RNG state. This way, multiple independent code sequences that seed and then use random numbers will individually work as expected. If you need all threads to use a known initial seed, you will need to set it explicitly on each worker thread being used at the start of the algorithm work.
For more precise control, better performance, or other elaborate requirements, we recommend allocating and passing your own RNG objects (e.g. `Random.MersenneTwister()`).
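A small sketch of the "pass your own RNG" recommendation (the seed value is arbitrary):

```
using Random

# two separately-seeded RNG objects produce identical streams,
# independent of which thread or task ends up running the code
rng1 = MersenneTwister(1234)
rng2 = MersenneTwister(1234)
draws1 = rand(rng1, 5)
draws2 = rand(rng2, 5)
```

Explicit RNG objects decouple reproducibility from scheduling, which is exactly the property the thread-specific default RNG is designed to approximate.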

Here are some of the points we hope to focus on to further develop
our threading capabilities:

* Performance work on task switch and I/O latency.
Member commented:

Suggested change
* Performance work on task switch and I/O latency.
We would like to gratefully acknowledge funding support from [Intel][] and [relationalAI][]

Member commented:

Did you mean to put this in this section and delete this content?



## Acknowledgements

Member commented:

Suggested change
[here]: https://github.com/JuliaLang/julia/pull/31086
[Intel]: https://www.intel.com/
[relationalAI]: http://relational.ai/

An "official" version will appear in a later release, to give us time to settle
on an API we can commit to for the long term.
Here's what you need to know if you want to upgrade your code over this period.

Member commented:

Suggested change
- Managing [Task scheduling and synchronization](#Task-scheduling-and-synchronization)
- Managing [Thread-local state](#Thread-local-state)
- Effect on [Random number generation](#Random-Number-Generation)



Using it is not recommended, since it is hard to predict how much stack
space will be needed, for instance by the compiler or called libraries.

A thread can switch to running a given task simply (in principle) by switching
Member commented:

that’s not really true on any platform

Suggested change
A thread can switch to running a given task simply (in principle) by switching
A thread can switch to running a given task by adjusting the registers to appear to “return from” the previous task switch. We allocate a new stack out of a local pool just before we start running it.


We also have an alternate implementation of stack switching (controlled by the
`ALWAYS_COPY_STACKS` variable in `options.h`) that trades time for memory by
copying live stack data when a task switch occurs.
Member commented:

Suggested change
copying live stack data when a task switch occurs.
copying live stack data when a task switch occurs.
This may not be compatible with foreign code that uses `cfunction`,
so it is not the default.

scheduler.
In particular, we need to make sure no other thread sees that task and thinks
"oh, there's a task I can run", causing it to scribble on the scheduler's
stack.
Member commented:

Suggested change
stack.

Here are some of the points we hope to focus on to further develop
our threading capabilities:

* Performance work on task switch and I/O latency.
* More performant parallel loops and reductions, with more scheduling options.
* Allow adding more threads at run time.
* Improved debugging tools.
* Explore API extensions, e.g. cancel points.
Member commented:

I hope never :P

* Allow adding more threads at run time.
* Improved debugging tools.
* Explore API extensions, e.g. cancel points.
* Thread-safe data structures.
Member commented:

Suggested change
* Thread-safe data structures.
* Standard-library of thread-safe data structures for user code.


We are also grateful to the several people who patiently tried this functionality
while it was in development and filed bug reports or pull requests, and spurred us
to keep going!
Member commented:

Suggested change
to keep going!
to keep going! We know there are remaining problems, and we will appreciate you letting us know about your experience with it through the GitHub and Discourse channels!

@JeffBezanson JeffBezanson merged commit 14be227 into master Jul 23, 2019
@delete-merged-branch delete-merged-branch bot deleted the jb/mtblog branch July 23, 2019 15:26
@tknopp (Contributor) commented Jul 25, 2019:

@JeffBezanson: Awesome work! Regarding the blog post I missed the origin of the threading effort, which was this PR JuliaLang/julia#6741

I don't want to overrate my work but the prototype was pretty functional and I had the impression that it was an important step towards serious multi-threading. There is also a publication outlining the concepts behind that effort: https://ieeexplore.ieee.org/document/7069898
