
Improve parallel computing documentation #14525

Closed

wants to merge 1 commit into from

Conversation

ahwillia
Contributor

I've been having a bit of a hard time teaching myself how to use Julia's parallel computing tools from the documentation. A couple of open issues / pull requests suggest I'm not alone.

So far, I've just rewritten the introductory bits of the parallel computing documentation. I am happy to iterate on this and write/re-write more sections if this is helpful.

I tried to address the following issues (from my perspective) in the introductory sections:

  • There was no discussion of basic commands like nprocs, addprocs, etc. (see the sketch after this list)
  • Technical terms like RemoteRef, Channel, and RemoteChannel were introduced without being explained. Several are explained later in the documentation, so removing these early mentions cleans things up.
  • I added two section headers to orient the reader.
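
For context, a minimal session with those basic commands (a hedged sketch; the output shown is typical of a 0.4/0.5-era Julia and may differ by version):

    julia> addprocs(2)   # launch two additional worker processes
    2-element Array{Int64,1}:
     2
     3

    julia> nprocs()      # total process count (master + workers)
    3

    julia> workers()     # ids of the worker processes
    2-element Array{Int64,1}:
     2
     3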

@ahwillia
Contributor Author

ahwillia commented Jan 1, 2016

Other potential improvements:

  • update require("count_heads") since require is deprecated (done)
  • the pmap() example could be more explicit, e.g. result = pmap(svd, M); U1,s1,Vt1 = result[1] (done; a sketch follows this list)
  • Any drawbacks to using SharedArrays? I don't understand some of the details in this section.
  • The put() command could be explained in more detail with an example.
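
A hedged sketch of the more explicit pmap() example from the second item (hypothetical data; assumes workers were added with addprocs, and 0.4-era svd returning a tuple):

    addprocs(3)                        # three workers to run the parallel map on
    M = [rand(100, 100) for i = 1:10]  # ten random matrices (hypothetical data)
    result = pmap(svd, M)              # factorize each matrix on an available worker
    U1, s1, Vt1 = result[1]            # unpack the factorization of the first matrix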

@ViralBShah
Member

Cc @amitmurthy

@ViralBShah ViralBShah added the docs This change adds or pertains to documentation label Jan 1, 2016
@tkelman tkelman added the parallelism Parallel or distributed computation label Jan 1, 2016
@tshort
Contributor

tshort commented Jan 1, 2016

These are helpful improvements. Thanks.


Let's try this out. Starting with ``julia -p n`` provides ``n`` worker
processes on the local machine. Generally it makes sense for ``n`` to
equal the number of CPU cores on the machine.
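
(To illustrate the quoted text, a hedged sketch; exact REPL output may vary by version:)

    $ julia -p 4

    julia> nworkers()
    4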
Contributor

Do we want to mention hyperthreading here? Like, what if I have 4 physical cores but 8 hyperthreaded ones? Do I want n = 4 or n = 8?

Contributor Author

+1, not sure I know the right answer though (or if it depends on details of the problem).

@ahwillia ahwillia closed this Jan 4, 2016
@ahwillia ahwillia reopened this Jan 4, 2016
julia> isnull(r.v) # data has been fetched
false

julia> get(r.v); # if we want to re-access the result
Contributor

The comment # if we want to re-access the result should be removed. fetch on a Future already checks for and returns the cached value; that fact ought to be mentioned. Folks should only use fetch and not access the fields of a reference directly.
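
For instance, a minimal sketch of that caching behavior (assuming the 0.5-dev calling convention remotecall(f, pid, args...)):

    r = remotecall(rand, 2, 2, 2)   # a Future for a 2x2 random matrix on worker 2
    fetch(r)                        # first call retrieves the value and caches it in r
    fetch(r)                        # later calls return the cached value directly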

@amitmurthy
Contributor

Nice improvements. Thanks!

@ahwillia
Contributor Author

ahwillia commented Jan 5, 2016

Thanks for the comments @amitmurthy -- I think I've addressed them with the last three commits. I can squash all of these before merging.

I think it is getting there, but I'd like to add a short, self-contained example where a vector of data is distributed across a few processes, and then have each process compute some function on the locally stored data. I'm motivated here by some applications in distributed optimization... (which is why I'm interested in learning these parallel tools better)

What do you think? Is the best way to do this with a DistributedArray? Would this example therefore belong in the DistributedArrays package documentation instead?

@amitmurthy
Contributor

Yes, DistributedArrays is the right place. Though you can have an example here which uses SharedArrays and http://docs.julialang.org/en/latest/stdlib/parallel/#Base.localindexes to partition work among local workers.
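
Perhaps something along these lines (a hedged sketch; assumes workers were added with addprocs and 0.5-dev calling conventions):

    S = SharedArray(Float64, (1000,))   # mapped into every local worker
    @sync for p in procs(S)
        @async remotecall_wait(p) do
            for i in localindexes(S)    # this worker's slice of the indices
                S[i] = myid()           # each worker fills only its own chunk
            end
        end
    end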

@ahwillia
Contributor Author

ahwillia commented Jan 5, 2016

One thing I'd like answered about SharedArrays in the documentation is where the data is actually stored... Does each worker get a copy? Is there just one array on the master process? Is it actually distributed across processes under the hood?

Sorry if this is answered somewhere, but I couldn't find this information readily.

@amitmurthy
Contributor

A SharedArray uses system shared memory to map the same physical segments into all participating worker processes on the same host. A change made to any part of the array by one worker is immediately visible to all the others.
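
A minimal sketch of that behavior (assuming a worker with id 2 exists and 0.5-dev calling conventions):

    S = SharedArray(Int, (4,))              # one shared segment, mapped into all processes
    remotecall_wait(() -> (S[1] = 42), 2)   # worker 2 writes into the array
    S[1]                                    # the master immediately sees 42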

@ahwillia
Contributor Author

ahwillia commented Jan 7, 2016

Thanks @amitmurthy. That makes complete sense and I now see that this is explained in the very first sentence of the Shared Arrays section.

Sorry for the other newbie question: is a vanilla Array stored in more or less the same physical location as a SharedArray, the only difference being that the address of the data isn't broadcast to multiple processes?

Here are some brief/small edits to try to make these points as explicit as possible.

Shared Arrays
-------------

A :class:`SharedArray` places a single instance of an array in system shared
memory so that it can be immediately accessed and altered by many or all
processes. In contrast, a standard :class:`Array` can only be accessed and
altered by the process that created it. The `DistributedArrays`_ package provides a
third type of array where each process has local access to just a chunk of the data,
and no two processes share the same chunk.

A :class:`SharedArray` is a good choice when you want to have a large amount
of data jointly accessible to two or more processes on the same machine.

... Then maybe talk about situations where using a SharedArray is a bad choice?
Is a DArray better for really large arrays? I imagine they are really necessary
for very large problems distributed across multiple machines / large clusters.
Are there other situations?

@ahwillia
Copy link
Contributor Author

ahwillia commented Jan 7, 2016

Another question I forgot to ask: are there easy ways to turn an Array into a SharedArray? Or equivalents of functions like dzeros, drand, drandn, etc.?

@amitmurthy
Contributor

The unexported functions Base.shmem_fill and Base.shmem_rand exist as convenience constructors.

DArrays are useful when 1) the entire array does not fit into memory on a single node, and 2) you want to leverage more cores in a cluster than are available on a single node. Communication overhead usually dictates whether a DArray is appropriate for a problem.

The need for shared arrays will lessen once we have multi-threading fully ironed out. Until then, they are an efficient means of utilizing all cores on a host.
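
For reference, a hedged sketch of those constructors; the convert(SharedArray, A) method for copying an existing Array into shared memory is an assumption on my part:

    S1 = Base.shmem_fill(1.0, (3, 4))   # 3x4 shared array filled with 1.0
    S2 = Base.shmem_rand(3, 4)          # 3x4 shared array of uniform random values
    A  = rand(3, 4)
    S3 = convert(SharedArray, A)        # copy an ordinary Array into shared memory (assumed)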

@ahwillia
Contributor Author

Do these changes look good to everyone? @kshyatt -- did you have an idea for a more developed/involved example for Channels?

This section provided some basic commands to get you acquainted with setting up a
parallel environment in Julia. More advanced tools for launching and managing
processes are covered in the sections on :ref:`man-clustermanagers` and
:ref:`man-net-topology`.
Contributor

This is a nice way of addressing the topology issue. 👍

@ahwillia
Copy link
Contributor Author

I squashed all the commits down... Unfortunately there are conflicts that need to be resolved.

@iamed2
Contributor

iamed2 commented Jan 14, 2016

Speaking of SharedArrays documentation: #14664

@tkelman
Contributor

tkelman commented Jan 14, 2016

Ah, sorry, the conflict is #14606, which it looks like your version also addresses.

@ahwillia
Contributor Author

@iamed2 -- I think this is a problem with SharedArrays, not the documentation? Maybe someone else could confirm?

@amitmurthy
Contributor

Yes, that is a bug.

@ahwillia
Contributor Author

Great. Seeing as that is a bug, we can proceed. @tkelman -- can you help me resolve the merge conflict?

@kmsquire
Member

Assuming you're using command line git:

git fetch
git rebase origin/master

This will attempt to reapply your patch to the latest master. The rebase will fail, but all of the changes that can be applied cleanly will be. For any others, you can edit the file, looking for conflicted sections delimited by <<<<<<<, =======, and >>>>>>>, and editing these so that the final text is the text you want.

Then do

EDIT: First

git add doc/manual/parallel-computing.rst
git commit

end EDIT

git push -f <your_github> parallel_doc

And the changes will magically appear here.

@kmsquire
Member

(Sorry, before the last push, you need to git add and git commit (or just git commit -a) the file. Changed inline.)

@ahwillia
Contributor Author

Thanks @kmsquire ! Should be ready to merge now unless someone objects.

@@ -451,36 +550,33 @@ preemptively. This means context switches only occur at well-defined
points: in this case, when :func:`remotecall_fetch` is called.


Channels
Contributor

I think it is important to retain this section and differentiate between Channels, which are used for inter-task communication, and RemoteChannels/Futures, which are used for inter-process communication.

Contributor Author

@amitmurthy - Can you spell out what is precisely meant by inter-task vs inter-process communication? I think I conflated them in what I wrote because the examples seem to emphasize the similarities. Perhaps we could rewrite this section as:

  • Channels and RemoteChannels
    • Both are AbstractChannels and implement fetch, put!, take!, etc.
    • Provide an example of the above commands
    • Provide an example showing how a Channel and RemoteChannel behave differently.
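
For the first two items, a hedged sketch of the shared AbstractChannel API on a plain Channel (fetch on a Channel is my assumption here):

    c = Channel{Int}(32)   # a buffered channel, local to this process
    put!(c, 1)             # add a value
    fetch(c)               # peek at the next value without removing it -> 1
    take!(c)               # remove and return it -> 1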

Contributor

Tasks are explained here - http://docs.julialang.org/en/release-0.4/manual/control-flow/#man-tasks . A Channel can be used to message/communicate between tasks in the same process.

In order to send a message to a task on a different process we can use a RemoteChannel.
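
A hedged sketch of the distinction (the RemoteChannel constructor and calling conventions here are 0.5-dev assumptions):

    # Channel: communication between tasks in the same process
    c = Channel{Int}(1)
    @async put!(c, 99)         # a producer task in this process
    take!(c)                   # consumed in the same process -> 99

    # RemoteChannel: communication across processes
    rc = RemoteChannel(() -> Channel{Int}(1), 1)   # backed by a Channel on pid 1
    remotecall_wait(() -> put!(rc, myid()), 2)     # a task on worker 2 sends a message
    take!(rc)                                      # received on the master -> 2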

Member

It occurs to me that we may eventually want to have AbstractTask which Task and RemoteTask are concrete subtypes of.

Contributor

What is the API for a RemoteTask? Erlang-style messaging to any task in the cluster? Named tasks? What do you have in mind?

@ViralBShah
Member

Overall these are nice and useful updates to the parallel docs. Now that we have refactored that section it's a bit more work, but we should get these in.

@ViralBShah
Member

Closing as old, and many of the things here have been incorporated.

@ViralBShah ViralBShah closed this Feb 7, 2021