
Improve parallel computing documentation #14525

Closed

wants to merge 1 commit into from

Conversation

ahwillia
Contributor

I've been having a bit of a hard time teaching myself how to use Julia's parallel computing tools from the documentation. A couple of open issues / pull requests suggest I'm not alone.

So far, I've just rewritten the introductory bits of the parallel computing documentation. I am happy to iterate on this and write/re-write more sections if this is helpful.

I tried to address the following issues (from my perspective) in the introductory sections:

  • There was no discussion of basic commands like nprocs, addprocs, etc. (see the sketch after this list)
  • Technical terms like RemoteRef, Channel, and RemoteChannel were introduced without being explained. Several are explained later in the documentation, so removing these early mentions cleans things up.
  • I added two section headers to orient the reader.
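
For context, a minimal session with those basic commands (a hedged sketch; the output shown is typical of a 0.4/0.5-era Julia and may differ by version):

    julia> addprocs(2)   # launch two additional worker processes
    2-element Array{Int64,1}:
     2
     3

    julia> nprocs()      # total process count (master + workers)
    3

    julia> workers()     # ids of the worker processes
    2-element Array{Int64,1}:
     2
     3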

@ahwillia
Contributor Author

ahwillia commented Jan 1, 2016

Other potential improvements:

  • update require("count_heads") since require is deprecated (done)
  • the pmap() example could be more explicit, e.g. result = pmap(svd, M); U1,s1,Vt1 = result[1] (done; a sketch follows this list)
  • Any drawbacks to using SharedArrays? I don't understand some of the details in this section.
  • The put() command could be explained in more detail with an example.
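
A hedged sketch of the more explicit pmap() example from the second item (hypothetical data; assumes workers were added with addprocs, and 0.4-era svd returning a tuple):

    addprocs(3)                        # three workers to run the parallel map on
    M = [rand(100, 100) for i = 1:10]  # ten random matrices (hypothetical data)
    result = pmap(svd, M)              # factorize each matrix on an available worker
    U1, s1, Vt1 = result[1]            # unpack the factorization of the first matrix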

@ViralBShah
Member

Cc @amitmurthy

@ViralBShah ViralBShah added the docs This change adds or pertains to documentation label Jan 1, 2016
@tkelman tkelman added the parallelism Parallel or distributed computation label Jan 1, 2016
@tshort
Contributor

tshort commented Jan 1, 2016

These are helpful improvements. Thanks.


Let's try this out. Starting with ``julia -p n`` provides ``n`` worker
processes on the local machine. Generally it makes sense for ``n`` to
equal the number of CPU cores on the machine.
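
(To illustrate the quoted text, a hedged sketch; exact REPL output may vary by version:)

    $ julia -p 4

    julia> nworkers()
    4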
Contributor

Do we want to mention hyperthreading here? Like, what if I have 4 physical cores but 8 hyperthreaded ones? Do I want n = 4 or n = 8?

Contributor Author

+1, not sure I know the right answer though (or if it depends on details of the problem).

@ahwillia ahwillia closed this Jan 4, 2016
@ahwillia ahwillia reopened this Jan 4, 2016
julia> isnull(r.v) # data has been fetched
false

julia> get(r.v); # if we want to re-access the result
Contributor

The comment # if we want to re-access the result should be removed. fetch on a Future already checks for and returns the cached value; that fact ought to be mentioned. Folks should only use fetch and not access the fields of a reference directly.
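
For instance, a minimal sketch of that caching behavior (assuming the 0.5-dev calling convention remotecall(f, pid, args...)):

    r = remotecall(rand, 2, 2, 2)   # a Future for a 2x2 random matrix on worker 2
    fetch(r)                        # first call retrieves the value and caches it in r
    fetch(r)                        # later calls return the cached value directly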

@amitmurthy
Contributor

Nice improvements. Thanks!

@ahwillia
Contributor Author

ahwillia commented Jan 5, 2016

Thanks for the comments @amitmurthy -- I think I've addressed them with the last three commits. I can squash all of these before merging.

I think it is getting there, but I'd like to add a short, self-contained example where a vector of data is distributed across a few processes, and then have each process compute some function on the locally stored data. I'm motivated here by some applications in distributed optimization... (which is why I'm interested in learning these parallel tools better)

What do you think? Is the best way to do this with a DistributedArray? Would this example therefore belong in the DistributedArrays package documentation instead?

@amitmurthy
Contributor

Yes, DistributedArrays is the right place. Though you can have an example here which uses SharedArrays and http://docs.julialang.org/en/latest/stdlib/parallel/#Base.localindexes to partition work among local workers.
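
Perhaps something along these lines (a hedged sketch; assumes workers were added with addprocs and 0.5-dev calling conventions):

    S = SharedArray(Float64, (1000,))   # mapped into every local worker
    @sync for p in procs(S)
        @async remotecall_wait(p) do
            for i in localindexes(S)    # this worker's slice of the indices
                S[i] = myid()           # each worker fills only its own chunk
            end
        end
    end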

@ahwillia
Contributor Author

ahwillia commented Jan 5, 2016

One thing I'd like answered about SharedArrays in the documentation is where the data is actually stored... Does each worker get a copy? Is there just one array on the master process? Is it actually distributed across processes under the hood?

Sorry if this is answered somewhere, but I couldn't find this information readily.

@amitmurthy
Contributor

A SharedArray uses system shared memory to map the same physical segments into all participating worker processes on the same host. A change made to any part of the array by one worker is immediately visible to all the others.
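
A minimal sketch of that behavior (assuming a worker with id 2 exists and 0.5-dev calling conventions):

    S = SharedArray(Int, (4,))              # one shared segment, mapped into all processes
    remotecall_wait(() -> (S[1] = 42), 2)   # worker 2 writes into the array
    S[1]                                    # the master immediately sees 42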

@ahwillia
Contributor Author

ahwillia commented Jan 7, 2016

Thanks @amitmurthy. That makes complete sense and I now see that this is explained in the very first sentence of the Shared Arrays section.

Sorry for the other newbie question: is a vanilla Array stored in more or less the same physical location as a SharedArray, the only difference being that the address of the data isn't broadcast to multiple processes?

Here are some brief/small edits to try to make these points as explicit as possible.

Shared Arrays
-------------

A :class:`SharedArray` places a single instance of an array in system shared
memory so that it can be immediately accessed and altered by many or all
processes. In contrast, a standard :class:`Array` can only be accessed and
altered by the process that created it. The `DistributedArrays`_ package provides a
third type of array where each process has local access to just a chunk of the data,
and no two processes share the same chunk.

A :class:`SharedArray` is a good choice when you want to have a large amount
of data jointly accessible to two or more processes on the same machine.

... Then maybe talk about situations where using a SharedArray is a bad choice?
Is a DArray better for really large arrays? I imagine they are really necessary
for very large problems distributed across multiple machines / large clusters.
Are there other situations?

@ahwillia
Copy link
Contributor Author

ahwillia commented Jan 7, 2016

Another question I forgot to ask: are there easy ways to turn an Array into a SharedArray? Or equivalents of functions like dzeros, drand, drandn, etc.?

@amitmurthy
Contributor

The unexported functions Base.shmem_fill and Base.shmem_rand exist as convenience constructors.

DArrays are useful when 1) the entire array does not fit into memory on a single node, and 2) you want to leverage more cores in a cluster than are available on a single node. Communication overhead usually dictates whether a DArray is appropriate for a problem.

The need for shared arrays will lessen once we have multi-threading fully ironed out. Until then, they are an efficient means of utilizing all cores on a host.
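
For reference, a hedged sketch of those constructors; the convert(SharedArray, A) method for copying an existing Array into shared memory is an assumption on my part:

    S1 = Base.shmem_fill(1.0, (3, 4))   # 3x4 shared array filled with 1.0
    S2 = Base.shmem_rand(3, 4)          # 3x4 shared array of uniform random values
    A  = rand(3, 4)
    S3 = convert(SharedArray, A)        # copy an ordinary Array into shared memory (assumed)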

@ahwillia
Contributor Author

Do these changes look good to everyone? @kshyatt -- did you have an idea for a more developed/involved example for Channels?

This section provided some basic commands to get you acquainted with setting up a
parallel environment in Julia. More advanced tools for launching and managing
processes are covered in the sections on :ref:`man-clustermanagers` and
:ref:`man-net-topology`.
Contributor

This is a nice way of addressing the topology issue. 👍

@ahwillia
Copy link
Contributor Author

I squashed all the commits down... Unfortunately there are conflicts that need to be resolved.

@iamed2
Contributor

iamed2 commented Jan 14, 2016

Speaking of SharedArrays documentation: #14664

@tkelman
Contributor

tkelman commented Jan 14, 2016

Ah, sorry, the conflict is #14606, which it looks like your version also addresses.

@ahwillia
Contributor Author

@iamed2 -- I think this is a problem with SharedArrays, not the documentation? Maybe someone else could confirm?

@amitmurthy
Contributor

Yes, that is a bug.

@ahwillia
Contributor Author

Great. Seeing as that is a bug, we can proceed. @tkelman -- can you help me resolve the merge conflict?

@kmsquire
Member

Assuming you're using command line git:

git fetch
git rebase origin/master

This will attempt to reapply your patch to the latest master. The rebase will fail, but all of the changes that can be applied cleanly will be. For any others, you can edit the file, looking for conflicted sections delimited by <<<<<<<, =======, and >>>>>>>, and editing these so that the final text is the text you want.

Then do

EDIT: First

git add doc/manual/parallel-computing.rst
git commit

end EDIT

git push -f <your_github> parallel_doc

And the changes will magically appear here.

@kmsquire
Member

(Sorry, before the last push, you need to git add and git commit (or just git commit -a) the file. Changed inline.)

@ahwillia
Contributor Author

Thanks @kmsquire ! Should be ready to merge now unless someone objects.

@@ -451,36 +550,33 @@ preemptively. This means context switches only occur at well-defined
points: in this case, when :func:`remotecall_fetch` is called.


Channels
Contributor

I think it is important to retain this section and differentiate between Channels, which are used for inter-task communication, and RemoteChannels/Futures, which are used for inter-process communication.

Contributor Author

@amitmurthy - Can you spell out what is precisely meant by inter-task vs inter-process communication? I think I conflated them in what I wrote because the examples seem to emphasize the similarities. Perhaps we could rewrite this section as:

  • Channels and RemoteChannels
    • Both are AbstractChannels and implement fetch, put!, take!, etc.
    • Provide an example of the above commands
    • Provide an example showing how a Channel and RemoteChannel behave differently.
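
For the first two items, a hedged sketch of the shared AbstractChannel API on a plain Channel (fetch on a Channel is my assumption here):

    c = Channel{Int}(32)   # a buffered channel, local to this process
    put!(c, 1)             # add a value
    fetch(c)               # peek at the next value without removing it -> 1
    take!(c)               # remove and return it -> 1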

Contributor

Tasks are explained here - http://docs.julialang.org/en/release-0.4/manual/control-flow/#man-tasks . A Channel can be used to message/communicate between tasks in the same process.

In order to send a message to a task on a different process we can use a RemoteChannel.
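
A hedged sketch of the distinction (the RemoteChannel constructor and calling conventions here are 0.5-dev assumptions):

    # Channel: communication between tasks in the same process
    c = Channel{Int}(1)
    @async put!(c, 99)         # a producer task in this process
    take!(c)                   # consumed in the same process -> 99

    # RemoteChannel: communication across processes
    rc = RemoteChannel(() -> Channel{Int}(1), 1)   # backed by a Channel on pid 1
    remotecall_wait(() -> put!(rc, myid()), 2)     # a task on worker 2 sends a message
    take!(rc)                                      # received on the master -> 2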

Member

It occurs to me that we may eventually want to have AbstractTask which Task and RemoteTask are concrete subtypes of.

Contributor

What is the API for a RemoteTask? Erlang-style messaging to any task in the cluster? Named tasks? What do you have in mind?

@ViralBShah
Member

Overall these are nice and useful updates to the parallel docs. Now that we have refactored that section it's a bit more work, but we should get these in.

@ViralBShah
Member

Closing as old, and many of the things here have been incorporated.

@ViralBShah ViralBShah closed this Feb 7, 2021