[SPARK-21305][ML][MLLIB]Add options to disable multi-threading of native BLAS #18551

Closed
wants to merge 7 commits into from

@mpjlu mpjlu commented Jul 6, 2017

What changes were proposed in this pull request?

Many ML/MLlib algorithms use a native BLAS (such as Intel MKL, ATLAS, or OpenBLAS) to improve performance.
Many popular native BLAS implementations, such as Intel MKL and OpenBLAS, are multi-threaded, which will conflict with Spark. Spark should provide options to disable multi-threading in the native BLAS.

https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded
https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications
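
A minimal sketch of what such options might look like in `conf/spark-env.sh`, assuming the thread-count environment variables that OpenBLAS (`OPENBLAS_NUM_THREADS`) and Intel MKL (`MKL_NUM_THREADS`) read. The literal lines added by this PR are not quoted in the conversation, so treat the snippet as illustrative rather than as the patch itself.

```shell
# Illustrative additions to conf/spark-env.sh (not the literal patch).
# Limit each native BLAS library to one thread per call so that its
# thread pool does not compete with the threads Spark already runs.
export OPENBLAS_NUM_THREADS=1   # thread count read by OpenBLAS
export MKL_NUM_THREADS=1        # thread count read by Intel MKL

echo "OPENBLAS_NUM_THREADS=${OPENBLAS_NUM_THREADS} MKL_NUM_THREADS=${MKL_NUM_THREADS}"
```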

How was this patch tested?

Existing unit tests.

@srowen
Member

srowen commented Jul 6, 2017

Hm, can we try to set this with System.setenv anywhere in the code, or will it be too late by the time this triggers? Probably.
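
The timing concern can be illustrated with a plain shell sketch: a process receives its environment when it is launched, so the variable has to be exported before the child starts. Here a stand-in `sh -c` plays the role of the child process (it is not a real JVM), and `OPENBLAS_NUM_THREADS` is assumed to be the variable of interest.

```shell
# A child process only sees variables that were exported before it started.
unset OPENBLAS_NUM_THREADS

# Launched without the variable: the child reports "unset".
before=$(sh -c 'echo "${OPENBLAS_NUM_THREADS:-unset}"')

# Exported first, then launched: the child inherits the value.
export OPENBLAS_NUM_THREADS=1
after=$(sh -c 'echo "${OPENBLAS_NUM_THREADS:-unset}"')

echo "before=${before} after=${after}"   # prints: before=unset after=1
```

This is one reason a file such as `spark-env.sh`, which the launch scripts read before the JVM starts, is a natural place for these settings.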

I think this is OK, but should be documented also in places in the docs that cover BLAS and netlib.

@mpjlu
Author

mpjlu commented Jul 6, 2017

Thanks, @srowen. I have updated the doc.
I also validated the current option in spark-env.sh; it works.
Thanks.

@SparkQA

SparkQA commented Jul 6, 2017

Test build #79272 has finished for PR 18551 at commit d0d94e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

docs/ml-guide.md Outdated
@@ -61,6 +61,9 @@ To configure `netlib-java` / Breeze to use system optimised binaries, include
project and read the [netlib-java](https://github.com/fommil/netlib-java) documentation for your
platform's additional installation instructions.

The most popular native BLAS such as [Intel MKL](https://software.intel.com/en-us/mkl), OpenBLAS, are based on multi-threading, which will conflict with Spark.
Contributor

I think we need more detail on "conflict with Spark". Explain how exactly.

@@ -61,3 +61,7 @@
# - SPARK_IDENT_STRING A string representing this instance of spark. (Default: $USER)
# - SPARK_NICENESS The scheduling priority for daemons. (Default: 0)
# - SPARK_NO_DAEMONIZE Run the proposed command in the foreground. It will not output a PID file.
# Options for native BLAS, like Intel MKL, OpenBLAS, and so on.
# You shoud enable these options if use native BLAS (SPARK-21305).
Contributor

"if use" -> "if using"

"(see SPARK-21305)"

docs/ml-guide.md Outdated
@@ -61,6 +61,9 @@ To configure `netlib-java` / Breeze to use system optimised binaries, include
project and read the [netlib-java](https://github.com/fommil/netlib-java) documentation for your
platform's additional installation instructions.

The most popular native BLAS such as [Intel MKL](https://software.intel.com/en-us/mkl), OpenBLAS, are based on multi-threading, which will conflict with Spark.
Contributor

Link for OpenBLAS?

docs/ml-guide.md Outdated
@@ -61,6 +61,9 @@ To configure `netlib-java` / Breeze to use system optimised binaries, include
project and read the [netlib-java](https://github.com/fommil/netlib-java) documentation for your
platform's additional installation instructions.

The most popular native BLAS such as [Intel MKL](https://software.intel.com/en-us/mkl), OpenBLAS, are based on multi-threading, which will conflict with Spark.
To use multi-threading based native BLAS, you must set it to use single thread first (SPARK-21305).
Contributor

@MLnick MLnick Jul 6, 2017

Perhaps "If using a native BLAS based on multi-threading, for best performance you must ..."

Contributor

Provide a URL link for SPARK-21305

@SparkQA

SparkQA commented Jul 6, 2017

Test build #79276 has finished for PR 18551 at commit adfa6f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 6, 2017

Test build #79277 has finished for PR 18551 at commit 08e900b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 6, 2017

Test build #79278 has finished for PR 18551 at commit 7ef8b80.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 6, 2017

Test build #79289 has finished for PR 18551 at commit a4d4f50.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

docs/ml-guide.md Outdated
@@ -61,6 +61,11 @@ To configure `netlib-java` / Breeze to use system optimised binaries, include
project and read the [netlib-java](https://github.com/fommil/netlib-java) documentation for your
platform's additional installation instructions.

The most popular native BLAS such as [Intel MKL](https://software.intel.com/en-us/mkl), [OpenBLAS](http://www.openblas.net), are based on multi-threading.
For example, when OpenBLAS is loaded, it will create a thread pool with `MAX_CPU_NUMBER` threads, and the threads are using spinlock by default, which will conflict with Spark.
Member

Is it always worse? For example, can it multi-thread a single computation? It's possible that's advantageous if the other tasks on the machine aren't CPU-intensive, but it's probably counterproductive if all the tasks are CPU-intensive.

Maybe we can soften the language slightly, to say you might get better performance by setting these to 1.

Also, I don't think users will know what a spinlock or MAX_CPU_NUMBER is (not a Spark value?).

Also, I don't think we ever explain here what MKL or OpenBLAS is, in any docs. You might briefly mention that this is explained in the netlib docs.

@felixcheung
Member

Shouldn't we avoid suggesting that multi-threading be disabled completely, and instead say to make sure that the number of cores assigned to Spark and the number of threads given to native BLAS add up to the total number of CPU cores available on the worker?
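
One way to read this suggestion is as a core budget. The sketch below is hypothetical: `TOTAL_CORES`, `SPARK_CORES`, and the helper `blas_thread_budget` are illustrative names rather than Spark settings, and `OPENBLAS_NUM_THREADS` is assumed to be the variable the BLAS library reads.

```shell
# Hypothetical helper: threads left for native BLAS once Spark's cores
# are subtracted from the machine total, never going below 1.
blas_thread_budget() {
  total_cores=$1
  spark_cores=$2
  remaining=$(( total_cores - spark_cores ))
  if [ "$remaining" -lt 1 ]; then
    remaining=1
  fi
  echo "$remaining"
}

TOTAL_CORES=16   # assumed core count of the worker
SPARK_CORES=12   # assumed number of cores assigned to Spark on that worker
OPENBLAS_NUM_THREADS=$(blas_thread_budget "$TOTAL_CORES" "$SPARK_CORES")
export OPENBLAS_NUM_THREADS
echo "OPENBLAS_NUM_THREADS=${OPENBLAS_NUM_THREADS}"
```

Whether such a split actually performs well is exactly what the rest of this thread debates, so the numbers here are placeholders.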

@mpjlu
Author

mpjlu commented Jul 10, 2017

Hi @felixcheung, I have tested one case: I wrote a single-threaded Java program that calls native BLAS. Overall program performance is much better with native BLAS multi-threading disabled. This is because the JVM itself is multi-threaded.
In a JVM environment, we may not be able to make the Spark cores and the native BLAS threads add up to the total number of CPU cores on the worker. I haven't tested this enough.
Thanks.

@srowen
Member

srowen commented Jul 10, 2017

@mpjlu Felix is making a slightly different point. Let's say Spark is actually configured to use 2 CPUs per task. The right setting is not 1 thread per task but 2. Using 1 would almost surely not be faster.

@mpjlu
Author

mpjlu commented Jul 10, 2017

Hi @srowen, I understand Felix's point. I mean that if you have only 1 task in C/C++ and 2 CPUs, setting native BLAS to use 2 CPUs will be faster. But in a JVM environment, even if you have only one task and 2 CPUs, there are still some JVM system threads, so setting native BLAS to use 2 CPUs may not be faster.

@srowen
Member

srowen commented Jul 10, 2017

I doubt the other background tasks will consume a whole additional CPU for every task. You're saying that on a 16-core machine with 8 tasks, the other 8 CPUs are probably nearly fully utilized? I just can't see it. In any event, I think that's why it's best to state this as something to look at, and the principle here -- match MKL/BLAS cores to task cores -- rather than say "always set to 1".
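
The matching principle could be sketched as follows, assuming `spark.task.cpus` is the setting that determines cores per task (it defaults to 1). `TASK_CPUS` is a hypothetical shell variable introduced only for this example, and the two exported names are assumed to be the variables OpenBLAS and Intel MKL read.

```shell
# Hypothetical: TASK_CPUS mirrors the value configured as spark.task.cpus.
# Fall back to 1, Spark's default, when it is not set.
TASK_CPUS=${TASK_CPUS:-1}

# Give each native BLAS call as many threads as one Spark task has cores.
export OPENBLAS_NUM_THREADS="$TASK_CPUS"
export MKL_NUM_THREADS="$TASK_CPUS"

echo "threads per BLAS call: ${OPENBLAS_NUM_THREADS}"
```

With 2 CPUs per task, as in the earlier example, running this with `TASK_CPUS=2` would set both variables to 2 instead of 1.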

@mpjlu
Author

mpjlu commented Jul 10, 2017

Hi @srowen, thanks very much for your review.
I will revise the documentation in this PR to soften the language.
According to my profiling data, my guess is that when the native BLAS is loaded (or when a multi-threaded subprogram is called), it creates a thread pool. If you have 16 cores, by default it creates 15 threads. These threads live until the end of the job, not just for the duration of the subprogram, and each of the 15 threads consumes a whole CPU.
Thanks.

@SparkQA

SparkQA commented Jul 11, 2017

Test build #79506 has finished for PR 18551 at commit 141d92d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mpjlu
Author

mpjlu commented Jul 11, 2017

retest this please

docs/ml-guide.md Outdated
@@ -61,6 +61,12 @@ To configure `netlib-java` / Breeze to use system optimised binaries, include
project and read the [netlib-java](https://github.com/fommil/netlib-java) documentation for your
platform's additional installation instructions.

The most popular native BLAS such as [Intel MKL](https://software.intel.com/en-us/mkl), [OpenBLAS](http://www.openblas.net), are based on multi-threading, which will conflict with Spark.
Member

I would say "can use multiple threads in a single operation, which can conflict with Spark's execution model."

docs/ml-guide.md Outdated
@@ -61,6 +61,12 @@ To configure `netlib-java` / Breeze to use system optimised binaries, include
project and read the [netlib-java](https://github.com/fommil/netlib-java) documentation for your
platform's additional installation instructions.

The most popular native BLAS such as [Intel MKL](https://software.intel.com/en-us/mkl), [OpenBLAS](http://www.openblas.net), are based on multi-threading, which will conflict with Spark.

If using a native BLAS based on multi-threading, you might get better performance to set it to use single thread first ([SPARK-21305](https://issues.apache.org/jira/browse/SPARK-21305)).
Member

I might say, "Configuring these BLAS implementations to use a single thread for operations may actually improve performance (see ...). It is usually optimal to match this to the number of cores each Spark task is configured to use, which is 1 by default and typically left at 1."

docs/ml-guide.md Outdated

If using a native BLAS based on multi-threading, you might get better performance to set it to use single thread first ([SPARK-21305](https://issues.apache.org/jira/browse/SPARK-21305)).

Please reference the recommended settings for [Intel MKL](https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications) and [OpenBLAS](https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded).
Member

Maybe: "Please refer to resources like the following to understand how to configure the number of threads these BLAS implementations use: ..."

Author

Done, thanks very much. @srowen

@SparkQA

SparkQA commented Jul 11, 2017

Test build #79513 has finished for PR 18551 at commit 141d92d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 11, 2017

Test build #79521 has finished for PR 18551 at commit e01a613.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mpjlu
Author

mpjlu commented Jul 11, 2017

retest this please

@SparkQA

SparkQA commented Jul 11, 2017

Test build #79525 has finished for PR 18551 at commit e01a613.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Jul 12, 2017

Merged to master

@asfgit asfgit closed this in 5ed134e Jul 12, 2017