SPARK-2519 part 2. Remove pattern matching on Tuple2 in critical sections of CoGroupedRDD and PairRDDFunctions #1447

sryza · 2014-07-16T21:09:54Z

This also removes an unnecessary tuple creation in cogroup.

rxin · 2014-07-17T00:57:59Z

core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala

-        val old = map.get(k)
-        map.put(k, if (old == null) v else func(old, v))
+      iter.foreach { pair =>
+        val old = map.get(pair._1)


Can we save the key into a variable here so we don't call _1 twice?
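Here is a minimal sketch of the loop under review with the key read once into a local, as suggested. It assumes `map` is a `java.util.HashMap` (the `old == null` check suggests one). The `ReduceSketch` object and its `reducePartition` helper are hypothetical and are not Spark's code.

```scala
import java.util.{HashMap => JHashMap}

object ReduceSketch {
  // Hypothetical helper (not Spark's API): merges values per key in a Java HashMap,
  // reading each Tuple2 through its accessors instead of a pattern match.
  def reducePartition[K, V](iter: Iterator[(K, V)], func: (V, V) => V): JHashMap[K, V] = {
    val map = new JHashMap[K, V]
    iter.foreach { pair =>
      val k = pair._1 // read the key once and reuse it for both get and put
      val old = map.get(k)
      // JHashMap.get returns null when the key is absent
      map.put(k, if (old == null) pair._2 else func(old, pair._2))
    }
    map
  }

  def main(args: Array[String]): Unit = {
    val m = reducePartition(Iterator("a" -> 1, "b" -> 2, "a" -> 3), (x: Int, y: Int) => x + y)
    println(m)
  }
}
```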

I was looking at the bytecode generated by pattern matching versus non-pattern-matching, and the main difference is that we bypass some unnecessary branches:

final def apply(x0$1: Tuple2): Unit = {
  case <synthetic> val x1: Tuple2 = x0$1;
  case4(){
    if (x1.ne(null)) {
      val k: Int = x1._1$mcI$sp();
      val v: Int = x1._2$mcI$sp();
      matchEnd3({
        scala.this.Predef.println(scala.Int.box(k.+(v)));
        scala.runtime.BoxedUnit.UNIT
      })
    } else
      case5()
  };
  case5(){
    matchEnd3(throw new MatchError(x1))
  };
  matchEnd3(x: runtime.BoxedUnit){
    ()
  }
}
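Read back as ordinary Scala, the tree above corresponds roughly to the hand-written `desugared` function below. It performs a null check, keeps a `MatchError` throw site on the other branch, and then makes the same accessor calls that the non-matching form makes directly. This is an illustrative sketch: the names are hypothetical, the functions return the sum instead of printing it so the three forms can be compared, and real compiler output varies by scalac version. On a null input, `desugared` throws `MatchError` while `accessorStyle` throws `NullPointerException`.

```scala
object DesugarSketch {
  // Source form: destructure the tuple with a pattern-matching anonymous function.
  val matchStyle: ((Int, Int)) => Int = { case (k, v) => k + v }

  // Hand-written approximation of the pattern matcher's expansion:
  // a null check, the accessor calls, and a MatchError throw site.
  val desugared: ((Int, Int)) => Int = { x1 =>
    if (x1 ne null) {
      val k = x1._1
      val v = x1._2
      k + v
    } else {
      throw new MatchError(x1)
    }
  }

  // Accessor form used in this PR: no extra branch and no MatchError throw site.
  val accessorStyle: ((Int, Int)) => Int = { pair => pair._1 + pair._2 }
}
```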

If we're using functional style anyway, do we really expect to see any sort of remotely noticeable gain from doing these kinds of optimizations?

It's really hard to tell. Microbenchmarks are unlikely to show much, because branch prediction takes care of the extra branches, and I don't know how this affects runtime in a real workload without a lot more instrumentation. Since this is a small change, my opinion is that it's OK to just do it this way. A big refactoring would be a very different story and would deserve more profiling.
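A naive timing loop of the kind discussed here might look like the following. It is a hypothetical sketch, not a rigorous benchmark: a harness such as JMH is needed to control warm-up, forking, and dead-code elimination, and any timings it prints will vary by JVM and machine. Both variants compute the same sum, so only the tuple access differs.

```scala
object NaiveTiming {
  val accessor: ((Int, Int)) => Long = p => (p._1 + p._2).toLong
  val matching: ((Int, Int)) => Long = { case (x, y) => (x + y).toLong }

  // Runs fn over every element once and prints wall-clock time.
  // Naive on purpose: single-shot timing, no fork isolation, no JMH-style safeguards.
  def time(label: String, data: Array[(Int, Int)], fn: ((Int, Int)) => Long): Long = {
    val start = System.nanoTime()
    var acc = 0L
    var i = 0
    while (i < data.length) { acc += fn(data(i)); i += 1 }
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(f"$label%-9s $elapsedMs%.2f ms")
    acc // returned so the loop's work is used by the caller
  }

  // Repeats both variants so later rounds run after some JIT warm-up.
  def compare(n: Int, rounds: Int): (Long, Long) = {
    val data = Array.tabulate(n)(i => (i, i))
    var last = (0L, 0L)
    for (_ <- 1 to rounds) {
      last = (time("accessor", data, accessor), time("matching", data, matching))
    }
    last
  }

  def main(args: Array[String]): Unit = {
    compare(n = 1000000, rounds = 5)
  }
}
```

Small differences between the two lines are easily swamped by noise, which is consistent with the point that microbenchmarks say little about real workloads.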

Removing the call to _1 isn't hugely helpful IMO; accessor methods get inlined very fast. However, pattern matching adds a whole bunch of potential throw sites and branches and things like that, and we've noticed problems in the past.

This I'm sure has been done a million times, but one more for good luck -- here is a foreach { xy => xy._1 + xy._2 } versus map { case (x, y) => x + y }:
http://www.diffchecker.com/vtv6cptx

No new virtual function calls, but one new branch (trivially branch-predictable), one new throw (which will never be invoked), and several stores/loads. (Perhaps I'm misreading the bytecode, but isn't astore_2 followed by aload_2 a no-op? Edit: I guess it does change the state of variable 2, so it is not a no-op.)
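A minimal pair of functions for reproducing the linked bytecode comparison is sketched below. Unlike the comment, both use `map`, so only the tuple access differs; the object and method names are hypothetical. After compiling with `scalac`, the generated class files can be inspected with `javap -c -p`. The exact instructions (including any `astore`/`aload` pairs) depend on the Scala version.

```scala
object CompareSketch {
  // Accessor style: reads the tuple fields directly.
  def sumAccessor(xs: Seq[(Int, Int)]): Seq[Int] = xs.map { xy => xy._1 + xy._2 }

  // Pattern-matching style: destructures each tuple with a case clause,
  // which adds a null check and a MatchError throw site to the lambda body.
  def sumMatch(xs: Seq[(Int, Int)]): Seq[Int] = xs.map { case (x, y) => x + y }
}
```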

I think we tested this in the past and saw a difference, perhaps because it can now throw. But I don't remember the details. Weird that it's also doing those stores.

There are all kinds of things that are really hard to profile at runtime. For example, the JVM might decide not to JIT the code because the bytecode is longer. It might decide not to because of the exceptions. Branch prediction might stop working because there are too many branches. Etc.

Again, I'm only in favor of this change because it is small and it might have an impact. If it were a big, ugly change, I would be against it without profiling.

Also, on removing the calls to _1: these values should be in cache, so accesses will be really fast. I'll hold off on these for now, but I'm happy to make the change if y'all want.

SparkQA · 2014-07-17T07:32:56Z

QA tests have started for PR 1447. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16771/consoleFull

SparkQA · 2014-07-17T08:28:02Z

QA tests have started for PR 1447. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16773/consoleFull

SparkQA · 2014-07-17T09:05:14Z

QA results for PR 1447:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16771/consoleFull

SparkQA · 2014-07-17T10:07:31Z

QA results for PR 1447:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16773/consoleFull

mateiz · 2014-07-18T07:01:14Z

This looks okay to me as is, @rxin what do you think?


sryza · 2014-07-18T08:26:14Z

Upmerged

SparkQA · 2014-07-18T08:28:13Z

QA tests have started for PR 1447. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16821/consoleFull

SparkQA · 2014-07-18T10:07:36Z

QA results for PR 1447:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16821/consoleFull

rxin · 2014-07-20T08:24:21Z

Merging this in master.

…ion... ...s of CoGroupedRDD and PairRDDFunctions

This also removes an unnecessary tuple creation in cogroup.

Author: Sandy Ryza <sandy@cloudera.com>

Closes apache#1447 from sryza/sandy-spark-2519-2 and squashes the following commits:

b6d9699 [Sandy Ryza] Remove missed Tuple2 match in CoGroupedRDD
a109828 [Sandy Ryza] Remove another pattern matching in MappedValuesRDD and revert some changes in PairRDDFunctions
be10f8a [Sandy Ryza] SPARK-2519 part 2. Remove pattern matching on Tuple2 in critical sections of CoGroupedRDD and PairRDDFunctions


rxin reviewed Jul 17, 2014

sryza added 3 commits July 18, 2014 01:23

SPARK-2519 part 2. Remove pattern matching on Tuple2 in critical sections of CoGroupedRDD and PairRDDFunctions
be10f8a

Remove another pattern matching in MappedValuesRDD and revert some changes in PairRDDFunctions
a109828

Remove missed Tuple2 match in CoGroupedRDD
b6d9699

asfgit closed this in 98ab411 Jul 20, 2014



sryza commented Jul 16, 2014

rxin Jul 17, 2014

aarondav Jul 17, 2014

rxin Jul 17, 2014

mateiz Jul 17, 2014

aarondav Jul 17, 2014

mateiz Jul 17, 2014

rxin Jul 17, 2014

sryza Jul 17, 2014

SparkQA commented Jul 17, 2014

SparkQA commented Jul 17, 2014

SparkQA commented Jul 17, 2014

SparkQA commented Jul 17, 2014

mateiz commented Jul 18, 2014

sryza commented Jul 18, 2014

SparkQA commented Jul 18, 2014

SparkQA commented Jul 18, 2014

rxin commented Jul 20, 2014
