Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPARK-2519 part 2. Remove pattern matching on Tuple2 in critical section... #1447

Closed
wants to merge 3 commits into from

Conversation

sryza
Copy link
Contributor

@sryza sryza commented Jul 16, 2014

...s of CoGroupedRDD and PairRDDFunctions

This also removes an unnecessary tuple creation in cogroup.

val old = map.get(k)
map.put(k, if (old == null) v else func(old, v))
iter.foreach { pair =>
val old = map.get(pair._1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we save the key into a variable here so we don't call _1 twice?

I was looking at the bytecode generated by pattern matching versus non-pattern-matching, and the main difference is that we bypass some unnecessary branches:

    final def apply(x0$1: Tuple2): Unit = {
      case <synthetic> val x1: Tuple2 = x0$1;
      case4(){
        if (x1.ne(null))
          {
            val k: Int = x1._1$mcI$sp();
            val v: Int = x1._2$mcI$sp();
            matchEnd3({
              scala.this.Predef.println(scala.Int.box(k.+(v)));
              scala.runtime.BoxedUnit.UNIT
            })
          }
        else
          case5()
      };
      case5(){
        matchEnd3(throw new MatchError(x1))
      };
      matchEnd3(x: runtime.BoxedUnit){
        ()
      }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're using functional style anyway, do we really expect to see any sort of remotely noticeable gain from doing these kinds of optimizations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really hard to tell. I think it is really hard to tell when doing microbenchmarks because branch prediction takes care of the extra branches. But I don't know how this impacts real runtime in a real workload without a lot more instrumentation. My opinion is since it is a small change, it's ok to just do it this way. If it is a big refactoring, that's a very different story and deserves more profiling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the call to _1 isn't hugely helpful IMO; accessor methods get inlined very fast. However, pattern matching adds a whole bunch of potential throw sites and branches and things like that, and we've noticed problems in the past.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This I'm sure has been done a million times, but one more for good luck -- here is a foreach { xy => xy._1 + xy._2 } versus map { case (x, y) => x + y }:
http://www.diffchecker.com/vtv6cptx

No new virtual function calls, but one new branch (trivially branch predictable) and one new throw (which will never be invoked), and several store/loads. (Perhaps I'm misreading the bytecode, but isn't astore_2 followed by aload_2 a no-op? Edit: I guess it does change the state of variable 2, so is not a no-op.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we tested this in the past and saw a difference, perhaps because it can now throw. But I don't remember the details. Weird that it's also doing those stores.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are all kinds of stuff that's really hard to profile at runtime. For example, the JVM might decide not to JIT the code because it is longer. It might decide not to do it because of exceptions. Branch prediction might stop working because it has too many branches. Etc.

Again, I'm only in favor of this change because it is small, and it might have impact. If it is a big ugly change, I would be against it without profiling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also on removing the calls to _1, these values should be in cache, so accesses will be really fast. Will hold off on these for now, but happy to make the change if y'all want.

@SparkQA
Copy link

SparkQA commented Jul 17, 2014

QA tests have started for PR 1447. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16771/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 17, 2014

QA tests have started for PR 1447. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16773/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 17, 2014

QA results for PR 1447:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16771/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 17, 2014

QA results for PR 1447:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16773/consoleFull

@mateiz
Copy link
Contributor

mateiz commented Jul 18, 2014

This looks okay to me as is, @rxin what do you think?

@sryza
Copy link
Contributor Author

sryza commented Jul 18, 2014

Upmerged

@SparkQA
Copy link

SparkQA commented Jul 18, 2014

QA tests have started for PR 1447. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16821/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 18, 2014

QA results for PR 1447:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16821/consoleFull

@rxin
Copy link
Contributor

rxin commented Jul 20, 2014

Merging this in master.

@asfgit asfgit closed this in 98ab411 Jul 20, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
…ion...

...s of CoGroupedRDD and PairRDDFunctions

This also removes an unnecessary tuple creation in cogroup.

Author: Sandy Ryza <sandy@cloudera.com>

Closes apache#1447 from sryza/sandy-spark-2519-2 and squashes the following commits:

b6d9699 [Sandy Ryza] Remove missed Tuple2 match in CoGroupedRDD
a109828 [Sandy Ryza] Remove another pattern matching in MappedValuesRDD and revert some changes in PairRDDFunctions
be10f8a [Sandy Ryza] SPARK-2519 part 2. Remove pattern matching on Tuple2 in critical sections of CoGroupedRDD and PairRDDFunctions
sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
…VC permission (apache#1447)

### What changes were proposed in this pull request?

This PR aims to handle `KubernetesClientException` in `getReusablePVCs` method to handle gracefully the cases where accounts has no PVC permission including `listing`.

### Why are the changes needed?

To prevent a regression in Apache Spark 3.4.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs with the newly added test case.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants