Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-24519][CORE] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS only once #22521

Closed
wants to merge 5 commits into from

Conversation

rxin
Copy link
Contributor

@rxin rxin commented Sep 21, 2018

What changes were proposed in this pull request?

Previously SPARK-24519 created a modifiable config SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS. However, the config is being parsed for every creation of MapStatus, which could be very expensive. Another problem with the previous approach is that it created the illusion that this can be changed dynamically at runtime, which was not true. This PR changes it so the config is computed only once.

How was this patch tested?

Removed a test case that's no longer valid.

@SparkQA
Copy link

SparkQA commented Sep 21, 2018

Test build #96449 has finished for PR 22521 at commit f6f9658.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 24, 2018

Test build #96495 has finished for PR 22521 at commit 4af98e7.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin rxin changed the title [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS only once - WIP [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS only once Sep 24, 2018
@rxin
Copy link
Contributor Author

rxin commented Sep 24, 2018

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Sep 24, 2018

Test build #4350 has started for PR 22521 at commit 4af98e7.

@dongjoon-hyun
Copy link
Member

Retest this please.

@dongjoon-hyun
Copy link
Member

@rxin . Could you fill the PR description, too?

@rxin
Copy link
Contributor Author

rxin commented Sep 25, 2018

yup; just did

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.
+1, LGTM.

@dongjoon-hyun
Copy link
Member

nit. Could you add [CORE] to the PR title, too?

@rxin rxin changed the title [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS only once [SPARK-24519][CORE] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS only once Sep 25, 2018
@SparkQA
Copy link

SparkQA commented Sep 25, 2018

Test build #96534 has finished for PR 22521 at commit 4af98e7.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Sep 25, 2018

Test build #96538 has finished for PR 22521 at commit 4af98e7.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Sep 25, 2018

Test build #96554 has finished for PR 22521 at commit 4af98e7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor Author

rxin commented Sep 25, 2018

seems like our tests are really flaky

@cloud-fan
Copy link
Contributor

retest this please

@dongjoon-hyun
Copy link
Member

Ya. It's amazing to see 5 different failures in the same PR. :(

  1. org.apache.spark.scheduler.MapStatusSuite.SPARK-24519: HighlyCompressedMapStatus has configurable threshold
  2. This patch fails SparkR unit tests.
  3. This patch fails due to an unknown error code, -9.
  4. This patch fails PySpark unit tests.
  5. org.apache.spark.sql.streaming.continuous.ContinuousSuite.query without test harness

@SparkQA
Copy link

SparkQA commented Sep 26, 2018

Test build #96571 has finished for PR 22521 at commit 4af98e7.

  • This patch fails from timeout after a configured wait of 400m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Sep 26, 2018

Test build #96585 has finished for PR 22521 at commit 4af98e7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 26, 2018

Test build #96584 has finished for PR 22521 at commit 4af98e7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 26, 2018

Test build #4352 has finished for PR 22521 at commit 4af98e7.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented Sep 26, 2018

Test build #96612 has finished for PR 22521 at commit 4af98e7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 26, 2018

Test build #4353 has finished for PR 22521 at commit 4af98e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Merged to master/branch-2.4.

@asfgit asfgit closed this in e702fb1 Sep 26, 2018
asfgit pushed a commit that referenced this pull request Sep 26, 2018
…only once

## What changes were proposed in this pull request?
Previously SPARK-24519 created a modifiable config SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS. However, the config is being parsed for every creation of MapStatus, which could be very expensive. Another problem with the previous approach is that it created the illusion that this can be changed dynamically at runtime, which was not true. This PR changes it so the config is computed only once.

## How was this patch tested?
Removed a test case that's no longer valid.

Closes #22521 from rxin/SPARK-24519.

Authored-by: Reynold Xin <rxin@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit e702fb1)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
daspalrahul pushed a commit to daspalrahul/spark that referenced this pull request Sep 29, 2018
…only once

## What changes were proposed in this pull request?
Previously SPARK-24519 created a modifiable config SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS. However, the config is being parsed for every creation of MapStatus, which could be very expensive. Another problem with the previous approach is that it created the illusion that this can be changed dynamically at runtime, which was not true. This PR changes it so the config is computed only once.

## How was this patch tested?
Removed a test case that's no longer valid.

Closes apache#22521 from rxin/SPARK-24519.

Authored-by: Reynold Xin <rxin@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants