[SPARK-2014] Make PySpark store RDDs in MEMORY_ONLY_SER with compression by default #1051
Conversation
Now that I think about it, perhaps compression should be a setting for individual RDDs?
I agree! Perhaps it could be part of the storage levels?
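The idea floated here, a per-RDD compression flag folded into the storage level, could be sketched as a simple value type. Note this is only an illustration of the suggestion: the `compressed` field is hypothetical and is not part of Spark's actual `StorageLevel`, which controls disk/memory use, serialization, and replication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageLevel:
    """Sketch of a storage level carrying a hypothetical per-RDD compression flag."""
    use_disk: bool
    use_memory: bool
    deserialized: bool
    replication: int = 1
    compressed: bool = False  # hypothetical: not in Spark's real StorageLevel


# Serialized in-memory storage, with and without the proposed compression flag.
MEMORY_ONLY_SER = StorageLevel(use_disk=False, use_memory=True, deserialized=False)
MEMORY_ONLY_SER_COMPRESSED = StorageLevel(
    use_disk=False, use_memory=True, deserialized=False, compressed=True
)
```

With this shape, compression would be chosen per `persist()` call rather than globally via `spark.rdd.compress`.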
@@ -99,6 +99,12 @@ def set(self, key, value):
         self._jconf.set(key, unicode(value))
         return self

     def setIfMissing(self, key, value):
         """Set a configuration property, if not already set."""
         if self.get(key) == None:
if key not in self:
We may have to make SparkConf iterable for this to work. It does not work as is.
Ah, yes. But `self.get(key) is None` sounds more Pythonic :-)
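The review exchange above can be summarized in a small runnable sketch. `SimpleConf` below is a dict-backed stand-in for `SparkConf` (not Spark's real class, which delegates to a JVM object); it shows `setIfMissing` with the `is None` comparison the reviewers preferred over `== None`:

```python
class SimpleConf:
    """Minimal dict-backed stand-in for SparkConf, enough to show setIfMissing."""

    def __init__(self):
        self._settings = {}

    def set(self, key, value):
        self._settings[key] = str(value)
        return self

    def get(self, key, default=None):
        return self._settings.get(key, default)

    def setIfMissing(self, key, value):
        """Set a configuration property, if not already set."""
        if self.get(key) is None:  # 'is None' per the review comment, not '== None'
            self.set(key, value)
        return self


conf = SimpleConf()
conf.setIfMissing("spark.rdd.compress", "true")   # sets, since the key is absent
conf.set("spark.app.name", "demo")
conf.setIfMissing("spark.app.name", "other")      # no-op, key already set
```

The alternative suggestion, `if key not in self:`, would additionally require a `__contains__` method on the class, which is why the reviewer noted it does not work as-is.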
Jenkins, test this please
QA tests have started for PR 1051. This patch merges cleanly.
QA results for PR 1051:
QA tests have started for PR 1051. This patch merges cleanly.
QA results for PR 1051:
QA tests have started for PR 1051. This patch merges cleanly.
QA results for PR 1051:
Thanks Prashant, I've merged this.
…ion by default

Author: Prashant Sharma <prashant.s@imaginea.com>

Closes apache#1051 from ScrapCodes/SPARK-2014/pyspark-cache and squashes the following commits:

f192df7 [Prashant Sharma] Code Review
2a2f43f [Prashant Sharma] [SPARK-2014] Make PySpark store RDDs in MEMORY_ONLY_SER with compression by default
Hi guys, could anyone please explain why we don't want to use MEMORY_ONLY in PySpark by default? Thanks a lot.
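As background for this question (an illustration, not an authoritative answer from the PR): PySpark records already cross the Python/JVM boundary as pickled bytes, so a "deserialized" storage level buys little there, and `MEMORY_ONLY_SER` with compression trades some CPU for a smaller memory footprint. A standalone Python sketch of the space saving from compressing pickled, repetitive data, using only the standard library:

```python
import pickle
import zlib

# A partition of repetitive records, loosely resembling typical RDD contents.
records = [{"user": "user_%d" % (i % 100), "score": i % 10} for i in range(10_000)]

serialized = pickle.dumps(records)      # roughly what PySpark caches: pickled bytes
compressed = zlib.compress(serialized)  # roughly what spark.rdd.compress adds on top

print("pickled bytes:   ", len(serialized))
print("compressed bytes:", len(compressed))
```

On repetitive data like this, the compressed form is much smaller than the raw pickle, which is the memory-vs-CPU trade-off this PR makes the PySpark default.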