[SPARK-1678][SPARK-1679] In-memory compression bug fix and made compression configurable, disabled by default #608

Closed
liancheng wants to merge 4 commits from liancheng/spark-1678

liancheng
Contributor

In-memory compression is now configurable in SparkConf by the spark.sql.inMemoryCompression.enabled property, and is disabled by default.

To help code review, the bug fix is in the first commit and the compression configuration is in the second.

CompressibleColumnAccessor.hasNext and RunLengthEncoding.decoder.hasNext were not correctly implemented.
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14601/

@liancheng
Contributor Author

Note that the RLDecoder in Shark has the same bug that this PR fixes in RunLengthEncoding.decoder, and may lose repeated values. @rxin
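To illustrate how a run-length decoder's `hasNext` can lose repeated values, here is a minimal sketch. It is not the Spark or Shark code: the class `RunLengthDecoder`, the flag `checkPendingRun`, the helper object `RunLengthDemo`, and the `(value, runLength)` pair layout are all hypothetical, and the naive branch shows only one plausible way such a bug can arise (consulting the buffer alone, so the decoder reports completion as soon as the last run header has been read).

```scala
import java.nio.ByteBuffer

// Hypothetical, simplified decoder over (value: Int, runLength: Int) pairs.
// Illustration only; not the actual Spark or Shark implementation.
class RunLengthDecoder(buffer: ByteBuffer, checkPendingRun: Boolean)
    extends Iterator[Int] {

  private var currentValue = 0
  private var run = 0      // length of the run currently being replayed
  private var emitted = 0  // values already returned from that run

  override def hasNext: Boolean =
    if (checkPendingRun) {
      // Also accounts for repeats still pending in the current run.
      emitted < run || buffer.hasRemaining
    } else {
      // Naive variant: looks at the buffer only, so it returns false once
      // the last run header is consumed, even if repeats remain.
      buffer.hasRemaining
    }

  override def next(): Int = {
    if (emitted == run) {
      // Current run is exhausted: read the next (value, runLength) header.
      currentValue = buffer.getInt()
      run = buffer.getInt()
      emitted = 1
    } else {
      emitted += 1
    }
    currentValue
  }
}

object RunLengthDemo {
  // Packs (value, runLength) pairs into a buffer that is ready for reading.
  def encode(runs: Seq[(Int, Int)]): ByteBuffer = {
    val buf = ByteBuffer.allocate(runs.size * 8)
    runs.foreach { case (value, length) => buf.putInt(value).putInt(length) }
    buf.flip()
    buf
  }

  def decode(runs: Seq[(Int, Int)], checkPendingRun: Boolean): List[Int] =
    new RunLengthDecoder(encode(runs), checkPendingRun).toList
}
```

With the runs `(7, 3)` and `(9, 2)`, `RunLengthDemo.decode(..., checkPendingRun = true)` yields `List(7, 7, 7, 9, 9)`, while the naive variant yields `List(7, 7, 7, 9)`: the final repeat of the last run is dropped.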

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14658/

@marmbrus
Contributor

marmbrus commented May 5, 2014

This looks great! Thanks for doing this (and even splitting the commits into nice chunks).

One question is about the naming of the config option. I'm thinking of something more like spark.sql.inMemoryColumnarStorage.compressed. That way we have a separate group for all "inMemoryColumnarStorage" things, and it is clear which version this is if we add another in-memory option. Open to other suggestions though.

@pwendell, we should try to include this in 1.0 if possible.

@liancheng
Contributor Author

Done :)
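As a rough sketch of what enabling the option could look like, assuming the key was renamed exactly as suggested above and is read from `SparkConf` as the PR description states. The application name is a placeholder, and the snippet needs Spark on the classpath (for example, in the Spark shell):

```scala
import org.apache.spark.SparkConf

// Compression is disabled by default, so it has to be switched on explicitly.
// The key below assumes the rename proposed in this thread was applied as-is.
val conf = new SparkConf()
  .setAppName("in-memory-compression-example") // placeholder name
  .set("spark.sql.inMemoryColumnarStorage.compressed", "true")
```

Leaving the property unset keeps the default behavior, with no compression of in-memory columnar data.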

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14693/

@pwendell
Contributor

pwendell commented May 6, 2014

Thanks - I can merge this.

@asfgit asfgit closed this in 6d721c5 May 6, 2014
asfgit pushed a commit that referenced this pull request May 6, 2014
…ession configurable, disabled by default

In-memory compression is now configurable in `SparkConf` by the `spark.sql.inMemoryCompression.enabled` property, and is disabled by default.

To help code review, the bug fix is in [the first commit](liancheng@d537a36), compression configuration is in [the second one](liancheng@4ce09aa).

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #608 from liancheng/spark-1678 and squashes the following commits:

66c3a8d [Cheng Lian] Renamed in-memory compression configuration key
f8fb3a0 [Cheng Lian] Added assertion for testing .hasNext of various decoder
4ce09aa [Cheng Lian] Made in-memory compression configurable via SparkConf
d537a36 [Cheng Lian] Fixed SPARK-1678
(cherry picked from commit 6d721c5)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
gzm55 pushed a commit to MediaV/spark that referenced this pull request Jul 17, 2014
Author: Andrew Ash <andrew@andrewash.com>

Closes apache#608 from ash211/patch-7 and squashes the following commits:

bd85f2a [Andrew Ash] Worker registration logging fix

(cherry picked from commit c0795cf)
Signed-off-by: Aaron Davidson <aaron@databricks.com>
@liancheng liancheng deleted the spark-1678 branch September 24, 2014 00:14
andrewor14 pushed a commit to andrewor14/spark that referenced this pull request Jan 8, 2015
rvesse pushed a commit to rvesse/spark that referenced this pull request Mar 2, 2018
* Create ISSUE_TEMPLATE.md

* add dev mailing list and jira links
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019