[SPARK-1402] Added 3 more compression schemes #330

Closed
wants to merge 4 commits into from

liancheng
Contributor

JIRA issue: SPARK-1402

This PR provides 3 more compression schemes for Spark SQL in-memory columnar storage:

  • BooleanBitSet
  • IntDelta
  • LongDelta

Now there are 6 compression schemes in total, including the no-op PassThrough scheme.
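
For readers unfamiliar with these schemes, here is a minimal Java sketch of the general ideas behind a boolean bit set and integer delta encoding. It is not the code in this PR: the class `CompressionSketch`, its method names, and the exact byte layout (a one-byte delta, with `Byte.MIN_VALUE` used as an escape marker before a full 4-byte value) are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustrative only: hypothetical names and layout, not Spark's implementation.
final class CompressionSketch {
    // Assumed escape marker: the next 4 bytes hold a full int instead of a delta.
    static final byte ESCAPE = Byte.MIN_VALUE;

    // Bit-set idea: pack 64 booleans into each long word.
    static long[] packBooleans(boolean[] values) {
        long[] words = new long[(values.length + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            if (values[i]) {
                words[i / 64] |= 1L << (i % 64);
            }
        }
        return words;
    }

    static boolean[] unpackBooleans(long[] words, int count) {
        boolean[] values = new boolean[count];
        for (int i = 0; i < count; i++) {
            values[i] = (words[i / 64] & (1L << (i % 64))) != 0;
        }
        return values;
    }

    // Delta idea: store a one-byte difference from the previous value when it
    // fits, otherwise store ESCAPE followed by the full value.
    static byte[] deltaEncode(int[] values) {
        ByteBuffer out = ByteBuffer.allocate(values.length * 5); // worst case: 5 bytes per value
        int prev = 0;
        for (int i = 0; i < values.length; i++) {
            long delta = (long) values[i] - prev; // long arithmetic avoids int overflow
            if (i > 0 && delta > Byte.MIN_VALUE && delta <= Byte.MAX_VALUE) {
                out.put((byte) delta);
            } else {
                out.put(ESCAPE).putInt(values[i]);
            }
            prev = values[i];
        }
        return Arrays.copyOf(out.array(), out.position());
    }

    static int[] deltaDecode(byte[] encoded, int count) {
        ByteBuffer in = ByteBuffer.wrap(encoded);
        int[] values = new int[count];
        int prev = 0;
        for (int i = 0; i < count; i++) {
            byte b = in.get();
            prev = (b == ESCAPE) ? in.getInt() : prev + b;
            values[i] = prev;
        }
        return values;
    }
}
```

Under this assumed layout, the five ints `{100, 101, 103, 1000, 1001}` encode to 13 bytes instead of 20, because three of them fit in a one-byte delta. Slowly changing columns benefit most; columns with large jumps between neighbours pay one extra byte per value.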

Also fixed a bug in PR #286: not all compression schemes were added as available schemes when accessing an in-memory column, so ColumnAccessor threw an exception whenever a column was compressed with an unrecognised scheme.
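
The failure mode described here can be sketched in a few lines of Java. This is an assumption-laden illustration, not the PR's code: the class `SchemeLookupSketch`, the numeric IDs, and the exception type are hypothetical, and only scheme names mentioned in this description are listed.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical IDs and names, not Spark's implementation.
final class SchemeLookupSketch {
    enum Scheme {
        PASS_THROUGH(0), BOOLEAN_BIT_SET(1), INT_DELTA(2), LONG_DELTA(3);

        final int id;

        Scheme(int id) {
            this.id = id;
        }
    }

    // Resolve the scheme ID read from a column against the schemes the reader knows about.
    static Scheme resolve(int id, List<Scheme> available) {
        for (Scheme scheme : available) {
            if (scheme.id == id) {
                return scheme;
            }
        }
        throw new UnsupportedOperationException("Unrecognized compression scheme ID: " + id);
    }

    // The reader should be given every scheme a writer may have used.
    static List<Scheme> allSchemes() {
        return Arrays.asList(Scheme.values());
    }
}
```

If the reader is handed only a subset of the schemes, `resolve` throws for any column the writer compressed with a scheme outside that subset; passing `allSchemes()` avoids this.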

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@liancheng
Contributor Author

@marmbrus Please feel free to run the benchmark you have at hand once Travis and Jenkins are happy; all compression schemes are now enabled by default.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13791/

@rxin
Contributor

rxin commented Apr 7, 2014

Hi @liancheng,

Can you take a look at this? https://issues.apache.org/jira/browse/INFRA-7544

Blocker for 1.0 release.

@marmbrus
Contributor

marmbrus commented Apr 7, 2014

I think @rxin meant to link to this issue: https://issues.apache.org/jira/browse/SPARK-1436

@marmbrus
Contributor

marmbrus commented Apr 7, 2014

Note this seems broken both on master and with this PR merged.

@rxin
Contributor

rxin commented Apr 7, 2014

Oops yes.

andrewor14 pushed a commit to andrewor14/spark that referenced this pull request Apr 7, 2014
Fix handling of empty SPARK_EXAMPLES_JAR

Currently, if SPARK_EXAMPLES_JAR is left unset, you get a null pointer exception when running the examples (at least with Spark on YARN). The null now gets turned into the string "null" when it's put into the SparkConf, so addJar no longer properly ignores it. This fixes that so that it can be left unset.
@liancheng
Contributor Author

@rxin @marmbrus OK, I'm looking at it now.

…essed multiple times

Forgot to duplicate the in-memory column byte buffer when creating new ColumnAccessors. Without the duplicate, the buffer's position is not reset to 0 when the column byte buffer is accessed more than once.
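
The underlying `java.nio.ByteBuffer` behaviour can be reproduced with a small Java sketch. The class `BufferReuseSketch` and its helpers are hypothetical stand-ins for a column reader, not code from this PR; only `ByteBuffer.duplicate()`, `flip()`, `getInt()`, and `remaining()` are real API.

```java
import java.nio.ByteBuffer;

// Illustrative only: a hypothetical reader that consumes a buffer by advancing its position.
final class BufferReuseSketch {
    // Build a small "column" of three ints, flipped so it is ready for reading.
    static ByteBuffer sampleColumn() {
        ByteBuffer buffer = ByteBuffer.allocate(3 * Integer.BYTES);
        buffer.putInt(1).putInt(2).putInt(3);
        buffer.flip();
        return buffer;
    }

    // Reads every remaining int; this moves the position of the buffer it is given.
    static int sumInts(ByteBuffer buffer) {
        int sum = 0;
        while (buffer.remaining() >= Integer.BYTES) {
            sum += buffer.getInt();
        }
        return sum;
    }
}
```

Calling `sumInts(column)` twice on the same buffer returns 6 and then 0, because the first read leaves the position at the end. Calling `sumInts(column.duplicate())` returns 6 every time: `duplicate()` shares the bytes but gives each reader its own position, so the original buffer's position is never moved.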
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@liancheng
Contributor Author

Hi @rxin @marmbrus, fixed the bug and added a regression test. Please help to verify, thanks.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13874/

@rxin
Contributor

rxin commented Apr 8, 2014

ok i merged this. thanks

@asfgit asfgit closed this in 0d0493f Apr 8, 2014
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
JIRA issue: [SPARK-1402](https://issues.apache.org/jira/browse/SPARK-1402)

This PR provides 3 more compression schemes for Spark SQL in-memory columnar storage:

* `BooleanBitSet`
* `IntDelta`
* `LongDelta`

Now there are 6 compression schemes in total, including the no-op `PassThrough` scheme.

Also fixed a bug in PR apache#286: not all compression schemes were added as available schemes when accessing an in-memory column, so `ColumnAccessor` threw an exception whenever a column was compressed with an unrecognised scheme.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes apache#330 from liancheng/moreCompressionSchemes and squashes the following commits:

1d037b8 [Cheng Lian] Fixed SPARK-1436: in-memory column byte buffer must be able to be accessed multiple times
d7c0e8f [Cheng Lian] Added test suite for IntegralDelta (IntDelta & LongDelta)
3c1ad7a [Cheng Lian] Added test suite for BooleanBitSet, refactored other test suites
44fe4b2 [Cheng Lian] Refactored CompressionScheme, added 3 more compression schemes.
@liancheng liancheng deleted the moreCompressionSchemes branch September 24, 2014 00:14
mccheah pushed a commit to mccheah/spark that referenced this pull request Oct 3, 2018
Cache & merge scala test results, bump parallelism
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Backport commits from master to make tests pass and cleanup keypairs