[SPARK-1402] Added 3 more compression schemes #330

liancheng · 2014-04-05T02:50:29Z

This PR provides 3 more compression schemes for Spark SQL in-memory columnar storage:

BooleanBitSet
IntDelta
LongDelta

Now there are 6 compression schemes in total, including the no-op PassThrough scheme.
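To illustrate the idea behind a delta scheme such as `IntDelta`, here is a minimal sketch, not the code in this PR. It assumes a layout where each value is stored as a one-byte delta from its predecessor when the delta fits, and otherwise as an escape byte followed by the full 4-byte value. The object name `IntDeltaSketch` and the exact byte layout are assumptions made for illustration.

```scala
import java.nio.ByteBuffer
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch of delta encoding for Int columns (layout is an assumption).
object IntDeltaSketch {
  // Byte.MinValue is reserved as the escape marker, so usable deltas are -127..127.
  def encode(values: Array[Int]): ByteBuffer = {
    val out = ByteBuffer.allocate(values.length * 5) // worst case: escape byte + Int per value
    var prev = 0
    var first = true
    for (v <- values) {
      val delta = v.toLong - prev.toLong // widen to Long so the subtraction cannot overflow
      if (!first && delta > Byte.MinValue && delta <= Byte.MaxValue) {
        out.put(delta.toByte)            // small delta: 1 byte
      } else {
        out.put(Byte.MinValue)           // escape marker
        out.putInt(v)                    // followed by the full value
      }
      prev = v
      first = false
    }
    out.flip()
    out
  }

  def decode(in: ByteBuffer): Array[Int] = {
    val buf = in.duplicate()             // leave the caller's position untouched
    val result = ArrayBuffer.empty[Int]
    var prev = 0
    while (buf.hasRemaining) {
      val b = buf.get()
      prev = if (b == Byte.MinValue) buf.getInt() else prev + b
      result += prev
    }
    result.toArray
  }
}
```

With this layout, slowly changing columns (timestamps, auto-increment IDs) shrink towards one byte per value, while a column of large jumps costs five bytes per value, which is why a scheme like this is only worth selecting when the data suits it.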

Also fixed a bug from PR #286: not all compression schemes were registered as available when accessing an in-memory column, so ColumnAccessor threw an exception whenever a column was compressed with an unrecognised scheme.

New schemes: BooleanBitSet, IntDelta and LongDelta
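For the boolean case, the general technique suggested by the name `BooleanBitSet` is to pack one boolean per bit. The following is a sketch under that assumption only; the object name `BooleanBitSetSketch` and the choice of 64-bit words are hypothetical and not taken from the PR.

```scala
// Hypothetical sketch: pack booleans into 64-bit words, one bit per value.
object BooleanBitSetSketch {
  def pack(values: Array[Boolean]): Array[Long] = {
    val words = new Array[Long]((values.length + 63) / 64) // round up to whole words
    for (i <- values.indices if values(i)) {
      words(i / 64) |= 1L << (i % 64)                      // set bit i
    }
    words
  }

  // The element count must be kept alongside the words,
  // because the last word may be only partially used.
  def unpack(words: Array[Long], count: Int): Array[Boolean] =
    Array.tabulate(count)(i => (words(i / 64) & (1L << (i % 64))) != 0L)
}
```

Compared with one byte per boolean, this packing uses roughly one eighth of the space, independent of the data distribution.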

AmplabJenkins · 2014-04-05T02:52:23Z

Merged build triggered.

AmplabJenkins · 2014-04-05T02:52:29Z

Merged build started.

liancheng · 2014-04-05T03:44:05Z

@marmbrus Please feel free to run your benchmark once Travis and Jenkins are happy; all compression schemes are now enabled by default.

AmplabJenkins · 2014-04-05T03:50:07Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-04-05T03:50:07Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13791/

rxin · 2014-04-07T19:16:13Z

Hi @liancheng,

Can you take a look at this? https://issues.apache.org/jira/browse/INFRA-7544

Blocker for 1.0 release.

marmbrus · 2014-04-07T20:51:51Z

I think @rxin meant to link to this issue: https://issues.apache.org/jira/browse/SPARK-1436

marmbrus · 2014-04-07T20:52:20Z

Note this seems broken both on master and with this PR merged.

rxin · 2014-04-07T20:52:29Z

Oops yes.

Fix handling of empty SPARK_EXAMPLES_JAR

Currently if SPARK_EXAMPLES_JAR is left unset, you get a null pointer exception when running the examples (at least on Spark on YARN). The null now gets turned into the string "null" when it's put into the SparkConf, so addJar no longer properly ignores it. This fixes that so that it can be left unset.

liancheng · 2014-04-08T01:08:15Z

@rxin @marmbrus OK, I'm looking at it now.

AmplabJenkins · 2014-04-08T02:12:23Z

Merged build triggered.

AmplabJenkins · 2014-04-08T02:12:32Z

Merged build started.

liancheng · 2014-04-08T02:13:46Z

Hi @rxin @marmbrus, fixed the bug and added a regression test. Please help to verify, thanks.

AmplabJenkins · 2014-04-08T02:53:19Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-04-08T02:53:19Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13874/

rxin · 2014-04-08T05:24:02Z

ok i merged this. thanks

JIRA issue: [SPARK-1402](https://issues.apache.org/jira/browse/SPARK-1402)

This PR provides 3 more compression schemes for Spark SQL in-memory columnar storage:

* `BooleanBitSet`
* `IntDelta`
* `LongDelta`

Now there are 6 compression schemes in total, including the no-op `PassThrough` scheme.

Also fixed a bug in PR apache#286: not all compression schemes are added as available schemes when accessing an in-memory column, and when a column is compressed with an unrecognised scheme, `ColumnAccessor` throws exception.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes apache#330 from liancheng/moreCompressionSchemes and squashes the following commits:

1d037b8 [Cheng Lian] Fixed SPARK-1436: in-memory column byte buffer must be able to be accessed multiple times
d7c0e8f [Cheng Lian] Added test suite for IntegralDelta (IntDelta & LongDelta)
3c1ad7a [Cheng Lian] Added test suite for BooleanBitSet, refactored other test suites
44fe4b2 [Cheng Lian] Refactored CompressionScheme, added 3 more compression schemes.

liancheng added 3 commits April 5, 2014 10:41

Refactored CompressionScheme, added 3 more compression schemes.

44fe4b2

New schemes: BooleanBitSet, IntDelta and LongDelta

Added test suite for BooleanBitSet, refactored other test suites

3c1ad7a

Added test suite for IntegralDelta (IntDelta & LongDelta)

d7c0e8f

Fixed SPARK-1436: in-memory column byte buffer must be able to be accessed multiple times

1d037b8

Forgot to duplicate the in-memory column byte buffer when creating new ColumnAccessors, so when the column byte buffer was accessed multiple times, its position was not reset to 0.
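The failure mode described in this commit can be reproduced with plain `java.nio.ByteBuffer`, independent of Spark. The sketch below is illustrative only: `BufferDuplicateSketch`, `sampleColumn` and `readAll` are hypothetical names, and the reader stands in for whatever consumes the column buffer. `ByteBuffer.duplicate()` returns a view that shares the underlying bytes but has its own position, so each reader that works on a duplicate starts where the original buffer's position was at the time of duplication.

```scala
import java.nio.ByteBuffer

// Hypothetical sketch of the shared-position problem and the duplicate() fix.
object BufferDuplicateSketch {
  // A small "column" holding three Ints, flipped so that position is 0.
  def sampleColumn(): ByteBuffer = {
    val b = ByteBuffer.allocate(12)
    b.putInt(1).putInt(2).putInt(3)
    b.flip()
    b
  }

  // Reads every Int from the given buffer, advancing that buffer's position.
  def readAll(buf: ByteBuffer): List[Int] = {
    val out = List.newBuilder[Int]
    while (buf.remaining() >= 4) out += buf.getInt()
    out.result()
  }
}
```

Calling `readAll(column)` twice on the same buffer yields the three values and then an empty list, because the first pass leaves the position at the end. Calling `readAll(column.duplicate())` can be repeated any number of times, as long as the original buffer itself is never advanced.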

asfgit closed this in 0d0493f Apr 8, 2014

liancheng deleted the moreCompressionSchemes branch September 24, 2014 00:14

mccheah pushed a commit to mccheah/spark that referenced this pull request Oct 3, 2018

Merge pull request apache#330 from palantir/ds/merge-test-results-scala

fc092cb

Cache & merge scala test results, bump parallelism

bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019

Merge pull request apache#330 from liu-sheng/workaround-tests

6304a52

Backport commits from master to make tests pass and cleanup keypairs
