[SPARK-1402] Added 3 more compression schemes #330
New schemes: BooleanBitSet, IntDelta and LongDelta
Merged build triggered.
Merged build started.
@marmbrus Please feel free to run your benchmark once Travis and Jenkins are happy; all compression schemes are enabled by default now.
Merged build finished. All automated tests passed.
Hi @liancheng, Can you take a look at this? https://issues.apache.org/jira/browse/INFRA-7544 Blocker for 1.0 release.
I think @rxin meant to link to this issue: https://issues.apache.org/jira/browse/SPARK-1436
Note this seems broken both on master and with this PR merged.
Oops yes.
Fix handling of empty SPARK_EXAMPLES_JAR
Currently, if SPARK_EXAMPLES_JAR is left unset, you get a null pointer exception when running the examples (at least on Spark on YARN). The null now gets turned into the string "null" when it's put into the SparkConf, so addJar no longer properly ignores it. This fixes that so that it can be left unset.
…essed multiple times
Forgot to duplicate the in-memory column byte buffer when creating new ColumnAccessors, so when the column byte buffer is accessed multiple times, the position is not reset to 0.
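The position problem described in this commit message can be illustrated with a small sketch. It is not the Spark code: `Cursor` is a hypothetical stand-in for a byte buffer that tracks a read position, and its `duplicate` method only mirrors the idea behind `java.nio.ByteBuffer.duplicate()` (shared content, independent position). The sketch assumes each accessor scans the column from the current position to the end.

```python
import struct


class Cursor:
    """Hypothetical stand-in for a byte buffer with a read position."""

    def __init__(self, data, position=0):
        self.data = data
        self.position = position

    def duplicate(self):
        # Same underlying bytes, but a separate position for the new cursor.
        return Cursor(self.data, self.position)

    def remaining(self):
        return len(self.data) - self.position

    def get_int(self):
        # Read one big-endian 32-bit integer and advance the position.
        (value,) = struct.unpack_from(">i", self.data, self.position)
        self.position += 4
        return value


def scan(cursor):
    """Read every remaining 32-bit integer, as an accessor might."""
    values = []
    while cursor.remaining() >= 4:
        values.append(cursor.get_int())
    return values


column = Cursor(struct.pack(">3i", 1, 2, 3))

# With a duplicate per scan, every scan starts from the original position.
first = scan(column.duplicate())
second = scan(column.duplicate())

# Without duplication, the first scan exhausts the shared position.
shared_first = scan(column)
shared_second = scan(column)
```

In this sketch, both scans over duplicates see the full column, while the second scan over the shared cursor finds nothing left to read.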
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
ok i merged this. thanks
JIRA issue: [SPARK-1402](https://issues.apache.org/jira/browse/SPARK-1402)

This PR provides 3 more compression schemes for Spark SQL in-memory columnar storage:

* `BooleanBitSet`
* `IntDelta`
* `LongDelta`

Now there are 6 compression schemes in total, including the no-op `PassThrough` scheme.

Also fixed a bug in PR apache#286: not all compression schemes are added as available schemes when accessing an in-memory column, and when a column is compressed with an unrecognised scheme, `ColumnAccessor` throws exception.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes apache#330 from liancheng/moreCompressionSchemes and squashes the following commits:

1d037b8 [Cheng Lian] Fixed SPARK-1436: in-memory column byte buffer must be able to be accessed multiple times
d7c0e8f [Cheng Lian] Added test suite for IntegralDelta (IntDelta & LongDelta)
3c1ad7a [Cheng Lian] Added test suite for BooleanBitSet, refactored other test suites
44fe4b2 [Cheng Lian] Refactored CompressionScheme, added 3 more compression schemes.
Cache & merge scala test results, bump parallelism
Backport commits from master to make tests pass and cleanup keypairs
JIRA issue: SPARK-1402

This PR provides 3 more compression schemes for Spark SQL in-memory columnar storage:

* `BooleanBitSet`
* `IntDelta`
* `LongDelta`

Now there are 6 compression schemes in total, including the no-op `PassThrough` scheme.

Also fixed a bug in PR #286: not all compression schemes were added as available schemes when accessing an in-memory column, and when a column is compressed with an unrecognised scheme, `ColumnAccessor` throws an exception.
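For readers unfamiliar with the two ideas behind the new scheme names, here is a minimal Python sketch of delta encoding for integers and bit packing for booleans. It is not the PR's Scala implementation. The byte layout, the `ESCAPE` marker, and all function names are assumptions made for illustration only.

```python
import struct

# Assumed marker byte meaning "a full 4-byte value follows" (sketch only).
ESCAPE = -128


def delta_encode(values):
    """Store each 32-bit int as a 1-byte delta when it fits, else escaped in full."""
    out = bytearray()
    prev = None
    for v in values:
        delta = None if prev is None else v - prev
        if delta is not None and -127 <= delta <= 127:
            out += struct.pack("b", delta)
        else:
            # First value, or delta too large: marker byte plus the full value.
            out += struct.pack(">bi", ESCAPE, v)
        prev = v
    return bytes(out)


def delta_decode(data):
    """Invert delta_encode."""
    values, pos, prev = [], 0, 0
    while pos < len(data):
        (b,) = struct.unpack_from("b", data, pos)
        pos += 1
        if b == ESCAPE:
            (prev,) = struct.unpack_from(">i", data, pos)
            pos += 4
        else:
            prev += b
        values.append(prev)
    return values


def bitset_encode(flags):
    """Pack booleans 8 per byte, prefixed with a 4-byte element count."""
    out = bytearray(struct.pack(">i", len(flags)))
    for start in range(0, len(flags), 8):
        byte = 0
        for offset, flag in enumerate(flags[start:start + 8]):
            if flag:
                byte |= 1 << offset
        out.append(byte)
    return bytes(out)


def bitset_decode(data):
    """Invert bitset_encode."""
    (count,) = struct.unpack_from(">i", data, 0)
    return [bool((data[4 + i // 8] >> (i % 8)) & 1) for i in range(count)]


ints = [100, 101, 103, 103, 5000, 4999]
encoded_ints = delta_encode(ints)

flags = [True, False, True] * 4
encoded_flags = bitset_encode(flags)
```

In this sketch, slowly changing integer columns shrink because most values cost one byte instead of four, and boolean columns shrink because eight values share one byte. Columns with large jumps between neighbours gain little from the delta approach, since each jump costs five bytes here.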