PARQUET-494: Implement DictionaryEncoder and test dictionary decoding
I incorporated quite a bit of code from Impala for this patch, but did a bunch of work to get everything working. In particular, I wasn't happy with the hash table implementation in `dict-encoder.h` and so have written a simple new one that we can benchmark and tune as necessary. The simplest way to pull in the DictEncoder (PARQUET-493) was to also bring in the `MemPool` implementation, suitably trimmed down. We can continue to refactor this as needed for parquet-cpp.

I also did some light refactoring using `TYPED_TEST` in `plain-encoding-test` (now `encoding-test`).

Author: Wes McKinney <wesm@apache.org>

Closes apache#64 from wesm/PARQUET-494 and squashes the following commits:

c634abe [Wes McKinney] Refactor to create TestEncoderBase
a3a563a [Wes McKinney] Consolidate dictionary encoding code
2cc4ffe [Wes McKinney] Retrieve type_length() only once in PlainDecoder ctor
20ccd9e [Wes McKinney] Remove DictionaryEncoder shim layer for now
dcfc0aa [Wes McKinney] Remove redundant Int96 comparison
d98a2c0 [Wes McKinney] Dictionary encoding for booleans throws exception
05414f0 [Wes McKinney] Test dictionary encoding more types
9a5b1a4 [Wes McKinney] Enable include_order linting per PARQUET-539
f3f0efc [Wes McKinney] IWYU cleaning
d4191c6 [Wes McKinney] Add header installs, fix clang warning
1347b13 [Wes McKinney] Rename plain-encoding-test to encoding-test
09bf0fa [Wes McKinney] Fix bugs and add dictionary repeats
2e6af48 [Wes McKinney] Fix some bugs. FixedLenByteArray remains to get working.
69b5b69 [Wes McKinney] Refactor test fixtures to be less coupled to state. process on getting dict encoding working
6b23716 [Wes McKinney] Create reusable DataType structs for test fixtures and other compile-time type resolution matters
67883fd [Wes McKinney] Bunch of combined work for dict encoding support:

Change-Id: I0fe7d47373b9da106e700381bee6538199af8a69
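For readers unfamiliar with the technique, the following is a minimal sketch of dictionary encoding backed by a simple hash table. It is not the code from this patch: `SimpleDictEncoder`, `DecodeIndices`, and `kEmptySlot` are hypothetical names, and the open-addressing table with linear probing is an assumed design chosen only for illustration. It also skips the `MemPool` allocation and the bit-packing of indices that a real Parquet encoder would need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical sketch; not the DictEncoder added in this commit.
constexpr int32_t kEmptySlot = -1;

template <typename T>
class SimpleDictEncoder {
 public:
  // The slot count starts at 16 and stays a power of two so that
  // "hash & (size - 1)" can be used in place of a modulo.
  SimpleDictEncoder() : slots_(16, kEmptySlot) {}

  // Records one value by appending its dictionary index to the index stream.
  void Put(const T& value) { indices_.push_back(Lookup(value)); }

  const std::vector<T>& dictionary() const { return dictionary_; }
  const std::vector<int32_t>& indices() const { return indices_; }

 private:
  // Returns the index of value in the dictionary, inserting it if absent.
  int32_t Lookup(const T& value) {
    // Keep the load factor at or below 1/2 so probing always finds a free slot.
    if (2 * (dictionary_.size() + 1) > slots_.size()) Grow();
    const std::size_t mask = slots_.size() - 1;
    std::size_t i = std::hash<T>()(value) & mask;
    while (slots_[i] != kEmptySlot) {
      if (dictionary_[slots_[i]] == value) return slots_[i];
      i = (i + 1) & mask;  // linear probing
    }
    slots_[i] = static_cast<int32_t>(dictionary_.size());
    dictionary_.push_back(value);
    return slots_[i];
  }

  // Doubles the table and reinserts every existing dictionary index.
  void Grow() {
    std::vector<int32_t> bigger(slots_.size() * 2, kEmptySlot);
    const std::size_t mask = bigger.size() - 1;
    for (std::size_t idx = 0; idx < dictionary_.size(); ++idx) {
      std::size_t i = std::hash<T>()(dictionary_[idx]) & mask;
      while (bigger[i] != kEmptySlot) i = (i + 1) & mask;
      bigger[i] = static_cast<int32_t>(idx);
    }
    slots_.swap(bigger);
  }

  std::vector<int32_t> slots_;    // hash slots holding dictionary indices
  std::vector<T> dictionary_;     // distinct values in first-seen order
  std::vector<int32_t> indices_;  // one dictionary index per Put() call
};

// Decoding is a plain lookup of each index in the dictionary.
template <typename T>
std::vector<T> DecodeIndices(const std::vector<T>& dictionary,
                             const std::vector<int32_t>& indices) {
  std::vector<T> out;
  out.reserve(indices.size());
  for (int32_t idx : indices) {
    assert(idx >= 0 && static_cast<std::size_t>(idx) < dictionary.size());
    out.push_back(dictionary[idx]);
  }
  return out;
}
```

Calling `Put` once per value builds the dictionary in first-seen order, and passing `dictionary()` and `indices()` to `DecodeIndices` should reproduce the original sequence. Storing dictionary indices in the hash slots, rather than the values themselves, keeps the table compact for wide types, though whether that pays off in practice would need benchmarking.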
1 parent cce26b7, commit bc6045b
Showing 34 changed files with 1,839 additions and 364 deletions.