ARROW-15238: [C++] ARROW_ENGINE module with substrait consumer #12279

Closed
wants to merge 116 commits into from
Changes from 102 commits
ff51e35
first pass at using substrait protobufs
bkietz Nov 11, 2021
906aacf
add conversion of types and a basic roundtrip test
bkietz Nov 12, 2021
5fc6c34
reorganize to engine/substrait/{proto}_internal.cc,h
bkietz Nov 15, 2021
151cf72
add basic literal conversions
bkietz Nov 15, 2021
1cf679b
don't rely on Datum::type in DataEq matcher
bkietz Nov 16, 2021
6dce17b
add substrait_gen_verify, allow configurable substrait repo and tag
bkietz Nov 18, 2021
53c7c7d
Expose NamedStruct<=>Schema serde
bkietz Nov 19, 2021
cd919f3
add if_else <-> IfThen conversion
bkietz Nov 22, 2021
c079359
rebase, catch up to changes in substrait
bkietz Nov 30, 2021
5c69fb1
finish catching up with substrait's new field references
bkietz Dec 1, 2021
2a77d28
add SubstraitFromJSON
bkietz Dec 1, 2021
a5c25de
port more tests to JSON
bkietz Dec 3, 2021
873ec23
add more DataEq matchers
bkietz Dec 6, 2021
05be909
use Date64 for substrait::date
bkietz Dec 6, 2021
7309964
add extension types for interval_*, support deeply nested struct fiel…
bkietz Dec 6, 2021
27af6b6
add basic sketch of ExtensionSet for tracking substrait extensions
bkietz Dec 9, 2021
37a62de
get a failing test for arrow::null
bkietz Dec 13, 2021
8397ee5
refactor extension types to index variations alongside types
bkietz Dec 15, 2021
61feb19
use an ExtensionSet-local registry
bkietz Dec 15, 2021
1eb4bc0
add ExtensionSet <-> Plan factories
bkietz Dec 17, 2021
e63da1e
pre merge stash
bkietz Jan 10, 2022
f37084a
post rebase cleanup
bkietz Jan 10, 2022
2725ed7
Changes needed to get branch to compile
jvanstraten Jan 11, 2022
11e7f3f
Fix uninitialized pointers
jvanstraten Jan 11, 2022
96727e1
Fix FieldRef/StructField order
jvanstraten Jan 11, 2022
07b259c
gcc: use a pointer to the properties tuple instead of a reference
bkietz Jan 11, 2022
885836a
advance substrait version
bkietz Jan 11, 2022
0b94f3a
make JSON utils public, add CheckMessagesEquivalent()
bkietz Jan 11, 2022
98c74a8
Revert now unnecessary part of 9255fb6
jvanstraten Jan 12, 2022
6fd2e73
Support nested StructFields
jvanstraten Jan 12, 2022
0034651
Support struct_field compute function
jvanstraten Jan 12, 2022
b5e6fa4
Use ReferenceSegment.child where possible when emitting Substrait
jvanstraten Jan 12, 2022
e7184f5
Use CheckMessagesEquivalent() for test
jvanstraten Jan 12, 2022
bf93511
Fix compilation with googletest 1.11
jvanstraten Jan 12, 2022
2573bc5
add nullable field roundtrip test
bkietz Jan 12, 2022
095560f
Use lowercase nullptr in cc files
jvanstraten Jan 13, 2022
24517ff
Remove redundant else block
jvanstraten Jan 13, 2022
c4d9877
Fix clang-format'ing
jvanstraten Jan 13, 2022
f2e0e71
Simplify Fingerprintable constructor
jvanstraten Jan 13, 2022
b09d372
Simplify unique_ptr moves and casts
jvanstraten Jan 13, 2022
1557f4f
Minor fixes in suggested changes
jvanstraten Jan 13, 2022
839826b
Add tests for mixed struct references and expressions
jvanstraten Jan 13, 2022
d12545f
clean up internal::
bkietz Jan 13, 2022
9dd9c70
revert Fingerprintable change
bkietz Jan 13, 2022
9a569c9
add a simple example of substrait consumption
bkietz Jan 14, 2022
ee32bb5
add sketch of Relation conversion
bkietz Jan 14, 2022
cf33bd1
WIP on case_when support
jvanstraten Jan 13, 2022
2fc0123
Fully implement case_when(make_struct(...), ...)
jvanstraten Jan 14, 2022
9468920
Simplify ReferenceSegment manipulation functions
jvanstraten Jan 18, 2022
4da0939
add test for ReadRel conversion
bkietz Jan 19, 2022
1487b37
add function extensions to ExtensionSet
bkietz Jan 20, 2022
d237377
Add a test for extraction of an ExtensionSet from a Plan
bkietz Jan 20, 2022
5e0d6e5
add a roundtrip test for calling an extension function
bkietz Jan 20, 2022
cd22ef0
repair status_test::MatcherExplanations
bkietz Jan 21, 2022
1dc4c9d
removing old generated extension files
bkietz Jan 21, 2022
ed5b0d5
unity: ensure globals are unique within a TU
bkietz Jan 21, 2022
5d61724
ensure generated files are also excluded from lint_cpp_cli
bkietz Jan 21, 2022
37b5673
substrait consumer api cleanup
bkietz Jan 21, 2022
5b362f9
ensure generated files are also excluded from rat
bkietz Jan 21, 2022
b1a9bba
put actual json in engine_substrait_consumption.cc
bkietz Jan 21, 2022
b6499ae
msvc: suppress C4251 (needs dll-interface)
bkietz Jan 21, 2022
1473dc2
remove duplicate ARROW_ENGINE option
bkietz Jan 21, 2022
107c5b7
ensure ARROW_DATASET is available if ARROW_ENGINE=on
bkietz Jan 21, 2022
5012be6
try to localize suppressions, try defining LIBPROTOBUF_EXPORTS
bkietz Jan 21, 2022
2c73334
run cmake-format
bkietz Jan 21, 2022
65c2ee8
msvc: just suppress C4251, more int/size_t fixes
bkietz Jan 21, 2022
6d8ebc7
msvc: one more int/size_t fix
bkietz Jan 21, 2022
adfe196
msvc: one more int/size_t fix
bkietz Jan 21, 2022
0a965c5
use libprotobuf 3.19
bkietz Jan 21, 2022
ecb5214
Remove files generated by protoc
jvanstraten Jan 27, 2022
aa0e94b
Split ARROW_SUBSTRAIT_REPO_AND_TAG option
jvanstraten Feb 1, 2022
584b49e
Reorganize Arrow/Substrait example
jvanstraten Feb 1, 2022
ba74f7d
Remove redundant :: namespace prefixes in Arrow/Substrait example
jvanstraten Feb 1, 2022
f1e6259
Clarify comment in Arrow/Substrait example
jvanstraten Feb 1, 2022
283e87b
Document IgnoringConsumer::tag_ in Arrow/Substrait example
jvanstraten Feb 1, 2022
d64d53b
Clarify comment in Arrow/Substrait example
jvanstraten Feb 1, 2022
e730309
Clarify Arrow/Substrait example; remove unnecessary implementation de…
jvanstraten Feb 1, 2022
2901721
Clarify comment in Arrow/Substrait example
jvanstraten Feb 1, 2022
96aebea
Simplify includes for Arrow/Substrait example
jvanstraten Feb 1, 2022
a8dd172
Revert error message
jvanstraten Feb 1, 2022
1404f71
Remove duplicate definition of u32
jvanstraten Feb 1, 2022
4b92961
Minor improvements to expression_internal.cc based on code review
jvanstraten Feb 1, 2022
a9862a1
Fix std::string(n,c) argument order in Substrait example
jvanstraten Feb 2, 2022
4b55dee
Avoid shared_ptr copy, fix move -> forward in datum.h
jvanstraten Feb 2, 2022
7f81492
Add convenience constructors from std::string to *BinaryScalars
jvanstraten Feb 2, 2022
a284e18
Map Substrait date to Arrow Date32 instead of Date64
jvanstraten Feb 2, 2022
d099775
Clarify lack of struct field names in Substrait in comment as requested
jvanstraten Feb 2, 2022
bdb0b74
Slightly improve Substrait to Arrow conversion error messages
jvanstraten Feb 2, 2022
9453707
Add docstrings to functions defined in serde.h
jvanstraten Feb 2, 2022
c58b9c6
Add missing NotImplemented checks for Substrait FileOrFiles message
jvanstraten Feb 2, 2022
fc3d480
Remove unused associated type
jvanstraten Feb 2, 2022
2da3023
Fix parameter name links in docstrings added in 9453707
jvanstraten Feb 2, 2022
231f95a
ARROW_ENGINE currently depends on ARROW_PARQUET (only place it can ge…
jvanstraten Feb 3, 2022
e7ec32d
Fix CMake module style violations
jvanstraten Feb 3, 2022
cbc9842
Fix correspondence between extension_types.yaml and extension_set.cc,…
jvanstraten Feb 3, 2022
736c7a6
Fix make_shared<T> where T = universal reference
jvanstraten Feb 3, 2022
15fdc58
Clarify comment
jvanstraten Feb 3, 2022
22ad42f
Remove unnecessary imports in extension_types.cc
jvanstraten Feb 3, 2022
bae1822
Document plan_internal.h
jvanstraten Feb 3, 2022
6240ca2
Disable ARROW_ENGINE in CI job where ARROW_PARQUET doesn't work (whic…
jvanstraten Feb 4, 2022
b8ff996
Improve docstring
jvanstraten Feb 7, 2022
f4b351f
Remove usage of scan_options->projection
jvanstraten Feb 7, 2022
709ab65
Remove unused test helper function
jvanstraten Feb 8, 2022
c8e93ea
Add test for round-tripping Substrait Map<K,V>, and fix implementation
jvanstraten Feb 8, 2022
fc9d1a9
Added doc strings for SimpleExtensionType
westonpace Feb 9, 2022
8bd00a2
Made a simplification pass on the extension set. I added some commen…
westonpace Feb 9, 2022
2156716
Remove questionable SubstraitToJson & SubstraitFromJson test helper f…
jvanstraten Feb 9, 2022
f5483f4
Removing unneeded protobuf cmake flag
westonpace Feb 10, 2022
1b8ef91
Fix up the substrait consumption example to take in a filename instea…
westonpace Feb 11, 2022
78d0fca
Update cpp/src/arrow/engine/substrait/extension_set.h
westonpace Feb 11, 2022
2b0131c
Cleaned up comments per PR review
westonpace Feb 11, 2022
cde0b6f
Merge pull request #1 from westonpace/substrait-consumer
jvanstraten Feb 11, 2022
a4a5f7a
Merge pull request #2 from westonpace/substrait-consumer-build-fix
jvanstraten Feb 11, 2022
b7a2a1e
Fix lint violation
jvanstraten Feb 11, 2022
90d7ae3
If we build protobuf from source we always build it statically. We n…
westonpace Feb 11, 2022
fe1f784
Merge pull request #3 from westonpace/substrait-consumer-build-fix-2
jvanstraten Feb 15, 2022
1 change: 1 addition & 0 deletions .travis.yml
@@ -99,6 +99,7 @@ jobs:
-e ARROW_GCS=OFF
-e ARROW_MIMALLOC=OFF
-e ARROW_ORC=OFF
-e ARROW_ENGINE=OFF
-e ARROW_PARQUET=OFF
-e ARROW_S3=OFF
-e CMAKE_UNITY_BUILD=ON
2 changes: 2 additions & 0 deletions cpp/CMakeLists.txt
@@ -351,7 +351,9 @@ if(ARROW_CUDA
endif()

if(ARROW_ENGINE)
set(ARROW_PARQUET ON)
set(ARROW_COMPUTE ON)
set(ARROW_DATASET ON)
endif()

if(ARROW_SKYHOOK)
12 changes: 11 additions & 1 deletion cpp/cmake_modules/DefineOptions.cmake
@@ -225,7 +225,7 @@ if("${CMAKE_SOURCE_DIR}" STREQUAL "${CMAKE_CURRENT_SOURCE_DIR}")

define_option(ARROW_DATASET "Build the Arrow Dataset Modules" OFF)

-define_option(ARROW_ENGINE "Build the Arrow Execution Engine" OFF)
+define_option(ARROW_ENGINE "Build the Arrow Query Engine Module" OFF)

define_option(ARROW_FILESYSTEM "Build the Arrow Filesystem Layer" OFF)

@@ -478,6 +478,16 @@ advised that if this is enabled 'install' will fail silently on components;\
that have not been built"
OFF)

set(ARROW_SUBSTRAIT_REPO_DEFAULT "https://github.com/substrait-io/substrait")
define_option_string(ARROW_SUBSTRAIT_REPO
"Custom git repository URL for downloading Substrait sources.;\
See also ARROW_SUBSTRAIT_TAG" "${ARROW_SUBSTRAIT_REPO_DEFAULT}")

set(ARROW_SUBSTRAIT_TAG_DEFAULT "e1b4c04a1b518912f4c4065b16a1b2c0ac8e14cf")
define_option_string(ARROW_SUBSTRAIT_TAG
"Custom git hash/tag/branch for Substrait repository.;\
See also ARROW_SUBSTRAIT_REPO" "${ARROW_SUBSTRAIT_TAG_DEFAULT}")

option(ARROW_BUILD_CONFIG_SUMMARY_JSON "Summarize build configuration in a JSON file"
ON)
endif()
88 changes: 88 additions & 0 deletions cpp/cmake_modules/FindArrowEngine.cmake
@@ -0,0 +1,88 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# - Find Arrow Engine (arrow/engine/api.h, libarrow_engine.a, libarrow_engine.so)
#
# This module requires Arrow from which it uses
# arrow_find_package()
#
# This module defines
# ARROW_ENGINE_FOUND, whether Arrow Engine has been found
# ARROW_ENGINE_IMPORT_LIB,
# path to libarrow_engine's import library (Windows only)
# ARROW_ENGINE_INCLUDE_DIR, directory containing headers
# ARROW_ENGINE_LIB_DIR, directory containing Arrow Engine libraries
# ARROW_ENGINE_SHARED_LIB, path to libarrow_engine's shared library
# ARROW_ENGINE_STATIC_LIB, path to libarrow_engine.a

if(DEFINED ARROW_ENGINE_FOUND)
return()
endif()

set(find_package_arguments)
if(${CMAKE_FIND_PACKAGE_NAME}_FIND_VERSION)
list(APPEND find_package_arguments "${${CMAKE_FIND_PACKAGE_NAME}_FIND_VERSION}")
endif()
if(${CMAKE_FIND_PACKAGE_NAME}_FIND_REQUIRED)
list(APPEND find_package_arguments REQUIRED)
endif()
if(${CMAKE_FIND_PACKAGE_NAME}_FIND_QUIETLY)
list(APPEND find_package_arguments QUIET)
endif()
find_package(Arrow ${find_package_arguments})
find_package(Parquet ${find_package_arguments})

if(ARROW_FOUND AND PARQUET_FOUND)
arrow_find_package(ARROW_ENGINE
"${ARROW_HOME}"
arrow_engine
arrow/engine/api.h
ArrowEngine
arrow-engine)
if(NOT ARROW_ENGINE_VERSION)
set(ARROW_ENGINE_VERSION "${ARROW_VERSION}")
endif()
endif()

if("${ARROW_ENGINE_VERSION}" VERSION_EQUAL "${ARROW_VERSION}")
set(ARROW_ENGINE_VERSION_MATCH TRUE)
else()
set(ARROW_ENGINE_VERSION_MATCH FALSE)
endif()

mark_as_advanced(ARROW_ENGINE_IMPORT_LIB
ARROW_ENGINE_INCLUDE_DIR
ARROW_ENGINE_LIBS
ARROW_ENGINE_LIB_DIR
ARROW_ENGINE_SHARED_IMP_LIB
ARROW_ENGINE_SHARED_LIB
ARROW_ENGINE_STATIC_LIB
ARROW_ENGINE_VERSION
ARROW_ENGINE_VERSION_MATCH)

find_package_handle_standard_args(
ArrowEngine
REQUIRED_VARS ARROW_ENGINE_INCLUDE_DIR ARROW_ENGINE_LIB_DIR ARROW_ENGINE_VERSION_MATCH
VERSION_VAR ARROW_ENGINE_VERSION)
set(ARROW_ENGINE_FOUND ${ArrowEngine_FOUND})

if(ArrowEngine_FOUND AND NOT ArrowEngine_FIND_QUIETLY)
message(STATUS "Found the Arrow Engine by ${ARROW_ENGINE_FIND_APPROACH}")
message(STATUS "Found the Arrow Engine shared library: ${ARROW_ENGINE_SHARED_LIB}")
message(STATUS "Found the Arrow Engine import library: ${ARROW_ENGINE_IMPORT_LIB}")
message(STATUS "Found the Arrow Engine static library: ${ARROW_ENGINE_STATIC_LIB}")
endif()
3 changes: 2 additions & 1 deletion cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -309,7 +309,8 @@ endif()

if(ARROW_ORC
OR ARROW_FLIGHT
-OR ARROW_GANDIVA)
+OR ARROW_GANDIVA
+OR ARROW_ENGINE)
set(ARROW_WITH_PROTOBUF ON)
endif()

4 changes: 4 additions & 0 deletions cpp/examples/arrow/CMakeLists.txt
@@ -21,6 +21,10 @@ if(ARROW_COMPUTE)
add_arrow_example(compute_register_example)
endif()

if(ARROW_ENGINE)
add_arrow_example(engine_substrait_consumption EXTRA_LINK_LIBS arrow_engine_shared)
endif()

if(ARROW_COMPUTE AND ARROW_CSV)
add_arrow_example(compute_and_write_csv_example)
endif()
188 changes: 188 additions & 0 deletions cpp/examples/arrow/engine_substrait_consumption.cc
@@ -0,0 +1,188 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

#include <arrow/api.h>
#include <arrow/compute/api.h>
#include <arrow/engine/api.h>

#include <cstdlib>
#include <functional>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

namespace eng = arrow::engine;
namespace cp = arrow::compute;

#define ABORT_ON_FAILURE(expr) \
do { \
arrow::Status status_ = (expr); \
if (!status_.ok()) { \
std::cerr << status_.message() << std::endl; \
abort(); \
} \
} while (0)

class IgnoringConsumer : public cp::SinkNodeConsumer {
public:
explicit IgnoringConsumer(size_t tag) : tag_{tag} {}

arrow::Status Consume(cp::ExecBatch batch) override {
// Consume a batch of data
// (just print its row count to stdout)
std::cout << "-" << tag_ << " consumed " << batch.length << " rows" << std::endl;
return arrow::Status::OK();
}

arrow::Future<> Finish() override {
// Signal to the consumer that the last batch has been delivered
// (we don't do any real work in this consumer so mark it finished immediately)
//
// The returned future should only finish when all outstanding tasks have completed
// (after this method is called Consume is guaranteed not to be called again)
std::cout << "-" << tag_ << " finished" << std::endl;
return arrow::Future<>::MakeFinished();
}

private:
// A unique label for instances to help distinguish logging output if a plan has
// multiple sinks
//
// In this example, this is set to the zero-based index of the relation tree in the plan
size_t tag_;
};

arrow::Future<std::shared_ptr<arrow::Buffer>> GetSubstraitFromServer() {
// Emulate server interaction by parsing hard coded JSON
return eng::internal::SubstraitFromJSON("Plan", R"({
"relations": [
{"rel": {
"read": {
"base_schema": {
"struct": {
"types": [ {"i64": {}}, {"bool": {}} ]
},
"names": ["i", "b"]
},
"filter": {
"selection": {
"directReference": {
"structField": {
"field": 1
}
}
}
},
"local_files": {
"items": [
{
"uri_file": "file:///tmp/dat1.parquet",
"format": "FILE_FORMAT_PARQUET"
},
{
"uri_file": "file:///tmp/dat2.parquet",
"format": "FILE_FORMAT_PARQUET"
}
]
}
}
}}
],
"extension_uris": [
{
"extension_uri_anchor": 7,
"uri": "https://github.com/apache/arrow/blob/master/format/substrait/extension_types.yaml"
}
],
"extensions": [
{"extension_type": {
"extension_uri_reference": 7,
"type_anchor": 42,
"name": "null"
}},
{"extension_type_variation": {
"extension_uri_reference": 7,
"type_variation_anchor": 23,
"name": "u8"
}},
{"extension_function": {
"extension_uri_reference": 7,
"function_anchor": 42,
"name": "add"
}}
]
})");
}

int main(int argc, char** argv) {
// Plans arrive at the consumer serialized in a Buffer, using the binary protobuf
// serialization of a substrait Plan
auto maybe_serialized_plan = GetSubstraitFromServer().result();
ABORT_ON_FAILURE(maybe_serialized_plan.status());
std::shared_ptr<arrow::Buffer> serialized_plan =
std::move(maybe_serialized_plan).ValueOrDie();

// Print the received plan to stdout as JSON
arrow::Result<std::string> maybe_plan_json =
eng::internal::SubstraitToJSON("Plan", *serialized_plan);
ABORT_ON_FAILURE(maybe_plan_json.status());
std::cout << std::string(50, '#') << " received substrait::Plan:" << std::endl;
std::cout << maybe_plan_json.ValueOrDie() << std::endl;

// Data sinks are implicit in Substrait plans but explicit in Arrow. Deserializing a
// plan therefore requires a consumer factory: each time the root of a Substrait
// relation tree is deserialized, the factory constructs an Arrow consumer into which
// that tree's batches will be piped.
std::vector<std::shared_ptr<cp::SinkNodeConsumer>> consumers;
std::function<std::shared_ptr<cp::SinkNodeConsumer>()> consumer_factory = [&] {
// All batches produced by the plan will be fed into IgnoringConsumers:
auto tag = consumers.size();
consumers.emplace_back(new IgnoringConsumer{tag});
return consumers.back();
};

// Deserialize each relation tree in the substrait plan to an Arrow compute Declaration
arrow::Result<std::vector<cp::Declaration>> maybe_decls =
eng::DeserializePlan(*serialized_plan, consumer_factory);
ABORT_ON_FAILURE(maybe_decls.status());
std::vector<cp::Declaration> decls = std::move(maybe_decls).ValueOrDie();

// It's safe to drop the serialized plan; we don't leave references to its memory
serialized_plan.reset();

// Construct an empty plan (note: configure Function registry and ThreadPool here)
arrow::Result<std::shared_ptr<cp::ExecPlan>> maybe_plan = cp::ExecPlan::Make();
ABORT_ON_FAILURE(maybe_plan.status());
std::shared_ptr<cp::ExecPlan> plan = std::move(maybe_plan).ValueOrDie();

// Add decls to plan (note: configure ExecNode registry before this point)
for (const cp::Declaration& decl : decls) {
ABORT_ON_FAILURE(decl.AddToPlan(plan.get()).status());
}

// Validate the plan and print it to stdout
ABORT_ON_FAILURE(plan->Validate());
std::cout << std::string(50, '#') << " produced arrow::ExecPlan:" << std::endl;
std::cout << plan->ToString() << std::endl;

// Start the plan...
std::cout << std::string(50, '#') << " consuming batches:" << std::endl;
ABORT_ON_FAILURE(plan->StartProducing());

// ... and wait for it to finish
ABORT_ON_FAILURE(plan->finished().status());
return EXIT_SUCCESS;
}
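
Note on the consumer factory used in the example above: the comments in main() explain that sinks are implicit in a Substrait plan, so the caller supplies a factory that is invoked once per relation tree root. The following is a minimal, Arrow-free sketch of that pattern. Every name in it (Consumer, CountingConsumer, MakeTaggingFactory, DeserializeSketch) is a hypothetical stand-in invented for illustration; none of them is part of the Arrow API, and the real cp::SinkNodeConsumer interface differs (it takes ExecBatch values and returns Status and Future objects).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for a sink consumer interface (not Arrow's real one).
struct Consumer {
  virtual ~Consumer() = default;
  virtual void Consume(std::size_t num_rows) = 0;
  virtual void Finish() = 0;
};

// Analogous to IgnoringConsumer: it only records what it was given.
class CountingConsumer : public Consumer {
 public:
  explicit CountingConsumer(std::size_t tag) : tag_{tag} {}

  void Consume(std::size_t num_rows) override { rows_ += num_rows; }
  void Finish() override { finished_ = true; }

  std::size_t tag() const { return tag_; }
  std::size_t rows() const { return rows_; }
  bool finished() const { return finished_; }

 private:
  std::size_t tag_;  // zero-based index of the relation tree this sink serves
  std::size_t rows_ = 0;
  bool finished_ = false;
};

using ConsumerFactory = std::function<std::shared_ptr<Consumer>()>;

// Builds a factory that tags each new consumer with the number of consumers
// created so far and keeps a typed handle to it in *consumers.
// The vector must outlive the returned factory.
ConsumerFactory MakeTaggingFactory(
    std::vector<std::shared_ptr<CountingConsumer>>* consumers) {
  return [consumers]() -> std::shared_ptr<Consumer> {
    const std::size_t tag = consumers->size();
    consumers->push_back(std::make_shared<CountingConsumer>(tag));
    return consumers->back();
  };
}

// Hypothetical stand-in for plan deserialization: one factory call per
// relation tree root, returning the sinks in relation order.
std::vector<std::shared_ptr<Consumer>> DeserializeSketch(
    std::size_t num_relations, const ConsumerFactory& factory) {
  assert(factory && "a consumer factory is required");
  std::vector<std::shared_ptr<Consumer>> sinks;
  sinks.reserve(num_relations);
  for (std::size_t i = 0; i < num_relations; ++i) {
    sinks.push_back(factory());
  }
  return sinks;
}
```

Because the factory captures a pointer to the caller's vector, the caller keeps typed handles to every consumer and can inspect them after the plan has run, which is roughly what the `consumers` vector in main() makes possible.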
4 changes: 4 additions & 0 deletions cpp/src/arrow/CMakeLists.txt
@@ -722,6 +722,10 @@ if(ARROW_COMPUTE)
add_subdirectory(compute)
endif()

if(ARROW_ENGINE)
add_subdirectory(engine)
endif()

if(ARROW_CUDA)
add_subdirectory(gpu)
endif()
2 changes: 2 additions & 0 deletions cpp/src/arrow/array/array_base.cc
@@ -282,6 +282,8 @@ std::string Array::ToString() const {
return ss.str();
}

void PrintTo(const Array& x, std::ostream* os) { *os << x.ToString(); }

Result<std::shared_ptr<Array>> Array::View(
const std::shared_ptr<DataType>& out_type) const {
ARROW_ASSIGN_OR_RAISE(std::shared_ptr<ArrayData> result,
7 changes: 5 additions & 2 deletions cpp/src/arrow/array/array_base.h
@@ -187,10 +187,11 @@ class ARROW_EXPORT Array {
Status ValidateFull() const;

protected:
-Array() : null_bitmap_data_(NULLPTR) {}
+Array() = default;
+ARROW_DEFAULT_MOVE_AND_ASSIGN(Array);

std::shared_ptr<ArrayData> data_;
-const uint8_t* null_bitmap_data_;
+const uint8_t* null_bitmap_data_ = NULLPTR;

/// Protected method for constructors
void SetData(const std::shared_ptr<ArrayData>& data) {
@@ -204,6 +205,8 @@ class ARROW_EXPORT Array {

private:
ARROW_DISALLOW_COPY_AND_ASSIGN(Array);

ARROW_EXPORT friend void PrintTo(const Array& x, std::ostream* os);
};

static inline std::ostream& operator<<(std::ostream& os, const Array& x) {
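
Note on the array_base changes: the diff gives the raw pointer member a default member initializer, defaults the constructor and move operations, and adds a friend PrintTo(const Array&, std::ostream*). PrintTo with that signature is the hook GoogleTest looks up by argument-dependent lookup when it prints a value in a failure message; that this is the motivation here is an assumption, since the diff itself does not say. The sketch below shows the same three ingredients on a hypothetical stand-in class named Column. Column and Describe are invented for illustration and are not Arrow types, and unlike the diff (which defines PrintTo in the .cc file) the friend is defined inline here to keep the sketch in one block.

```cpp
#include <cstdint>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for a class like arrow::Array.
class Column {
 public:
  Column() = default;  // safe: every member has a well-defined default state
  explicit Column(std::string name)
      : name_(std::make_shared<std::string>(std::move(name))) {}

  // Defaulted moves; declaring them suppresses the implicit copy operations.
  Column(Column&&) = default;
  Column& operator=(Column&&) = default;

  std::string ToString() const {
    return name_ ? "Column(" + *name_ + ")" : "Column(<empty>)";
  }

  const std::uint8_t* null_bitmap_data() const { return null_bitmap_data_; }

 private:
  std::shared_ptr<std::string> name_;
  // The default member initializer means no constructor can leave this
  // pointer uninitialized.
  const std::uint8_t* null_bitmap_data_ = nullptr;

  // Printing hook found by argument-dependent lookup.
  friend void PrintTo(const Column& x, std::ostream* os) { *os << x.ToString(); }
};

// Roughly what a test framework does when it needs to display a value.
std::string Describe(const Column& x) {
  std::ostringstream os;
  PrintTo(x, &os);  // resolved via ADL on Column
  return os.str();
}
```

With the initializer on the member itself, a defaulted constructor no longer needs a member-initializer list, so constructors added later cannot forget to null the pointer.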