Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-35497: [C++] Use the latest tagged version of flatbuffers #38192

Merged
merged 5 commits into from
Oct 12, 2023

Conversation

felipecrv
Copy link
Contributor

@felipecrv felipecrv commented Oct 11, 2023

Rationale for this change

To use a more modern version of flatc in Arrow.

What changes are included in this PR?

  1. Re-generating the the C++ files with a flatc based on the latest tag on the flatbuffer repo.
  2. Copy the minimal set of includes to our vendored folder
  3. Manually re-apply the patches that have been applied to the previous headers to fix issues

Are these changes tested?

Tested by building everything correctly.

I used this hacky script to do it.

      #!/bin/sh

      SRC="${FLATBUFFERS_REPO}/include/flatbuffers"
      DEST="${ARROW_REPO}/cpp/thirdparty/flatbuffers/include/flatbuffers/"

      cp "${SRC}/allocator.h" "${DEST}"
      cp "${SRC}/array.h" "${DEST}"
      cp "${SRC}/base.h" "${DEST}"
      cp "${SRC}/buffer.h" "${DEST}"
      cp "${SRC}/buffer_ref.h" "${DEST}"
      cp "${SRC}/default_allocator.h" "${DEST}"
      cp "${SRC}/detached_buffer.h" "${DEST}"
      cp "${SRC}/flatbuffer_builder.h" "${DEST}"
      cp "${SRC}/flatbuffers.h" "${DEST}"
      cp "${SRC}/stl_emulation.h" "${DEST}"
      cp "${SRC}/string.h" "${DEST}"
      cp "${SRC}/struct.h" "${DEST}"
      cp "${SRC}/table.h" "${DEST}"
      cp "${SRC}/vector.h" "${DEST}"
      cp "${SRC}/vector_downward.h" "${DEST}"
      cp "${SRC}/verifier.h" "${DEST}"

// Move this vendored copy of flatbuffers to a private namespace,
// but continue to access it through the "flatbuffers" alias.
namespace arrow_vendored_private {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we switch to C++17-style arrow_vendored_private::flatbuffers it probably reduces the patch size and may make automation more doable as well.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Oct 11, 2023
- #endif
- #endif
- #endif // __has_include
+ #include <string_view>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change actually necessary or would the original "Check for std::string_view (in c++17)" snippet above work for us? We probably want to minimize the patch size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Since this version doesn't check for absl::string_view first, we don't need it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference: #12204

@felipecrv felipecrv requested a review from pitrou October 11, 2023 14:57
@pitrou
Copy link
Member

pitrou commented Oct 12, 2023

@github-actions crossbow submit -g cpp -g python

@github-actions
Copy link

Revision: b019254

Submitted crossbow builds: ursacomputing/crossbow @ actions-2c6d2bba81

Task Status
test-alpine-linux-cpp Github Actions
test-build-cpp-fuzz Github Actions
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-conda-python-3.10 Github Actions
test-conda-python-3.10-cython2 Github Actions
test-conda-python-3.10-hdfs-2.9.2 Github Actions
test-conda-python-3.10-hdfs-3.2.1 Github Actions
test-conda-python-3.10-pandas-latest Github Actions
test-conda-python-3.10-pandas-nightly Github Actions
test-conda-python-3.10-spark-v3.5.0 Github Actions
test-conda-python-3.10-substrait Github Actions
test-conda-python-3.11 Github Actions
test-conda-python-3.11-dask-latest Github Actions
test-conda-python-3.11-dask-upstream_devel Github Actions
test-conda-python-3.11-hypothesis Github Actions
test-conda-python-3.11-pandas-upstream_devel Github Actions
test-conda-python-3.11-spark-master Github Actions
test-conda-python-3.8 Github Actions
test-conda-python-3.8-pandas-1.0 Github Actions
test-conda-python-3.8-spark-v3.5.0 Github Actions
test-conda-python-3.9 Github Actions
test-conda-python-3.9-pandas-latest Github Actions
test-cuda-cpp Github Actions
test-cuda-python Github Actions
test-debian-11-cpp-amd64 Github Actions
test-debian-11-cpp-i386 Github Actions
test-debian-11-python-3 Azure
test-fedora-35-cpp Github Actions
test-fedora-35-python-3 Azure
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-bundled Github Actions
test-ubuntu-20.04-cpp-minimal-with-formats Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions
test-ubuntu-20.04-python-3 Azure
test-ubuntu-22.04-cpp Github Actions
test-ubuntu-22.04-cpp-20 Github Actions
test-ubuntu-22.04-cpp-no-threading Github Actions
test-ubuntu-22.04-python-3 Github Actions

@pitrou
Copy link
Member

pitrou commented Oct 12, 2023

CI failures look unrelated, I'll merge. Thanks @felipecrv !

@pitrou pitrou merged commit 01b42d5 into apache:main Oct 12, 2023
@pitrou pitrou removed the awaiting committer review Awaiting committer review label Oct 12, 2023
@felipecrv felipecrv deleted the pin_fbs branch October 12, 2023 17:04
llama90 pushed a commit to llama90/arrow that referenced this pull request Oct 12, 2023
…pache#38192)

### Rationale for this change

To use a more modern version of `flatc` in Arrow.

### What changes are included in this PR?

1) Re-generating the the C++ files with a `flatc` based on the latest tag on the `flatbuffer` repo.
2) Copy the minimal set of includes to our vendored folder
3) Manually re-apply the patches that have been applied to the previous headers to fix issues

### Are these changes tested?

Tested by building everything correctly.
* Closes: apache#35497

Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 01b42d5.

There were 3 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.

@kou
Copy link
Member

kou commented Oct 20, 2023

@github-actions crossbow submit test-skyhook-integration

@github-actions
Copy link

Revision: b019254

Submitted crossbow builds: ursacomputing/crossbow @ actions-2eb8774cee

Task Status
test-skyhook-integration Github Actions

@kou
Copy link
Member

kou commented Oct 20, 2023

@felipecrv It seems that this broke test-skyhook-integration:

https://github.com/ursacomputing/crossbow/actions/runs/6502407264/job/17661381250#step:6:2369

FAILED: src/skyhook/CMakeFiles/arrow_skyhook_objlib.dir/protocol/skyhook_protocol.cc.o 
/usr/local/bin/sccache /usr/bin/c++  -DARROW_HAVE_RUNTIME_AVX2 -DARROW_HAVE_RUNTIME_AVX512 -DARROW_HAVE_RUNTIME_BMI2 -DARROW_HAVE_RUNTIME_SSE4_2 -DARROW_HAVE_SSE4_2 -DARROW_MIMALLOC -DARROW_NO_DEPRECATED_API -DARROW_WITH_RE2 -DARROW_WITH_TIMING_TESTS -DARROW_WITH_UTF8PROC -DBOOST_ALL_NO_LIB -Isrc -I/arrow/cpp/src -I/arrow/cpp/src/generated -isystem /arrow/cpp/thirdparty/flatbuffers/include -isystem /arrow/cpp/thirdparty/hadoop/include -isystem google_cloud_cpp_ep-install/include -isystem absl_ep-install/include -isystem crc32c_ep-install/include -isystem orc_ep-install/include -isystem protobuf_ep-install/include -isystem awssdk_ep-install/include -isystem opentelemetry_ep-install/include -isystem xsimd_ep/src/xsimd_ep-install/include -isystem jemalloc_ep-prefix/src -isystem mimalloc_ep/src/mimalloc_ep/include/mimalloc-2.0 -Wno-noexcept-type  -fdiagnostics-color=always  -Wall -Wno-conversion -Wno-sign-conversion -Wunused-result -Wdate-time -fno-semantic-interposition -msse4.2  -g -Werror -O0 -ggdb -g1 -fPIC   -pthread -std=c++17 -MD -MT src/skyhook/CMakeFiles/arrow_skyhook_objlib.dir/protocol/skyhook_protocol.cc.o -MF src/skyhook/CMakeFiles/arrow_skyhook_objlib.dir/protocol/skyhook_protocol.cc.o.d -o src/skyhook/CMakeFiles/arrow_skyhook_objlib.dir/protocol/skyhook_protocol.cc.o -c /arrow/cpp/src/skyhook/protocol/skyhook_protocol.cc
In file included from /arrow/cpp/src/skyhook/protocol/skyhook_protocol.cc:21:
/arrow/cpp/src/skyhook/protocol/ScanRequest_generated.h: In member function 'bool org::apache::arrow::flatbuf::ScanRequest::Verify(arrow_vendored_private::flatbuffers::Verifier&) const':
/arrow/cpp/src/skyhook/protocol/ScanRequest_generated.h:45:55: error: no matching function for call to 'org::apache::arrow::flatbuf::ScanRequest::VerifyField<int64_t>(arrow_vendored_private::flatbuffers::Verifier&, org::apache::arrow::flatbuf::ScanRequest::FlatBuffersVTableOffset) const'
   45 |            VerifyField<int64_t>(verifier, VT_FILE_SIZE) &&
      |                                                       ^
In file included from /arrow/cpp/thirdparty/flatbuffers/include/flatbuffers/flatbuffer_builder.h:42,
                 from /arrow/cpp/thirdparty/flatbuffers/include/flatbuffers/flatbuffers.h:35,
                 from /arrow/cpp/src/skyhook/protocol/skyhook_protocol.cc:19:
/arrow/cpp/thirdparty/flatbuffers/include/flatbuffers/table.h:132:8: note: candidate: 'bool arrow_vendored_private::flatbuffers::Table::VerifyField(const arrow_vendored_private::flatbuffers::Verifier&, arrow_vendored_private::flatbuffers::voffset_t, size_t) const [with T = long int; arrow_vendored_private::flatbuffers::voffset_t = short unsigned int; size_t = long unsigned int]'
  132 |   bool VerifyField(const Verifier &verifier, voffset_t field,
      |        ^~~~~~~~~~~
/arrow/cpp/thirdparty/flatbuffers/include/flatbuffers/table.h:132:8: note:   candidate expects 3 arguments, 2 provided
In file included from /arrow/cpp/src/skyhook/protocol/skyhook_protocol.cc:21:
/arrow/cpp/src/skyhook/protocol/ScanRequest_generated.h:46:57: error: no matching function for call to 'org::apache::arrow::flatbuf::ScanRequest::VerifyField<int16_t>(arrow_vendored_private::flatbuffers::Verifier&, org::apache::arrow::flatbuf::ScanRequest::FlatBuffersVTableOffset) const'
   46 |            VerifyField<int16_t>(verifier, VT_FILE_FORMAT) &&
      |                                                         ^
In file included from /arrow/cpp/thirdparty/flatbuffers/include/flatbuffers/flatbuffer_builder.h:42,
                 from /arrow/cpp/thirdparty/flatbuffers/include/flatbuffers/flatbuffers.h:35,
                 from /arrow/cpp/src/skyhook/protocol/skyhook_protocol.cc:19:
/arrow/cpp/thirdparty/flatbuffers/include/flatbuffers/table.h:132:8: note: candidate: 'bool arrow_vendored_private::flatbuffers::Table::VerifyField(const arrow_vendored_private::flatbuffers::Verifier&, arrow_vendored_private::flatbuffers::voffset_t, size_t) const [with T = short int; arrow_vendored_private::flatbuffers::voffset_t = short unsigned int; size_t = long unsigned int]'
  132 |   bool VerifyField(const Verifier &verifier, voffset_t field,
      |        ^~~~~~~~~~~
/arrow/cpp/thirdparty/flatbuffers/include/flatbuffers/table.h:132:8: note:   candidate expects 3 arguments, 2 provided

Could you re-generate cpp/src/skyhook/protocol/ScanRequest_generated.h too?

@felipecrv
Copy link
Contributor Author

Could you re-generate cpp/src/skyhook/protocol/ScanRequest_generated.h too?

Sure. Let me create an issue and submit a PR soon.

JerAguilon pushed a commit to JerAguilon/arrow that referenced this pull request Oct 23, 2023
…pache#38192)

### Rationale for this change

To use a more modern version of `flatc` in Arrow.

### What changes are included in this PR?

1) Re-generating the the C++ files with a `flatc` based on the latest tag on the `flatbuffer` repo.
2) Copy the minimal set of includes to our vendored folder
3) Manually re-apply the patches that have been applied to the previous headers to fix issues

### Are these changes tested?

Tested by building everything correctly.
* Closes: apache#35497

Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…pache#38192)

### Rationale for this change

To use a more modern version of `flatc` in Arrow.

### What changes are included in this PR?

1) Re-generating the the C++ files with a `flatc` based on the latest tag on the `flatbuffer` repo.
2) Copy the minimal set of includes to our vendored folder
3) Manually re-apply the patches that have been applied to the previous headers to fix issues

### Are these changes tested?

Tested by building everything correctly.
* Closes: apache#35497

Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…pache#38192)

### Rationale for this change

To use a more modern version of `flatc` in Arrow.

### What changes are included in this PR?

1) Re-generating the the C++ files with a `flatc` based on the latest tag on the `flatbuffer` repo.
2) Copy the minimal set of includes to our vendored folder
3) Manually re-apply the patches that have been applied to the previous headers to fix issues

### Are these changes tested?

Tested by building everything correctly.
* Closes: apache#35497

Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++] Pin flatbuffers compiler version
3 participants