ARROW-16510: [R] Add bindings for GCS filesystem #13404
strings
strings_internal
symbolize
synchronization
throw_delegate
time
time_zone)
time_zone
wyhash)
Most of the cmake changes here seem to just be differences in sorting (due to my locale?), but the omission of wyhash seems to be legitimately new.
# sed -e 's;.*/absl_;set_property(TARGET absl::;' \
# -e 's/.pc:Requires:/ PROPERTY INTERFACE_LINK_LIBRARIES /' \
# -e 's/ = 20210324,//g' \
# -e 's/ = 20210324//g' \
# -E -e 's/ = 20[0-9]{6},?//g' \
I simplified this regex so that it is robust to future abseil version bumps (at least until 2030!)
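To see what the simplified expression does, here is a minimal sketch that runs it over two made-up dependency lines. The sample input and the helper name `strip_absl_versions` are illustrative assumptions, not output from a real abseil install; only the sed expression itself is taken from the diff above. It matches any ` = ` followed by an eight-digit stamp beginning with `20`, plus an optional trailing comma.

```shell
#!/bin/sh
# Hypothetical sample lines, shaped like "Requires:" entries in abseil .pc files.
sample='absl_base = 20210324, absl_config = 20210324
absl_strings = 20220623'

# Strip " = <8-digit stamp starting with 20>" and an optional trailing comma.
# The expression is the one from the diff; the function name is illustrative.
strip_absl_versions() {
  sed -E -e 's/ = 20[0-9]{6},?//g'
}

printf '%s\n' "$sample" | strip_absl_versions
```

Because the stamp is matched by pattern rather than by literal value, the same expression covers both the comma-terminated and the end-of-line form that previously needed two separate `-e` clauses.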
@@ -3542,18 +3544,6 @@ macro(resolve_dependency_absl)
APPEND
PROPERTY INTERFACE_LINK_LIBRARIES ${CoreFoundation})
endif()
set_property(TARGET absl::type_traits PROPERTY INTERFACE_LINK_LIBRARIES absl::config)
These are duplicated above so I removed them here; I suspect these were an artifact of a bad merge
@@ -44,7 +44,9 @@ get_exported_functions <- function(decorations, export_tag) {
out <- decorations %>%
filter(decoration %in% paste0(export_tag, "::export")) %>%
mutate(functions = map(context, decor:::parse_cpp_function)) %>%
{ vec_cbind(., vec_rbind(!!!pull(., functions))) } %>%
{
This and the rest of the changes in the file seem to be just styler running on the file (VS Code did it on save). Since the code generated after these style changes looks identical (there is no massive diff in the arrowExports.* files), maybe we can remove the styler_exclusion on this file?
I turned on GCS wherever S3 was turned on (so NOT_CRAN=true will build GCS on Linux). We should check the build times and see if it's worth the cost or whether it should be a more explicit opt-in. I think we should have it on in the binaries we build though.
The R Debian build is failing with an undefined symbol from absl, a different one from what I saw locally (on macOS): https://github.com/apache/arrow/runs/6989838112?check_suite_focus=true#step:5:2990
@github-actions crossbow submit homebrew-r-autobrew
Revision: df5033b Submitted crossbow builds: ursacomputing/crossbow @ actions-b4426052af
@coryan any ideas about that undefined abseil symbol mentioned above? There is a similar issue in the homebrew build.
TL;DR: not much of an idea, sorry.
I assume "above" refers to https://github.com/apache/arrow/runs/6989838112?check_suite_focus=true#step:5:2990; just let me know if I read that wrong.
Seems like Abseil is not being linked? The log says:
That seems to be missing Abseil, but I do not quite understand how the list of bundled dependencies is created.
Probably a similar problem as above. Less likely, it could be Abseil's sensitivity to C++11 vs. C++17 building (see abseil/abseil-cpp#696), I always think it could be that issue if
@coryan thanks, that's helpful actually. I'll have to dig more on the bundled libraries thing. On the Homebrew build, abseil indeed is being built with C++17 by brew (https://github.com/Homebrew/homebrew-core/blob/master/Formula/abseil.rb#L31), and the arrow build is using C++11. What's odd is that we've been depending on it (transitively) for a while because Flight uses grpc (also built C++17 by brew), but I guess google-cloud-cpp hits different parts.
Well, that didn't work, there must be some other piece I'm missing:
I suspect
If you have a VM with abseil installed, you could do something like:
pkg-config --libs absl_memory absl_strings absl_str_format absl_time absl_variant absl_base absl_memory absl_optional absl_span absl_time absl_variant
-L/usr/local/lib64 -labsl_str_format_internal -labsl_bad_optional_access -labsl_time -labsl_civil_time -labsl_strings -labsl_strings_internal -lrt -labsl_base -labsl_spinlock_wait -labsl_int128 -labsl_throw_delegate -labsl_time_zone -labsl_bad_variant_access -labsl_raw_logging_internal -labsl_log_severity
Which (a) removes the header-only libraries, and (b) saves you some whack-a-moling (maybe).
Yeah I keep thinking I've reached the end (just one more library!) but that would be smarter. I have a local build of abseil already (from the arrow build), didn't think to point pkg-config at it. Thanks for the suggestion!
@github-actions crossbow submit homebrew-r-autobrew
Revision: 4d789f0 Submitted crossbow builds: ursacomputing/crossbow @ actions-f35855420a
For the R Windows packages, I need to use a static libcurl, and Stack Overflow tells me that I need to define cc @jeroen in case you have different ideas
I think
(I created docs follow up: https://issues.apache.org/jira/browse/ARROW-16887)
r/configure.win
Outdated
if [ $(cmake_option ARROW_GCS) -eq 1 ]; then
PKG_CFLAGS="$PKG_CFLAGS -DARROW_R_WITH_GCS -DCURL_STATICLIB"
fi
I think we should be able to let pkg-config handle adding the flags and libraries for libcurl:
- if [ $(cmake_option ARROW_GCS) -eq 1 ]; then
- PKG_CFLAGS="$PKG_CFLAGS -DARROW_R_WITH_GCS -DCURL_STATICLIB"
- fi
+ if [ $(cmake_option ARROW_GCS) -eq 1 ]; then
+ PKG_CFLAGS="$PKG_CFLAGS -DARROW_R_WITH_GCS"
+ PKG_CONFIG_PACKAGES="$PKG_CONFIG_PACKAGES libcurl"
+ fi
Never mind, this might not help if libcurl is already statically linked in the arrow binaries.
@github-actions crossbow submit -g r
Revision: ee4e8e6 Submitted crossbow builds: ursacomputing/crossbow @ actions-fdfd53e5a1
We need some preparation to use boost/process.hpp with Mingw-w64, as in https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/s3_test_util.cc#L24-L38. We can handle it in a follow-up task.
@github-actions crossbow submit test-r-offline-maximal test-r-depsource-system homebrew-r-brew
Revision: 226c31b Submitted crossbow builds: ursacomputing/crossbow @ actions-5668d3fe2c
I made ARROW-16906 for that and added a TODO comment in the workflow.
@github-actions crossbow submit homebrew-r-brew
Revision: a2a5e87 Submitted crossbow builds: ursacomputing/crossbow @ actions-06d32e61d7
Ok @kou we've resolved all of the builds now, this is ready if you want to give it a final review/+1.
One observation: the fully bundled Linux build (
+1
add_library(google-cloud-cpp::storage STATIC IMPORTED)
set_target_properties(google-cloud-cpp::storage
PROPERTIES IMPORTED_LOCATION
"${GOOGLE_CLOUD_CPP_STATIC_LIBRARY_STORAGE}"
INTERFACE_INCLUDE_DIRECTORIES
"${GOOGLE_CLOUD_CPP_INCLUDE_DIR}")
# Update this from https://github.com/googleapis/google-cloud-cpp/blob/main/google/cloud/storage/google_cloud_cpp_storage.cmake
set_property(TARGET google-cloud-cpp::storage
PROPERTY INTERFACE_LINK_LIBRARIES
google-cloud-cpp::common
We can remove this because google-cloud-cpp-rest-internal depends on google-cloud-cpp::common.
I guess so, but since it's copied from upstream, I'd rather leave it there for completeness unless it's harming anything.
arrow_built_with() {
# Function to check cmake options for features
grep -i 'set('"$1"' "ON")' $ARROW_OPTS_CMAKE >/dev/null 2>&1
FYI: We can use grep -q ... instead of grep ... >/dev/null 2>&1.
grep -q doesn't suppress stderr though, and I want this to be completely silent.
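The difference is easy to reproduce. The sketch below is illustrative only: the nonexistent path and the function names `quiet_only` and `fully_silent` are assumptions made up for the demonstration, and only the grep pattern is taken from the snippet under review. It points both variants at a missing file and captures whatever each writes to stderr.

```shell
#!/bin/sh
# Hypothetical path that does not exist, to force grep to report an error.
missing_file="/nonexistent/arrow_options.cmake"

# Variant 1: -q only suppresses the matched lines normally written to stdout.
quiet_only() {
  grep -q -i 'set('"$1"' "ON")' "$missing_file"
}

# Variant 2: the redirect from the PR discards stdout and stderr alike.
fully_silent() {
  grep -i 'set('"$1"' "ON")' "$missing_file" >/dev/null 2>&1
}

# Capture only stderr from each call (stdout goes to /dev/null).
err_quiet=$(quiet_only ARROW_GCS 2>&1 >/dev/null)
err_silent=$(fully_silent ARROW_GCS 2>&1 >/dev/null)

echo "grep -q stderr:   ${err_quiet:-none}"
echo "redirect stderr:  ${err_silent:-none}"
```

With `-q` alone, grep's "No such file" diagnostic still reaches stderr, while the redirected form prints nothing. POSIX grep also has `-s` to suppress messages about missing or unreadable files, so `grep -qs` would be another way to get a silent check, though that is not what this PR uses.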
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
This adds basic bindings for GcsFileSystem to R, turns it on in the macOS, Windows, and Linux packaging (same handling as ARROW_S3), and basic R tests.

Followups:

- Bindings for FromImpersonatedServiceAccount (ARROW-16885)
- Set up testbench for fuller tests, like how we do with minio (ARROW-16879)
- GcsFileSystem::Make should return Result (ARROW-16884)
- Explore auth integration/compatibility with `gargle`, `googleAuthR`, etc.: can we pick up the same credentials they use (ARROW-16880)
- macOS binary packaging: push dependencies upstream (ARROW-16883)
- Windows binary packaging: push dependencies upstream (ARROW-16878)
- Update cloud/filesystem documentation (ARROW-16887)

Lead-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>