Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-39147: [R] Add Bootstrap.r #39148

Merged
merged 5 commits into from
Mar 21, 2024
Merged

GH-39147: [R] Add Bootstrap.r #39148

merged 5 commits into from
Mar 21, 2024

Conversation

assignUser
Copy link
Member

@assignUser assignUser commented Dec 8, 2023

Rationale for this change

Currently installing via github is not possible because the vendoring of the cpp source isn't happening. We can use bootstrap.R to do it. (Does not work for devtools::install_local() as that copies the source folder before running bootstrap.R)

What changes are included in this PR?

An R'ified version of make sync-cpp

Are these changes tested?

locally

Are there any user-facing changes?

Yes, installing via devtools::install_github() will now properly vendor the cpp files.

Copy link

github-actions bot commented Dec 8, 2023

⚠️ GitHub issue #39147 has been automatically assigned in GitHub to PR creator.

@assignUser
Copy link
Member Author

@github-actions crossbow submit r-binary-packages

Copy link

github-actions bot commented Dec 9, 2023

Revision: b9c839c

Submitted crossbow builds: ursacomputing/crossbow @ actions-c75677e771

Task Status
r-binary-packages GitHub Actions

@assignUser
Copy link
Member Author

I have removed devtools::build from r-binary-packages again, installing the dependencies just takes to long. It also makes more sense to add a specific 'prepare release' job, if we create more automation (e.g. cherry picking things) we can test that there as well.

Copy link
Member

@amoeba amoeba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I left some notes, including some about making this work on macOS. Since I know there are a few maintainers on macOS and likely to be some on a continuous basis, I think it'd be nice to address them.

After manually editing bootstrap.R and update-checksum.R to make them work on macOS, bootstrap.R ran successfully for me.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Dec 9, 2023
Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will give this a closer review on Monday; however, at a very high level, I think that bootstrap.R should only be concerned with copying (and trimming) the cpp sources from elsewhere in the repository. I think currently that happens via sync-cpp. The makefile also does other things before building...those things should, I think, stay in the makefile. In other words, bootstrap.R will get run on users' machines so it must have a more restricted scope...the makefile commands will only get run on developer machines.

For completeness...the purpose of the bootstrap.R is to enable developer tools that use pkgbuild::build() to "just work". These include:

  • devtools::submit_cran(), devtools::check_win_devel(), etc.
  • R CMD check in the RStudio IDE
  • pak::pak() (this worked already by building a tarball without C++ sources and relying on nightly static lib builds; however, with bootstrap.R you could change C++ code and have somebody install the pull request version of soemething to see if it fixed an issue)
  • Stock r-lib/actions workflows. This might mean that we can simplify/unify some of our actions-based workflows (it certainly did for nanoarrow and adbc/r)

In ADBC and nanoarrow I also run Rscript bootstrap.R in configure to catch the case where somebody does R CMD INSTALL ., but I don't think we need that here.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Dec 9, 2023
@assignUser
Copy link
Member Author

In other words, bootstrap.R will get run on users' machines so it must have a more restricted scope...the makefile commands will only get run on developer machines.

At the moment if run on a dev version it will skip everything and the release tarball won't have the file so a cran source install via pak or so will also not run it. We could make everything outside of make sync-cpp explicitly opt in but why install a release version from gh instead of CRAN?

with bootstrap.R you could change C++ code and have somebody install the pull request version of soemething to see if it fixed an issue)

Ah yes that sounds useful, so we want to change it to always vendor the cpp source (which also means that we probably want to use something other than rsync to do it?) and only do the other release stuff on release version/opt-in.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Dec 9, 2023
@assignUser
Copy link
Member Author

assignUser commented Dec 9, 2023

devtools::install_github('assignUser/arrow@bootstrap.r') with LIBARROW_BINARY=false set correctly vendors cpp and builds from source now:

Running bootstrap.R...
ℹ Vendoring C++ sources
cp ../NOTICE.txt inst/NOTICE.txt
rsync --archive --delete --exclude 'apidoc' --exclude 'build' --exclude 'build-support/boost_*' --exclude 'examples' --exclude 'src/gandiva' --exclude 'src/jni' --exclude 'src/skyhook' --exclude 'submodules' --exclude '**/*_test.cc' ../cpp tools/
cp -p ../.env tools/dotenv
cp -p ../NOTICE.txt tools/
cp -p ../LICENSE.txt tools/
sed -i"" -e "s/\.env/dotenv/g" tools/cpp/CMakeLists.txt
! Skipping releases preparations for dev version.
── R CMD build ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
✔  checking for file ‘/tmp/Rtmp2g1C7O/remotes1d33ded19da00/assignUser-arrow-25f733d/r/DESCRIPTION’ ...
─  preparing ‘arrow’:
✔  checking DESCRIPTION meta-information ...
─  cleaning src
─  running ‘cleanup’
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building ‘arrow_14.0.1.9000.tar.gz’

Installing package into '/home/jwj/R/x86_64-pc-linux-gnu-library/4.3'
(as 'lib' is unspecified)
* installing *source* package 'arrow' ...
** using staged installation
*** pkg-config found.
*** Latest available nightly for 14.0.1.9000: 14.0.1.100000226
*** Found local C++ source: 'tools/cpp'
*** Building libarrow from source

I'll look at converting the rsync invocation to R next week, also: #39154

@paleolimbot
Copy link
Member

Ah yes that sounds useful, so we want to change it to always vendor the cpp source

Yes! bootstrap.R is the only file where this can happen that's guaranteed to run during pkgbuild::build(). Moving that logic there is the only way to ensure that installation from developer/package install tools that use pkgbuild::build() will get the same tarball as the one we would release to CRAN.

only do the other release stuff on release version/opt-in.

I would say that automating all those steps is a good thing, but bootstrap.R is not the file to do it. These steps aren't things that you want to fail during package installation via pak::pak(), for example...you really just want to do them on a release branch and then commit the result.

why install a release version from gh instead of CRAN?

This mostly matters before we have released it to CRAN (allows, for example, that I or somebody could quickly install the about-to-be-released tarball to ensure that it fixed some problem).

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on!

I really like the idea of automating these pre-release checks; however, I think bootstrap.R is not the right place for them (for the reasons I noted above). I would propose:

  • If you anticipate using the Rified release steps, move the existing bootstrap.R that you have here into tools/experimental_release_automation.R. After the 15.0.0 release process would be a good time to evaluate its utility.
  • Rewrite the sync-cpp in pure R and put it in bootstrap.R

I do wonder if having both the makefile and the Rified release steps are both useful or if it's a little confusing to have them both. You are the primary use of both of them, so I will leave it up to you to decide what the best division of responsibility is.

@github-actions github-actions bot added awaiting review Awaiting review awaiting changes Awaiting changes awaiting committer review Awaiting committer review and removed awaiting change review Awaiting change review awaiting review Awaiting review awaiting changes Awaiting changes labels Dec 11, 2023
@assignUser
Copy link
Member Author

I do wonder if having both the makefile and the Rified release steps are both useful or if it's a little confusing to have them both

Moving everything outside of sync-cpp into a separate release script allows us to keep everything else in the make file and keep only that part duplicated. We can of course also transition to using the bootstrap.R version exclusivly after giving it some time to prove itself?

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Dec 12, 2023
@paleolimbot
Copy link
Member

We can of course also transition to using the bootstrap.R version exclusivly after giving it some time to prove itself?

I think that's a good idea...continue to use R CMD build and slowly transition to devtools::build() in CI and then for actual releases.

My high-level comment of having bash calling R calling bash and/or R remains...I think it's find to have bash/Makefile OR R automate some of the release things, but those should all be in the same place (and avoid too many back-and-forth indirections between the two).

@assignUser
Copy link
Member Author

@paleolimbot as discussed last year (wow!?) I have refactored this to just be the R version of make sync-cpp. I have compared the list of copied files and this only misses .gitignore files (using all.files=T also doesn't fix that?).

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few nits...thank you! The fact that it does not run in configure is particularly helpful (isolates the place where it runs to R CMD check in RStudio, and devtools::build(), where you probably do want this to happen.

@github-actions github-actions bot added awaiting merge Awaiting merge awaiting review Awaiting review awaiting committer review Awaiting committer review and removed awaiting change review Awaiting change review awaiting review Awaiting review awaiting merge Awaiting merge labels Mar 21, 2024
assignUser and others added 3 commits March 21, 2024 18:25
Co-authored-by: Dewey Dunnington <dewey@dunnington.ca>
@assignUser assignUser merged commit dc20d95 into apache:main Mar 21, 2024
9 checks passed
@assignUser assignUser removed the awaiting committer review Awaiting committer review label Mar 21, 2024
Copy link

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit dc20d95.

There were 2 benchmark results indicating a performance regression:

The full Conbench report has more details.

@jonkeane
Copy link
Member

Sorry this is post merge, I just saw it now while tracking something else down.

I see that there was already some discussion about how this is duplicative of the sync-cpp step in the make file and it seemed everyone agreed that having this in two places wasn't ideal: but is there a ticket to clean that up? I don't think I see anything linked here, but I might have missed something

@assignUser
Copy link
Member Author

assignUser commented Mar 26, 2024

Good catch @jonkeane! There is now #40787 ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[R] Add bootstrap.R
4 participants