Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build against arrow-ballista in CI & remove ballista code from this repo #2582

Merged
merged 7 commits into from
May 22, 2022

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented May 21, 2022

Which issue does this PR close?

Part of #2502

  • The first commit (323f0fb) has the CI changes
  • The second commit (f6f7ba2) fixed a clippy issue
  • The third commit (ea38607) removed all ballista code from this repo
  • The remaining commits are all the things I missed 😵‍💫

Rationale for this change

We need to build against arrow-ballista to detect breaking changes.

What changes are included in this PR?

  • Remove ballista from workspace Cargo.toml
  • Updated datafusion-cli and benchmarks/tpch to remove dependencies on Ballista
  • Add script to pull arrow-ballista, change dependency paths, and build
  • Add GitHub action
  • Add documentation to PR template for building against specific arrow-ballista PR/branch
  • File follow-on issue to discuss improving this process - Improve process for making changes to both arrow-datafusion and arrow-ballista #2583
  • rm -rf ballista*

Are there any user-facing changes?

Yes, CI and process changes.

The PR template has a new section:

# Does this PR break compatibility with Ballista?

<!--
The CI checks will attempt to build [arrow-ballista](https://github.com/apache/arrow-ballista) against this PR. If 
this check fails then it indicates that this PR makes a breaking change to the DataFusion API.

If possible, try to make the change in a way that is not a breaking API change. For example, if code has moved 
 around, try adding `pub use` from the original location to preserve the current API.

If it is not possible to avoid a breaking change (such as when adding enum variants) then follow this process:

- Make a corresponding PR against `arrow-ballista` with the changes required there
- Update `dev/build-arrow-ballista.sh` to clone the appropriate `arrow-ballista` repo & branch
- Merge this PR when CI passes
- Merge the Ballista PR
- Create a new PR here to reset `dev/build-arrow-ballista.sh` to point to `arrow-ballista` master again

_If you would like to help improve this process, please see https://github.com/apache/arrow-datafusion/issues/2583_
-->

This PR originally failed (for good reason) because arrow-ballista was already "broken" by recent changes merged to master here.

warning: unused import: `union_with_alias`
  --> /__w/arrow-datafusion/arrow-datafusion/datafusion/core/src/sql/planner.rs:32:5
   |
32 |     union_with_alias, Column, CreateCatalog, CreateCatalogSchema,
   |     ^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default
error[E0277]: the trait bound `ListingTable: TableSource` is not satisfied
   --> ballista/rust/core/src/serde/logical_plan/mod.rs:255:21
    |
253 |                 LogicalPlanBuilder::scan_with_filters(
    |                 ------------------------------------- required by a bound introduced by this call
254 |                     &scan.table_name,
255 |                     Arc::new(provider),
    |                     ^^^^^^^^^^^^^^^^^^ the trait `TableSource` is not implemented for `ListingTable`
    |

After updating Ballista in apache/datafusion-ballista#31 I reran the build here and it passed.

@andygrove andygrove self-assigned this May 21, 2022
@andygrove andygrove changed the title Build against arrow-ballista in CI WIP: Build against arrow-ballista in CI May 21, 2022
@github-actions github-actions bot added the development-process Related to development process of DataFusion label May 21, 2022
@andygrove andygrove changed the title WIP: Build against arrow-ballista in CI WIP: Build against arrow-ballista in CI May 21, 2022
@andygrove andygrove changed the title WIP: Build against arrow-ballista in CI Build against arrow-ballista in CI May 21, 2022
@andygrove andygrove marked this pull request as ready for review May 21, 2022 15:23
@andygrove
Copy link
Member Author

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label May 21, 2022
@github-actions github-actions bot added ballista documentation Improvements or additions to documentation labels May 21, 2022
@andygrove andygrove changed the title Build against arrow-ballista in CI Build against arrow-ballista in CI & remove ballista code from this repo May 21, 2022
Copy link
Contributor

@thinkharderdev thinkharderdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good step forward, I have a fair few breaking changes in flight so I guess we'll get to exercise the process for them soon 😅

Edit: I do wonder if it might be less friction to do what we do with arrow-rs, have ballista track a released DataFusion version and update it as part of the release process for DataFusion. Whilst this does result in a delay getting new features, keeping this delay low is probably of interest to more than just Ballista...

# clone the repo
# TODO make repo/branch configurable
git clone https://github.com/apache/arrow-ballista
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to always pin this so that Ballista changes can't side-effect on unrelated DataFusion PRs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question and I'm really not sure what the best approach is. Perhaps we wait until we hit this situation and then we can pin to the commit prior to the breaking change.

DataFusion no longer depends on Ballista so I am not sure what kind of change would cause us to deal with this situation but I am sure it is possible somehow.

@andygrove
Copy link
Member Author

I think this is a good step forward, I have a fair few breaking changes in flight so I guess we'll get to exercise the process for them soon sweat_smile

Edit: I do wonder if it might be less friction to do what we do with arrow-rs, have ballista track a released DataFusion version and update it as part of the release process for DataFusion. Whilst this does result in a delay getting new features, keeping this delay low is probably of interest to more than just Ballista...

Thanks for the review @tustvold. I also have plenty of breaking API changes planned so this will definitely get tested soon and I am sure we will either iterate on this process or abandon it in favor of the approach you suggest. With more frequent DataFusion releases and the current level of activity in Ballisata, I think that this would likely be fine.

I suggest we try this approach out over the next week or two and see how it goes.

@andygrove andygrove merged commit 907504c into apache:master May 22, 2022
@andygrove andygrove deleted the ballista-ci branch May 22, 2022 14:50
@alamb
Copy link
Contributor

alamb commented May 23, 2022

I do wonder if it might be less friction to do what we do with arrow-rs, have ballista track a released DataFusion version and update it as part of the release process for DataFusion.

I think this is the ideal outcome, however it takes a non trivial effort now to release arrow bi-weekly to keep the code flowing, and and we haven't yet been able to do it with DataFusion. Until we get experience and confidence we'll have regular DataFusion releases, I would recommend we keep developing Ballista directly from a github sha.

However I see this separation (and slowdown of API changes) hopefully a good sign for DataFusion's eventual stabilization

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate development-process Related to development process of DataFusion documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants