Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Implement Shim inclusion / activation after dist assembly is published #11745

Open
gerashegalov opened this issue Nov 21, 2024 · 2 comments
Labels
feature request New feature or request

Comments

@gerashegalov
Copy link
Collaborator

Is your feature request related to a problem? Please describe.

Our dist assembly is complex, maybe ready to be simplified which will be subject of another issue.

At this time, the set of shims participating in the assembly determines whether binary-dedupe.sh may move class files into a shared location or not.

It may change whether a class from sql-plugin-api passes the bitwise-identity test and if not the build fails.

The current practice of configuring the dist differently for the nightly and the release CI pipelines results in this kind of issue. It will allow us to avoid last minute pre-release issues related to the fact that the release version is not the same layout as has been tested for weeks with nightly pipelines.

Describe the solution you'd like
This issue proposes to implement configs to the tune of

sparkXYZ.ShimServiceProvider.enabled

that is honored at run time.

This should allow to make sure that the dist assembly stays constant throughout the entire lifecycle of a Shim including its introduction as a SNAPSHOT shim first before the release.

Describe alternatives you've considered
continue the old way

Additional context

#11744

@gerashegalov gerashegalov added ? - Needs Triage Need team to review and classify feature request New feature or request labels Nov 21, 2024
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Nov 26, 2024
@jlowe
Copy link
Contributor

jlowe commented Nov 26, 2024

This seems to be advocating to always include snapshot shims in the artifact but have them be disabled so we can test with the jar layout we will ship. I'm all for testing something closer to what we ship, but I'm not a fan of the proposed approach for the following reasons:

  • Snapshot builds can break at arbitrary points because Spark signatures could change. We don't want the nightly builds breaking and preventing an artifact from being published due to this
  • The release artifact will be bigger than necessary. We already have a large jar artifact.

Instead I would propose a separate artifact for snapshot shims. The main artifact would always be built without snapshot shims. The snapshot shim artifact would be built only with snapshot shims. Regular CI pipelines use the main artifact as usual. CI pipelines for snapshot Spark versions would use the snapshot artifact. No runtime configs need to be added.

@gerashegalov
Copy link
Collaborator Author

The issue is primarily motivated by the shims for DBR like 14.3 and released versions of Spark that are not fully finished. However the user might not care about the features that are still in the works and still interested in trying it out.

gerashegalov added a commit that referenced this issue Jan 23, 2025
Contributes to #10661, implements #11745 for DBR 14.3

- Add switch spark.rapids.shims.spark350db143.enabled

<!--

Thank you for contributing to RAPIDS Accelerator for Apache Spark!

Here are some guidelines to help the review process go smoothly.

1. Please write a description in this text box of the changes that are
being
   made.

2. Please ensure that you have written units tests for the changes
made/features
   added.

3. If you are closing an issue please use one of the automatic closing
words as
noted here:
https://help.github.com/articles/closing-issues-using-keywords/

4. If your pull request is not ready for review but you want to make use
of the
continuous integration testing facilities please label it with `[WIP]`.

5. If your pull request is ready to be reviewed without requiring
additional
   work on top of it, then remove the `[WIP]` label (if present).

6. Once all work has been done and review has taken place please do not
add
features or make changes out of the scope of those requested by the
reviewer
(doing this just add delays as already reviewed code ends up having to
be
re-reviewed/it is hard to tell what is new etc!). Further, please avoid
rebasing your branch during the review process, as this causes the
context
of any comments made by reviewers to be lost. If conflicts occur during
review then they should be resolved by merging into the branch used for
   making the pull request.

Many thanks in advance for your cooperation!

-->

---------

Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants