Fix Dynamic Partition Pruning tests on Databricks 14.3 #11536

Closed
razajafri opened this issue Sep 27, 2024 · 2 comments · Fixed by #11768
Labels
bug Something isn't working

Comments

razajafri (Collaborator) commented Sep 27, 2024

Build the plugin against a Databricks 14.3 cluster using #11467. Once the build succeeds, run the DPP tests with `TESTS=dpp_test.py jenkins/databricks/test.sh`.

The following tests fail:

```
[gw4] [  8%] FAILED ../../src/main/python/dpp_test.py::test_dpp_reuse_broadcast_exchange
[gw4] [  9%] FAILED ../../src/main/python/dpp_test.py::test_dpp_reuse_broadcast_exchange
[gw4] [  9%] FAILED ../../src/main/python/dpp_test.py::test_dpp_reuse_broadcast_exchange_cpu_scan
[gw4] [ 10%] FAILED ../../src/main/python/dpp_test.py::test_dpp_via_aggregate_subquery
[gw4] [ 12%] FAILED ../../src/main/python/dpp_test.py::test_dpp_empty_relation
[gw4] [ 12%] FAILED ../../src/main/python/dpp_test.py::test_dpp_from_swizzled_hash_keys
[gw4] [ 12%] FAILED ../../src/main/python/dpp_test.py::test_dpp_like_any
```
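The report contains seven `FAILED` lines but only six distinct test functions, because `test_dpp_reuse_broadcast_exchange` is listed twice (possibly as separate parametrized cases; the report does not show the parameters). The sketch below is a small, hypothetical helper, not part of the spark-rapids tooling: it extracts the distinct failing test names from such output and joins them into an expression for pytest's `-k` option. Whether `jenkins/databricks/test.sh` forwards a `-k` expression is not stated in this issue and is an assumption.

```python
import re

# Failure lines copied from the report above.
LOG = """\
[gw4] [  8%] FAILED ../../src/main/python/dpp_test.py::test_dpp_reuse_broadcast_exchange
[gw4] [  9%] FAILED ../../src/main/python/dpp_test.py::test_dpp_reuse_broadcast_exchange
[gw4] [  9%] FAILED ../../src/main/python/dpp_test.py::test_dpp_reuse_broadcast_exchange_cpu_scan
[gw4] [ 10%] FAILED ../../src/main/python/dpp_test.py::test_dpp_via_aggregate_subquery
[gw4] [ 12%] FAILED ../../src/main/python/dpp_test.py::test_dpp_empty_relation
[gw4] [ 12%] FAILED ../../src/main/python/dpp_test.py::test_dpp_from_swizzled_hash_keys
[gw4] [ 12%] FAILED ../../src/main/python/dpp_test.py::test_dpp_like_any
"""

# Captures "<path>::<test function>" after FAILED; any "[param]" suffix is ignored
# because \w+ stops at the opening bracket.
FAILED_RE = re.compile(r"FAILED\s+(\S+?)::(\w+)")


def distinct_failures(log: str) -> list:
    """Return failing test function names in first-seen order, without duplicates."""
    seen = {}
    for match in FAILED_RE.finditer(log):
        seen.setdefault(match.group(2), None)
    return list(seen)


def keyword_expression(names: list) -> str:
    """Build a pytest -k expression that selects any of the given test names."""
    return " or ".join(names)


if __name__ == "__main__":
    names = distinct_failures(LOG)
    print(len(names))
    print(keyword_expression(names))
```

A dict is used for de-duplication because it preserves insertion order, so the output follows the order in which failures were reported.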
mythrocks self-assigned this Oct 29, 2024
mythrocks (Collaborator)

Taking this one next.

mythrocks (Collaborator)

This one is looking hard to solve: there are nigh-on opaque assertion failures deep in Databricks code. I might have to circle back to those.

sameerz added the bug (Something isn't working) label Nov 16, 2024
mythrocks added a commit to mythrocks/spark-rapids that referenced this issue Nov 26, 2024
Fixes NVIDIA#11536.

This commit fixes the tests in `dpp_test.py` that were failing on
Databricks 14.3.

The failures were largely a result of an erroneous shim implementation
that was fixed as part of NVIDIA#11750.

This commit addresses the remaining failures, which result from a
`CollectLimitExec` appearing in certain DPP query plans (those that
include broadcast joins, for example). The tests have been made more
permissive by allowing the `CollectLimitExec` to run on the CPU.

The `CollectLimitExec` based plans will be further explored as part of
NVIDIA#11764.

Signed-off-by: MithunR <mithunr@nvidia.com>
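The "more permissive" change described in the commit message amounts to an allow-list: a plan check that normally rejects any operator left on the CPU is told to tolerate `CollectLimitExec`. The sketch below models that idea under stated assumptions. `PlanNode` and `disallowed_cpu_nodes` are hypothetical names, not the spark-rapids test harness API (which has its own mechanism), and the three-node plan is a simplified illustration rather than an actual Databricks 14.3 DPP plan.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PlanNode:
    """Hypothetical, simplified executed-plan node: operator name and placement."""
    name: str
    on_gpu: bool


def disallowed_cpu_nodes(plan: Iterable[PlanNode],
                         allowed_non_gpu: Iterable[str] = ()) -> List[str]:
    """Return names of CPU-placed operators that are not on the allow-list."""
    allowed = set(allowed_non_gpu)
    return [node.name for node in plan
            if not node.on_gpu and node.name not in allowed]


# Illustrative plan only: a broadcast-join DPP query in which a
# CollectLimitExec remains on the CPU (operator names chosen for illustration).
plan = [
    PlanNode("GpuFileSourceScanExec", on_gpu=True),
    PlanNode("GpuBroadcastHashJoinExec", on_gpu=True),
    PlanNode("CollectLimitExec", on_gpu=False),
]

strict = disallowed_cpu_nodes(plan)
permissive = disallowed_cpu_nodes(plan, allowed_non_gpu=["CollectLimitExec"])
print(strict)      # ['CollectLimitExec']
print(permissive)  # []
```

Under the strict check the plan fails because of the CPU-side `CollectLimitExec`; with that operator allow-listed, the same plan passes, while any other unexpected CPU fallback would still be reported.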

3 participants