[BUG] [DB 14.3] numRecords stat in Delta Lake tables is not set on GPU #12047

Closed
razajafri opened this issue Jan 30, 2025 · 0 comments · Fixed by #12196
Labels
bug Something isn't working

Comments

@razajafri
Collaborator

Describe the bug
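Running the test delta_lake_update_test.py::test_delta_update_dataframe_api fails with the following exception. The 'remove' entry written on the GPU has no 'stats' key, while the CPU entry carries stats with numRecords: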

E               AssertionError: Delta log 00000000000000000002.json 'remove' keys mismatch:
E               CPU: {'path': 'part-00000-52012349-499d-46fa-bec2-86be7b58c965-c000.snappy.parquet', 'deletionTimestamp': 1738197316431, 'dataChange': True, 'extendedFileMetadata': True, 'partitionValues': {}, 'size': 97114, 'tags': {'OPTIMIZE_TARGET_SIZE': '268435456'}, 'stats': '{"numRecords":2048}'}
E               GPU: {'path': 'part-00000-e1fb51cc-00b6-4f0a-9ba6-a57828a91906-c000.snappy.parquet', 'deletionTimestamp': 1738197322059, 'dataChange': True, 'extendedFileMetadata': True, 'partitionValues': {}, 'size': 97114, 'tags': {'OPTIMIZE_TARGET_SIZE': '268435456'}}
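
The mismatch can be narrowed down by diffing the key sets of the two 'remove' entries. The following is a minimal Python sketch that uses the two dictionaries copied from the assertion message above. The variable names (cpu_remove, gpu_remove, and so on) are illustrative and are not taken from the test suite.

```python
import json

# 'remove' entries copied from the assertion message above.
cpu_remove = {
    'path': 'part-00000-52012349-499d-46fa-bec2-86be7b58c965-c000.snappy.parquet',
    'deletionTimestamp': 1738197316431,
    'dataChange': True,
    'extendedFileMetadata': True,
    'partitionValues': {},
    'size': 97114,
    'tags': {'OPTIMIZE_TARGET_SIZE': '268435456'},
    'stats': '{"numRecords":2048}',
}
gpu_remove = {
    'path': 'part-00000-e1fb51cc-00b6-4f0a-9ba6-a57828a91906-c000.snappy.parquet',
    'deletionTimestamp': 1738197322059,
    'dataChange': True,
    'extendedFileMetadata': True,
    'partitionValues': {},
    'size': 97114,
    'tags': {'OPTIMIZE_TARGET_SIZE': '268435456'},
}

# Keys written by one side but not by the other.
missing_on_gpu = set(cpu_remove) - set(gpu_remove)
missing_on_cpu = set(gpu_remove) - set(cpu_remove)
print("missing on GPU:", sorted(missing_on_gpu))
print("missing on CPU:", sorted(missing_on_cpu))

# 'stats' is a JSON-encoded string nested inside the entry,
# so it needs a second decode to reach numRecords.
cpu_stats = json.loads(cpu_remove['stats'])
print("CPU numRecords:", cpu_stats['numRecords'])
```

The only key that differs is 'stats'; every other key is present on both sides.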

Steps/Code to reproduce bug
Build spark-rapids for Databricks 14.3 and run the command:

TEST_PARALLEL=0 TEST_MODE=DELTA_LAKE_ONLY WITH_DEFAULT_UPSTREAM_SHIM=0 TESTS=delta_lake_update_test.py::test_delta_update_dataframe_api ./jenkins/databricks/test.sh

Expected behavior
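The test should pass: the 'remove' entry written on the GPU should have the same keys as the one written on the CPU, including 'stats' with numRecords.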
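
One way to check this outside the test is to scan a Delta commit file for 'remove' actions that lack numRecords. The sketch below assumes the commit file is newline-delimited JSON with one action per line, which is how Delta log files such as 00000000000000000002.json are laid out. The helper name removes_without_num_records and the sample paths are hypothetical.

```python
import json

def removes_without_num_records(log_lines):
    """Return the paths of 'remove' actions whose stats lack numRecords."""
    missing = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        remove = action.get("remove")
        if remove is None:
            continue  # not a 'remove' action (e.g. 'add', 'commitInfo')
        # 'stats' is a JSON string; treat an absent or empty value as no stats.
        stats = json.loads(remove.get("stats") or "{}")
        if "numRecords" not in stats:
            missing.append(remove["path"])
    return missing

# Hypothetical sample lines standing in for the contents of a commit file.
sample_log = [
    json.dumps({"remove": {"path": "a.parquet", "stats": '{"numRecords":2048}'}}),
    json.dumps({"remove": {"path": "b.parquet"}}),
    json.dumps({"add": {"path": "c.parquet"}}),
]
print(removes_without_num_records(sample_log))
```

For a real table, the lines would come from opening the commit file under the table's _delta_log directory and iterating over it.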

Additional context
I think we need to update GpuUpdateCommand.scala for spark350db143 so that it writes the new stats.

razajafri added the "? - Needs Triage" and "bug" labels on Jan 30, 2025
mattahrens removed the "? - Needs Triage" label on Feb 4, 2025
razajafri changed the title from "[BUG] Update test failing due to delta log stats not being written on GPU for Databricks 14.3" to "[BUG] [DB 14.3] numRecords stat in Delta Lake tables is not set on GPU" on Feb 6, 2025
razajafri self-assigned this on Feb 7, 2025
2 participants