Fix hive ANALYZE when row count stats are missing #18828

Merged
raunaqmorarka merged 1 commit into trinodb:master on Aug 30, 2023

Conversation

Dith3r
Member

@Dith3r Dith3r commented Aug 28, 2023

Description

Fixes #18798

Additional context and related issues

The problem is specific to the Hive metastore; newer versions of HMS have fixed it (https://issues.apache.org/jira/browse/HIVE-19254).

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Hive
* Fix failure in ANALYZE when row count stats are missing. ({issue}`18798`)

Member

@raunaqmorarka raunaqmorarka left a comment

Can we add a Hive product test that reproduces the problem before your fix?
In Hive, `ANALYZE TABLE Table1 PARTITION(ds='2008-04-09', hr) COMPUTE STATISTICS NOSCAN;` can be used to collect the fileCount and onDiskDataSizeInBytes statistics without collecting row count statistics.
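
To illustrate the state described above, here is a minimal Python sketch of partition parameters that carry a file count and on-disk size but no row count. It is not Trino's code and not the change made in this PR: the class and function names are hypothetical, and the assumption is that the statistics arrive under the Hive metastore parameter keys `numFiles`, `numRows`, and `totalSize`.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BasicStatistics:
    """Hypothetical container; a field is None when the metastore did not report it."""
    file_count: Optional[int]
    row_count: Optional[int]
    on_disk_data_size_in_bytes: Optional[int]


def _parse_non_negative(parameters: Mapping[str, str], key: str) -> Optional[int]:
    """Return the value stored under key as an int, or None if absent, malformed, or negative."""
    raw = parameters.get(key)
    if raw is None:
        return None
    try:
        value = int(raw)
    except ValueError:
        return None
    return value if value >= 0 else None


def basic_statistics_from_parameters(parameters: Mapping[str, str]) -> BasicStatistics:
    """Build statistics from metastore-style parameters without assuming any key is present."""
    return BasicStatistics(
        file_count=_parse_non_negative(parameters, "numFiles"),
        row_count=_parse_non_negative(parameters, "numRows"),
        on_disk_data_size_in_bytes=_parse_non_negative(parameters, "totalSize"),
    )


# Parameters as they might look after a NOSCAN analyze: no "numRows" entry.
stats = basic_statistics_from_parameters({"numFiles": "3", "totalSize": "1024"})
```

Any consumer of such statistics has to treat `row_count` as optional instead of assuming it accompanies the file count and size.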

@Dith3r
Member Author

Dith3r commented Aug 28, 2023

Using `ANALYZE TABLE Table1 PARTITION(...) COMPUTE STATISTICS NOSCAN;` and a call to `CALL hive.system.sync_partition_metadata` does not trigger this issue.

@raunaqmorarka
Member

> Using `ANALYZE TABLE Table1 PARTITION(...) COMPUTE STATISTICS NOSCAN;` and a call to `CALL hive.system.sync_partition_metadata` does not trigger this issue.

Okay, were you able to manually reproduce the problem and verify the fix?

@Dith3r Dith3r changed the title from "Preserve thrift metastore fast stats" to "Fix hive ANALYZE when row count stats are missing" Aug 29, 2023
@Dith3r
Member Author

Dith3r commented Aug 29, 2023

There is even a fix in Hive for this issue (https://issues.apache.org/jira/browse/HIVE-19254). I can test it with an old version of Hive.

@raunaqmorarka raunaqmorarka merged commit 9a5838a into trinodb:master Aug 30, 2023
@github-actions github-actions bot added this to the 426 milestone Aug 30, 2023
Labels
cla-signed hive Hive connector
Development

Successfully merging this pull request may close these issues.

ANALYZE broken since 420 for Hive tables without (full) statistics
2 participants