Improve Analyze Performance and Stability #41930

xuyifangreeneyes · 2023-03-03T15:10:20Z

Enhancement

Currently, collecting statistics with the analyze command runs into several problems, especially on large tables:

Analyze is slow. Because analyze needs to scan the full table, an analyze job on a large table may take hours or even days to finish.
Analyze may consume a lot of resources. Some users increase concurrency (for example tidb_build_stats_concurrency and tidb_distsql_scan_concurrency) to speed up analyze. However, that can consume a lot of CPU, memory, and IO on TiKV (when scanning the table and sampling) and a lot of CPU and memory on TiDB (when merging samples and building stats).
When the table has many columns, or some columns hold large values (such as text/blob/json columns), the samples may take up a lot of memory. When TiDB merges samples and builds stats, it may OOM, or analyze may be killed by the global memory control mechanism. Maybe we can skip collecting statistics for columns whose stats are barely used, such as json columns.
The execution of analyze is not fault-tolerant. If one analyze request to a region fails (for example because the region is unavailable), the whole analyze job fails and has to be rerun from the very beginning, which is unfriendly to users.

Here is the related issue in the tikv repo:
tikv/tikv#14231

Tasks

Use faster murmur3 hash function for FMSketch calculation

coprocessor: use mur3 to calculate fmsketch tikv/tikv#14204

Reduce encoding cost

Avoid FMSketch calculation for single-column index

Sample-based NDV calculation



xuyifangreeneyes added the type/enhancement, sig/planner, and component/statistics labels Mar 3, 2023

xuyifangreeneyes mentioned this issue Mar 3, 2023

statistics: avoid fmsketch calculation for single-column index #41931

Merged

12 tasks

ti-chi-bot pushed a commit that referenced this issue Mar 6, 2023

statistics: avoid fmsketch calculation for single-column index (#41931)

24ff3f4

ref #41930

xuyifangreeneyes self-assigned this Mar 11, 2023
