Add GUC 'gp_random_insert_segments' to control the segments used for random distributed table insertion #406

foreyes · 2024-04-10T10:08:40Z

Introduces the 'gp_random_insert_segments' GUC to reduce the number of fragmented files generated when a small amount of data is inserted into a cluster with a large number of segments (e.g., inserting 1000 records into 100 segments).

Spreading a small insert across many segments produces many small, fragmented files, which can significantly degrade performance, especially with append-optimized tables or cloud-based storage. The 'gp_random_insert_segments' GUC lets users limit the number of segments that receive data when inserting into randomly distributed tables, which can significantly reduce the number of fragmented files.

src/backend/utils/misc/guc_gp.c


foreyes · 2024-04-17T02:36:06Z

Update

Disable ORCA for INSERT statements when this feature is in use.
Directly change the segments in the slice table to avoid dispatching to unrelated segments.

my-ship-it · 2024-04-18T01:34:48Z

Could you please add a test case?

my-ship-it

LGTM


foreyes self-assigned this Apr 10, 2024

foreyes force-pushed the dev/limited_insert branch 4 times, most recently from 519d20c to 50efffc on April 11, 2024 00:18

my-ship-it reviewed Apr 11, 2024


src/backend/utils/misc/guc_gp.c (resolved)

gfphoenix78 reviewed Apr 12, 2024


src/backend/utils/misc/guc_gp.c (resolved)

foreyes force-pushed the dev/limited_insert branch from 50efffc to 690b060 on April 17, 2024 02:29

foreyes requested a review from my-ship-it April 17, 2024 06:24

my-ship-it approved these changes Apr 18, 2024


my-ship-it merged commit 143b3df into apache:main Apr 18, 2024
10 checks passed
