[fix](move-memtable) fix move memtable core when use multi table load #35458

Merged
merged 2 commits into from
Jun 5, 2024

@sollhui sollhui commented May 27, 2024

Proposed changes

Move-memtable crashes with a core dump when multi-table load is used:

```
0x51f000c73860 is located 3040 bytes inside of 3456-byte region [0x51f000c72c80,0x51f000c73a00)
freed by thread T4867 (FragmentMgrThre) here:
    #0 0x558f6ad7f43d in operator delete(void*) (/mnt/hdd01/STRESS_ENV/be/lib/doris_be+0x22eec43d) (BuildId: b46f73d1f76dfcd6)
    #1 0x558f6e6cea2c in std::__new_allocator<doris::PTabletID>::deallocate(doris::PTabletID*, unsigned long) /mnt/disk2/xujianxu/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/new_allocator.h:168:2
    #2 0x558f6e6ce9e7 in std::allocator<doris::PTabletID>::deallocate(doris::PTabletID*, unsigned long) /mnt/disk2/xujianxu/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/allocator.h:210:25
    #3 0x558f6e6ce9e7 in std::allocator_traits<std::allocator<doris::PTabletID>>::deallocate(std::allocator<doris::PTabletID>&, doris::PTabletID*, unsigned long) /mnt/disk2/xujianxu/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/alloc_traits.h:516:13
    #4 0x558f6e6ce9e7 in std::_Vector_base<doris::PTabletID, std::allocator<doris::PTabletID>>::_M_deallocate(doris::PTabletID*, unsigned long) /mnt/disk2/xujianxu/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/stl_vector.h:387:4
    #5 0x558f6e6d0780 in void std::vector<doris::PTabletID, std::allocator<doris::PTabletID>>::_M_range_insert<__gnu_cxx::__normal_iterator<doris::PTabletID const*, std::vector<doris::PTabletID, std::allocator<doris::PTabletID>>>>(__gnu_cxx::__normal_iterator<doris::PTabletID*, std::vector<doris::PTabletID, std::allocator<doris::PTabletID>>>, __gnu_cxx::__normal_iterator<doris::PTabletID const*, std::vector<doris::PTabletID, std::allocator<doris::PTabletID>>>, __gnu_cxx::__normal_iterator<doris::PTabletID const*, std::vector<doris::PTabletID, std::allocator<doris::PTabletID>>>, std::forward_iterator_tag) /mnt/disk2/xujianxu/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/vector.tcc:832:3
    #6 0x558f6e6c54c5 in __gnu_cxx::__normal_iterator<doris::PTabletID*, std::vector<doris::PTabletID, std::allocator<doris::PTabletID>>> std::vector<doris::PTabletID, std::allocator<doris::PTabletID>>::insert<__gnu_cxx::__normal_iterator<doris::PTabletID const*, std::vector<doris::PTabletID, std::allocator<doris::PTabletID>>>, void>(__gnu_cxx::__normal_iterator<doris::PTabletID const*, std::vector<doris::PTabletID, std::allocator<doris::PTabletID>>>, __gnu_cxx::__normal_iterator<doris::PTabletID const*, std::vector<doris::PTabletID, std::allocator<doris::PTabletID>>>, __gnu_cxx::__normal_iterator<doris::PTabletID const*, std::vector<doris::PTabletID, std::allocator<doris::PTabletID>>>) /mnt/disk2/xujianxu/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/stl_vector.h:1483:4
    #7 0x558f9b4b214f in doris::LoadStreamMap::save_tablets_to_commit(long, std::vector<doris::PTabletID, std::allocator<doris::PTabletID>> const&) /mnt/disk2/xujianxu/doris/be/src/vec/sink/load_stream_map_pool.cpp:90:13
    #8 0x558f9b7258dd in doris::vectorized::VTabletWriterV2::_calc_tablets_to_commit() /mnt/disk2/xujianxu/doris/be/src/vec/sink/writer/vtablet_writer_v2.cpp:650:27
    #9 0x558f9b7229f1 in doris::vectorized::VTabletWriterV2::close(doris::Status) /mnt/disk2/xujianxu/doris/be/src/vec/sink/writer/vtablet_writer_v2.cpp:547:9
```

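The excerpt above only shows where the memory was freed: `std::vector<PTabletID>::insert`, called from `LoadStreamMap::save_tablets_to_commit`, grew the vector, which allocates a new buffer and deallocates the old one. Anything still pointing into the old buffer (for example, another sink working on the same shared structure) then touches freed memory. Below is a minimal sketch of one defensive pattern for such a shared structure: serialize appends and hand out copies instead of references. `TabletId` and `TabletsToCommit` are hypothetical types, not the Doris classes, and the `long` parameter seen in the trace is assumed here to be a destination ID.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for doris::PTabletID (illustration only).
struct TabletId {
    std::int64_t tablet_id;
};

// Hypothetical registry of tablets to commit, keyed by an assumed destination ID.
class TabletsToCommit {
public:
    // insert() may reallocate the vector and free its old buffer, so appends
    // must be serialized with every other access to the same vector.
    void save(std::int64_t dst_id, const std::vector<TabletId>& tablets) {
        std::lock_guard<std::mutex> lock(mu_);
        auto& dst = by_dst_[dst_id];
        dst.insert(dst.end(), tablets.begin(), tablets.end());
    }

    // Returns a copy so callers never keep a reference into a buffer that a
    // later save() could free.
    std::vector<TabletId> get(std::int64_t dst_id) const {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = by_dst_.find(dst_id);
        if (it == by_dst_.end()) {
            return {};
        }
        return it->second;
    }

private:
    mutable std::mutex mu_;
    std::unordered_map<std::int64_t, std::vector<TabletId>> by_dst_;
};
```

Copying on read costs an allocation per call; the point of the sketch is only that no caller can observe a buffer that `insert` has already released.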
Multiple sinks that load different tables use the same load ID, so data structures shared per load ID are used inconsistently across those sinks.
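To illustrate the failure mode described above, here is a minimal sketch of a pool that hands out per-sink state. `SinkStatePool`, `SinkKey`, and `SinkState` are hypothetical, and load IDs are simplified to `std::int64_t`. If the key were only the load ID, two sinks in one multi-table load would receive the same object; adding a per-sink component to the key keeps their state apart. This is one possible approach, not necessarily the change made in this PR.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <tuple>

// Hypothetical per-sink state (illustration only).
struct SinkState {
    std::int64_t rows_written = 0;
};

// Hypothetical composite key: the load ID alone is not unique when one
// multi-table load drives several sinks, so a sink ID is added.
struct SinkKey {
    std::int64_t load_id;
    std::int64_t sink_id;
    bool operator<(const SinkKey& other) const {
        return std::tie(load_id, sink_id) < std::tie(other.load_id, other.sink_id);
    }
};

class SinkStatePool {
public:
    // Returns the state for this (load, sink) pair, creating it on first use.
    std::shared_ptr<SinkState> get_or_create(const SinkKey& key) {
        std::lock_guard<std::mutex> lock(mu_);
        std::shared_ptr<SinkState>& slot = states_[key];
        if (!slot) {
            slot = std::make_shared<SinkState>();
        }
        return slot;
    }

private:
    std::mutex mu_;
    std::map<SinkKey, std::shared_ptr<SinkState>> states_;
};
```

Whether the extra key component should be a table ID, a sink ID, or something else depends on how the real sinks are identified.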

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@sollhui
sollhui commented May 27, 2024

run buildall

clang-tidy review says "All clean, LGTM! 👍"

dataroaring previously approved these changes May 27, 2024

@dataroaring dataroaring left a comment

LGTM

PR approved by at least one committer and no changes requested.

@github-actions bot added the approved and reviewed labels May 27, 2024
PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 42153 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a71802e2ce7786a0ad1eebb9fa284475f77a1770, data reload: false

------ Round 1 ----------------------------------
Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
q1	17598	4334	4310	4310
q2	2019	195	190	190
q3	10438	1272	1286	1272
q4	10206	839	854	839
q5	7485	2749	2769	2749
q6	221	138	137	137
q7	964	616	644	616
q8	9222	2174	2107	2107
q9	9056	6712	6731	6712
q10	9248	3968	3867	3867
q11	442	250	233	233
q12	441	221	218	218
q13	18461	3169	3239	3169
q14	263	231	226	226
q15	519	475	482	475
q16	529	417	388	388
q17	998	730	753	730
q18	8347	7773	7885	7773
q19	4974	1568	1548	1548
q20	642	325	321	321
q21	5210	4065	3990	3990
q22	363	283	288	283
Total cold run time: 117646 ms
Total hot run time: 42153 ms

----- Round 2, with runtime_filter_mode=off -----
Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
q1	4550	4419	4397	4397
q2	385	263	266	263
q3	3159	2989	2772	2772
q4	1879	1583	1650	1583
q5	5498	5495	5494	5494
q6	216	129	129	129
q7	2192	1818	1841	1818
q8	3250	3386	3440	3386
q9	8729	8674	8636	8636
q10	3949	3848	3875	3848
q11	634	527	524	524
q12	787	634	621	621
q13	15999	3124	3213	3124
q14	299	274	275	274
q15	525	489	501	489
q16	505	434	424	424
q17	1758	1485	1469	1469
q18	7658	7449	7579	7449
q19	1673	1592	1558	1558
q20	1982	1791	1806	1791
q21	5053	4783	4745	4745
q22	574	515	479	479
Total cold run time: 71254 ms
Total hot run time: 55273 ms
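The totals in these reports can be reproduced from the rows. In both rounds above, the fourth column is the smaller of the two hot runs, "Total hot run time" is the sum of that column (42153 ms and 55273 ms), and "Total cold run time" is the sum of the first column (for example, 117646 ms in Round 1). A small sketch of that arithmetic, with hypothetical type names that are not part of the benchmark tooling:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One row of a TPC-H/TPC-DS report: a cold run followed by two hot runs, in ms.
struct QueryTiming {
    std::int64_t cold;
    std::int64_t hot1;
    std::int64_t hot2;
    // The fourth column in the reports: the faster of the two hot runs.
    std::int64_t best_hot() const { return std::min(hot1, hot2); }
};

struct Totals {
    std::int64_t cold;
    std::int64_t hot;
};

// Total cold time sums the cold column; total hot time sums best_hot().
Totals summarize(const std::vector<QueryTiming>& rows) {
    Totals t{0, 0};
    for (const auto& r : rows) {
        t.cold += r.cold;
        t.hot += r.best_hot();
    }
    return t;
}
```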

@doris-robot

TPC-DS: Total hot run time: 168886 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a71802e2ce7786a0ad1eebb9fa284475f77a1770, data reload: false

Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
query1	915	378	370	370
query2	6464	2557	2391	2391
query3	6639	208	215	208
query4	19720	17323	17352	17323
query5	4156	435	432	432
query6	249	157	158	157
query7	4586	310	291	291
query8	243	186	190	186
query9	8470	2404	2367	2367
query10	453	289	254	254
query11	10521	10170	10072	10072
query12	130	92	88	88
query13	1632	363	361	361
query14	9286	5989	6844	5989
query15	227	165	165	165
query16	7357	258	269	258
query17	1302	523	522	522
query18	1923	275	272	272
query19	202	151	153	151
query20	92	84	85	84
query21	212	134	132	132
query22	4292	4031	3973	3973
query23	33544	32985	33138	32985
query24	7174	2893	2847	2847
query25	574	356	373	356
query26	711	160	156	156
query27	2149	316	316	316
query28	4360	2075	2058	2058
query29	859	602	601	601
query30	247	145	151	145
query31	963	740	747	740
query32	90	53	55	53
query33	501	272	266	266
query34	851	486	475	475
query35	712	607	627	607
query36	1043	904	942	904
query37	107	70	68	68
query38	2957	2801	2808	2801
query39	847	811	815	811
query40	198	129	131	129
query41	54	47	46	46
query42	108	103	100	100
query43	572	576	538	538
query44	1094	734	753	734
query45	180	200	161	161
query46	1049	709	725	709
query47	1877	1774	1796	1774
query48	356	292	296	292
query49	834	375	383	375
query50	763	391	377	377
query51	6744	6709	6684	6684
query52	99	94	89	89
query53	363	283	292	283
query54	540	429	418	418
query55	81	74	76	74
query56	256	240	252	240
query57	1155	1017	1042	1017
query58	237	243	213	213
query59	3366	3332	3441	3332
query60	268	263	265	263
query61	92	88	91	88
query62	606	451	445	445
query63	311	284	282	282
query64	8424	2276	1743	1743
query65	3167	3104	3145	3104
query66	836	331	320	320
query67	15323	15130	14937	14937
query68	4518	534	541	534
query69	436	267	266	266
query70	1114	1072	1076	1072
query71	368	263	273	263
query72	7337	5597	2712	2712
query73	721	318	320	318
query74	6040	5659	5629	5629
query75	3300	2677	2631	2631
query76	2244	937	996	937
query77	378	267	271	267
query78	10237	9729	9730	9729
query79	2458	515	516	515
query80	1061	440	431	431
query81	513	224	222	222
query82	1282	90	93	90
query83	244	168	170	168
query84	244	83	87	83
query85	1471	281	268	268
query86	467	304	282	282
query87	3280	3085	3131	3085
query88	3973	2345	2335	2335
query89	462	385	392	385
query90	2031	188	185	185
query91	131	110	108	108
query92	66	54	53	53
query93	2664	520	500	500
query94	1304	248	182	182
query95	403	308	301	301
query96	600	268	263	263
query97	3208	2999	3020	2999
query98	239	223	215	215
query99	1123	856	862	856
Total cold run time: 259630 ms
Total hot run time: 168886 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.76% (9009/25191)
Line Coverage: 27.37% (74567/272457)
Region Coverage: 26.59% (38577/145099)
Branch Coverage: 23.45% (19669/83882)
Coverage Report: http://coverage.selectdb-in.cc/coverage/a71802e2ce7786a0ad1eebb9fa284475f77a1770_a71802e2ce7786a0ad1eebb9fa284475f77a1770/report/index.html

@doris-robot

ClickBench: Total hot run time: 30.6 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a71802e2ce7786a0ad1eebb9fa284475f77a1770, data reload: false

Query	Cold (s)	Hot 1 (s)	Hot 2 (s)
query1	0.04	0.03	0.03
query2	0.08	0.04	0.05
query3	0.22	0.06	0.05
query4	1.67	0.10	0.09
query5	0.51	0.50	0.50
query6	1.13	0.74	0.72
query7	0.02	0.01	0.02
query8	0.05	0.04	0.05
query9	0.53	0.49	0.49
query10	0.54	0.56	0.54
query11	0.16	0.11	0.12
query12	0.15	0.12	0.12
query13	0.59	0.60	0.59
query14	0.75	0.77	0.79
query15	0.83	0.82	0.81
query16	0.36	0.39	0.36
query17	1.01	0.95	0.96
query18	0.24	0.23	0.27
query19	1.80	1.81	1.84
query20	0.02	0.01	0.01
query21	15.46	0.70	0.69
query22	4.70	7.36	1.77
query23	18.31	1.32	1.28
query24	1.63	0.26	0.22
query25	0.15	0.08	0.08
query26	0.25	0.17	0.17
query27	0.08	0.08	0.09
query28	13.41	1.03	1.00
query29	12.66	3.26	3.26
query30	0.23	0.06	0.05
query31	2.88	0.38	0.38
query32	3.28	0.46	0.47
query33	2.84	2.94	2.86
query34	16.95	4.48	4.43
query35	4.55	4.51	4.51
query36	0.69	0.50	0.48
query37	0.18	0.15	0.16
query38	0.15	0.15	0.16
query39	0.04	0.03	0.03
query40	0.16	0.15	0.18
query41	0.08	0.04	0.04
query42	0.06	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 109.48 s
Total hot run time: 30.6 s

liaoxin01 previously approved these changes May 27, 2024

clang-tidy review says "All clean, LGTM! 👍"

@sollhui sollhui dismissed stale reviews from dataroaring and liaoxin01 via 459c300 May 30, 2024 09:27
@sollhui
sollhui commented May 30, 2024

run buildall

@github-actions bot removed the approved label May 30, 2024
clang-tidy review says "All clean, LGTM! 👍"

@sollhui
sollhui commented May 30, 2024

run buildall

clang-tidy review says "All clean, LGTM! 👍"

@sollhui
sollhui commented May 30, 2024

run buildall

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 40034 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 34b2cefa4799a74fff6982efb9f04b3c8431ef2d, data reload: false

------ Round 1 ----------------------------------
Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
q1	18050	4454	4361	4361
q2	2689	190	196	190
q3	11058	1157	1147	1147
q4	10263	814	836	814
q5	7652	2695	2697	2695
q6	220	133	136	133
q7	966	609	596	596
q8	9472	2106	2060	2060
q9	8806	6476	6482	6476
q10	8982	3716	3706	3706
q11	440	254	231	231
q12	419	232	212	212
q13	17767	2948	2985	2948
q14	267	212	215	212
q15	500	461	466	461
q16	508	377	374	374
q17	957	625	657	625
q18	8049	7441	7461	7441
q19	3493	1585	1552	1552
q20	651	311	296	296
q21	4931	3172	3923	3172
q22	384	336	332	332
Total cold run time: 116524 ms
Total hot run time: 40034 ms

----- Round 2, with runtime_filter_mode=off -----
Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
q1	4330	4195	4267	4195
q2	365	270	275	270
q3	2980	2760	2751	2751
q4	1867	1578	1567	1567
q5	5236	5265	5266	5265
q6	215	124	123	123
q7	2111	1669	1715	1669
q8	3193	3294	3296	3294
q9	8307	8311	8300	8300
q10	3886	3666	3645	3645
q11	585	505	485	485
q12	739	589	573	573
q13	16288	2997	3027	2997
q14	283	276	261	261
q15	505	466	467	466
q16	485	418	406	406
q17	1785	1474	1476	1474
q18	7641	7474	7257	7257
q19	1668	1571	1600	1571
q20	1998	1775	1789	1775
q21	4844	4718	4769	4718
q22	583	539	533	533
Total cold run time: 69894 ms
Total hot run time: 53595 ms

@doris-robot

TPC-DS: Total hot run time: 169100 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 34b2cefa4799a74fff6982efb9f04b3c8431ef2d, data reload: false

Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
query1	913	386	370	370
query2	6449	2467	2277	2277
query3	6653	214	219	214
query4	19678	17181	17381	17181
query5	4203	440	424	424
query6	249	163	148	148
query7	4578	291	287	287
query8	324	286	291	286
query9	8460	2418	2390	2390
query10	452	272	268	268
query11	10647	10042	10124	10042
query12	148	84	83	83
query13	1639	370	389	370
query14	8489	7387	7098	7098
query15	227	189	185	185
query16	7862	257	260	257
query17	1855	520	505	505
query18	1979	269	263	263
query19	191	153	153	153
query20	93	85	80	80
query21	201	141	128	128
query22	4195	3966	3863	3863
query23	33753	33035	33139	33035
query24	10758	2788	2825	2788
query25	624	356	354	354
query26	1471	149	154	149
query27	2998	313	324	313
query28	7440	2088	2115	2088
query29	960	603	620	603
query30	284	150	151	150
query31	958	751	729	729
query32	93	51	53	51
query33	764	266	259	259
query34	1040	474	492	474
query35	740	619	605	605
query36	1054	893	900	893
query37	134	66	65	65
query38	2917	2791	2750	2750
query39	827	780	780	780
query40	265	125	124	124
query41	52	50	53	50
query42	104	98	93	93
query43	587	589	552	552
query44	1206	734	744	734
query45	184	160	167	160
query46	1057	712	720	712
query47	1868	1793	1799	1793
query48	371	300	299	299
query49	1126	386	375	375
query50	767	397	378	378
query51	6821	6610	6728	6610
query52	100	95	91	91
query53	391	286	280	280
query54	931	439	431	431
query55	73	73	71	71
query56	261	234	246	234
query57	1124	1028	1039	1028
query58	229	208	207	207
query59	3371	3209	3073	3073
query60	291	251	258	251
query61	92	88	87	87
query62	667	471	447	447
query63	313	286	284	284
query64	9911	2198	1744	1744
query65	3190	3096	3120	3096
query66	1412	333	326	326
query67	15561	14795	14889	14795
query68	4777	539	538	538
query69	479	265	270	265
query70	1069	1102	1078	1078
query71	426	269	260	260
query72	8235	2707	2529	2529
query73	728	328	324	324
query74	6143	5718	5641	5641
query75	3591	2622	2631	2622
query76	3402	982	960	960
query77	627	262	265	262
query78	10725	9824	9805	9805
query79	2437	521	508	508
query80	1318	447	425	425
query81	497	227	222	222
query82	686	93	88	88
query83	196	170	174	170
query84	265	87	89	87
query85	1449	272	264	264
query86	459	323	320	320
query87	3259	3095	3119	3095
query88	4006	2338	2337	2337
query89	474	385	380	380
query90	1960	199	190	190
query91	132	108	168	108
query92	63	51	47	47
query93	1745	524	496	496
query94	1152	185	181	181
query95	397	304	308	304
query96	581	264	260	260
query97	3246	3027	3013	3013
query98	252	216	210	210
query99	1196	828	873	828
Total cold run time: 274597 ms
Total hot run time: 169100 ms

@doris-robot

ClickBench: Total hot run time: 30.58 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 34b2cefa4799a74fff6982efb9f04b3c8431ef2d, data reload: false

Query	Cold (s)	Hot 1 (s)	Hot 2 (s)
query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.68	0.08	0.07
query5	0.50	0.47	0.49
query6	1.12	0.71	0.73
query7	0.02	0.01	0.01
query8	0.05	0.04	0.05
query9	0.54	0.50	0.48
query10	0.55	0.57	0.53
query11	0.16	0.11	0.11
query12	0.14	0.12	0.12
query13	0.59	0.59	0.59
query14	0.79	0.77	0.78
query15	0.84	0.81	0.81
query16	0.36	0.38	0.37
query17	0.96	1.02	1.02
query18	0.23	0.23	0.25
query19	1.89	1.75	1.81
query20	0.02	0.01	0.02
query21	15.74	0.68	0.66
query22	4.23	7.00	2.00
query23	18.26	1.35	1.22
query24	2.10	0.21	0.21
query25	0.15	0.08	0.08
query26	0.26	0.17	0.18
query27	0.08	0.08	0.07
query28	13.29	1.00	0.99
query29	12.87	3.35	3.26
query30	0.24	0.05	0.06
query31	2.87	0.38	0.38
query32	3.29	0.46	0.45
query33	2.86	2.96	2.93
query34	17.34	4.39	4.43
query35	4.49	4.50	4.47
query36	0.65	0.46	0.47
query37	0.17	0.17	0.15
query38	0.15	0.15	0.15
query39	0.04	0.03	0.04
query40	0.17	0.14	0.14
query41	0.09	0.05	0.04
query42	0.05	0.04	0.05
query43	0.04	0.03	0.04
Total cold run time: 110.22 s
Total hot run time: 30.58 s

@sollhui
sollhui commented May 31, 2024

run buildall

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 42156 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4d48bef72f383ebceee578b5c09b365391fdf733, data reload: false

------ Round 1 ----------------------------------
Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
q1	17608	4480	4290	4290
q2	2028	193	196	193
q3	10468	1189	1134	1134
q4	10196	900	807	807
q5	7505	2707	2785	2707
q6	216	135	138	135
q7	969	676	609	609
q8	9218	2148	2105	2105
q9	9168	6752	6738	6738
q10	9465	3907	3867	3867
q11	429	237	246	237
q12	439	226	231	226
q13	17294	3177	3276	3177
q14	267	224	220	220
q15	517	476	484	476
q16	503	388	400	388
q17	1001	735	761	735
q18	8378	7778	7813	7778
q19	4292	1629	1621	1621
q20	665	322	313	313
q21	5159	4084	4107	4084
q22	370	316	340	316
Total cold run time: 116155 ms
Total hot run time: 42156 ms

----- Round 2, with runtime_filter_mode=off -----
Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
q1	4532	4410	4346	4346
q2	385	261	274	261
q3	3119	2970	2820	2820
q4	1886	1539	1659	1539
q5	5513	5497	5501	5497
q6	217	120	127	120
q7	2197	1840	1789	1789
q8	3281	3411	3426	3411
q9	8705	8735	8699	8699
q10	3902	3761	3814	3761
q11	589	494	485	485
q12	793	629	662	629
q13	15773	3121	3196	3121
q14	296	269	279	269
q15	529	475	485	475
q16	490	431	448	431
q17	1827	1498	1505	1498
q18	7722	7730	7622	7622
q19	3326	1570	1576	1570
q20	1985	1808	1788	1788
q21	4860	4686	4747	4686
q22	628	532	540	532
Total cold run time: 72555 ms
Total hot run time: 55349 ms

@dataroaring dataroaring left a comment

LGTM

@github-actions bot added the approved label Jun 3, 2024
github-actions bot commented Jun 3, 2024

PR approved by at least one committer and no changes requested.

@dataroaring

run buildall

github-actions bot commented Jun 4, 2024

clang-tidy review says "All clean, LGTM! 👍"

@liaoxin01 liaoxin01 left a comment

LGTM

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.38% (8975/24671)
Line Coverage: 27.89% (73277/262719)
Region Coverage: 27.33% (37943/138843)
Branch Coverage: 23.93% (19265/80504)
Coverage Report: http://coverage.selectdb-in.cc/coverage/bc367864a58fb78bcad2fd6e11ba27e03aae40a1_bc367864a58fb78bcad2fd6e11ba27e03aae40a1/report/index.html

@doris-robot

TPC-H: Total hot run time: 41673 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit bc367864a58fb78bcad2fd6e11ba27e03aae40a1, data reload: false

------ Round 1 ----------------------------------
Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
q1	17614	4379	4263	4263
q2	2027	194	189	189
q3	10463	1190	1127	1127
q4	10183	860	835	835
q5	7472	2711	2700	2700
q6	229	134	136	134
q7	971	621	613	613
q8	9214	2149	2128	2128
q9	9776	6697	6751	6697
q10	9195	3898	3885	3885
q11	430	242	237	237
q12	470	227	239	227
q13	17530	3209	3166	3166
q14	258	219	225	219
q15	505	472	481	472
q16	482	396	403	396
q17	973	723	571	571
q18	8383	7923	7841	7841
q19	4052	1360	1467	1360
q20	641	329	326	326
q21	5121	3950	3965	3950
q22	415	337	348	337
Total cold run time: 116404 ms
Total hot run time: 41673 ms

----- Round 2, with runtime_filter_mode=off -----
Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
q1	4603	4442	4893	4442
q2	383	263	282	263
q3	3174	2930	2812	2812
q4	1895	1690	1656	1656
q5	5495	5505	5491	5491
q6	223	127	125	125
q7	2204	1841	1845	1841
q8	3225	3369	3367	3367
q9	8629	8657	8735	8657
q10	4087	3766	3711	3711
q11	573	469	474	469
q12	809	643	610	610
q13	17027	3125	3147	3125
q14	302	268	270	268
q15	519	474	469	469
q16	495	487	426	426
q17	1838	1504	1498	1498
q18	8036	7434	7408	7408
q19	1703	1567	1668	1567
q20	2107	1797	1798	1797
q21	4880	4767	4625	4625
q22	656	529	545	529
Total cold run time: 72863 ms
Total hot run time: 55156 ms

@doris-robot

TPC-DS: Total hot run time: 173022 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit bc367864a58fb78bcad2fd6e11ba27e03aae40a1, data reload: false

Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
query1	946	406	395	395
query2	6468	2456	2274	2274
query3	6646	209	212	209
query4	20061	17545	17475	17475
query5	4086	448	464	448
query6	253	162	154	154
query7	4584	303	297	297
query8	325	285	277	277
query9	8462	2363	2366	2363
query10	444	305	279	279
query11	10592	10101	9919	9919
query12	140	93	86	86
query13	1661	367	360	360
query14	9477	7735	7049	7049
query15	233	192	183	183
query16	7506	260	265	260
query17	1350	521	523	521
query18	1935	283	287	283
query19	201	149	148	148
query20	89	89	82	82
query21	217	145	134	134
query22	4381	4055	4017	4017
query23	33924	33307	32950	32950
query24	11397	2797	2850	2797
query25	610	355	367	355
query26	1171	156	154	154
query27	2478	319	338	319
query28	6706	2056	2048	2048
query29	899	629	619	619
query30	288	156	169	156
query31	989	730	749	730
query32	89	55	53	53
query33	781	317	277	277
query34	957	483	481	481
query35	774	612	607	607
query36	1107	937	918	918
query37	149	70	71	70
query38	2877	2760	2728	2728
query39	879	804	786	786
query40	211	126	127	126
query41	53	53	51	51
query42	119	98	99	98
query43	587	583	556	556
query44	1218	729	751	729
query45	203	161	169	161
query46	1082	742	718	718
query47	1862	1793	1818	1793
query48	387	327	307	307
query49	1089	409	413	409
query50	792	385	380	380
query51	6902	6770	6768	6768
query52	101	95	95	95
query53	359	283	301	283
query54	962	445	434	434
query55	75	72	73	72
query56	282	250	253	250
query57	1146	1034	1027	1027
query58	259	255	247	247
query59	3476	3186	3202	3186
query60	287	266	261	261
query61	92	86	89	86
query62	664	454	471	454
query63	337	287	291	287
query64	9028	2194	1710	1710
query65	3256	3134	3152	3134
query66	941	317	339	317
query67	15299	14855	14790	14790
query68	4552	547	530	530
query69	460	296	298	296
query70	1173	1069	1028	1028
query71	399	286	290	286
query72	7165	5329	5637	5329
query73	766	334	327	327
query74	5905	5588	5473	5473
query75	3406	2693	2674	2674
query76	2174	954	940	940
query77	446	305	301	301
query78	10758	10108	9930	9930
query79	2701	531	510	510
query80	2201	502	493	493
query81	583	238	230	230
query82	799	106	105	105
query83	318	180	181	180
query84	267	90	98	90
query85	1624	337	340	337
query86	457	316	314	314
query87	3304	3117	3132	3117
query88	3887	2449	2457	2449
query89	478	393	370	370
query90	1748	193	190	190
query91	141	109	110	109
query92	71	51	50	50
query93	2464	517	495	495
query94	1137	194	194	194
query95	412	316	320	316
query96	595	276	275	275
query97	3200	3027	3041	3027
query98	252	223	219	219
query99	1372	848	852	848
Total cold run time: 272416 ms
Total hot run time: 173022 ms

@doris-robot

ClickBench: Total hot run time: 30.77 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit bc367864a58fb78bcad2fd6e11ba27e03aae40a1, data reload: false

Query	Cold (s)	Hot 1 (s)	Hot 2 (s)
query1	0.04	0.04	0.03
query2	0.08	0.03	0.04
query3	0.23	0.06	0.05
query4	1.67	0.11	0.09
query5	0.49	0.47	0.49
query6	1.13	0.73	0.72
query7	0.02	0.02	0.01
query8	0.04	0.04	0.04
query9	0.55	0.51	0.50
query10	0.55	0.56	0.54
query11	0.16	0.10	0.10
query12	0.13	0.12	0.10
query13	0.59	0.59	0.60
query14	0.78	0.78	0.79
query15	0.82	0.80	0.82
query16	0.38	0.37	0.36
query17	0.94	0.94	0.99
query18	0.19	0.28	0.24
query19	1.81	1.70	1.74
query20	0.02	0.01	0.00
query21	15.72	0.65	0.64
query22	4.55	6.67	2.19
query23	18.30	1.37	1.25
query24	1.74	0.25	0.25
query25	0.15	0.09	0.08
query26	0.27	0.18	0.17
query27	0.08	0.08	0.07
query28	13.36	1.01	1.00
query29	13.33	3.37	3.32
query30	0.25	0.06	0.05
query31	2.88	0.38	0.38
query32	3.27	0.46	0.46
query33	2.85	2.98	2.86
query34	17.22	4.41	4.49
query35	4.50	4.54	4.61
query36	0.68	0.47	0.45
query37	0.18	0.15	0.16
query38	0.16	0.14	0.14
query39	0.05	0.04	0.03
query40	0.16	0.13	0.14
query41	0.09	0.05	0.04
query42	0.04	0.04	0.05
query43	0.04	0.03	0.04
Total cold run time: 110.49 s
Total hot run time: 30.77 s

@dataroaring dataroaring merged commit 327cbf8 into apache:master Jun 5, 2024
25 of 27 checks passed
dataroaring pushed a commit that referenced this pull request Jun 7, 2024
…#35458)
dataroaring pushed a commit that referenced this pull request Jul 7, 2024
dataroaring pushed a commit that referenced this pull request Jul 25, 2024
We hit OOM when using single-stream multi-table load:

![image](https://github.com/user-attachments/assets/748e9914-d591-4f41-8b28-412d3cecc841)

There is a memory leak; the heap profile looks like this:

![image](https://github.com/user-attachments/assets/af30c593-88ea-44f6-bba1-82436b13f99f)

The stream load context is not released under some exception conditions, for example when planning fails because high concurrency causes a timeout while acquiring a read lock. The leak was introduced by #35458.
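The context is released on the normal path but not on some error paths. A common remedy is to tie the release to scope exit with RAII, so that early returns on failure also clean up. The following is a minimal sketch under stated assumptions: `LoadContext`, `ContextRegistry`, `ContextReleaser`, and `run_load` are hypothetical, and a planning failure (such as the read-lock timeout mentioned above) is modeled as a `plan` callback that returns `false`. It is not necessarily how the actual fix works.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a stream load context (illustration only).
struct LoadContext {
    std::int64_t id;
};

// Hypothetical registry that keeps contexts alive while a load is in flight.
class ContextRegistry {
public:
    void put(std::shared_ptr<LoadContext> ctx) {
        std::lock_guard<std::mutex> lock(mu_);
        const std::int64_t id = ctx->id;
        contexts_[id] = std::move(ctx);
    }
    void remove(std::int64_t id) {
        std::lock_guard<std::mutex> lock(mu_);
        contexts_.erase(id);
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mu_);
        return contexts_.size();
    }

private:
    mutable std::mutex mu_;
    std::unordered_map<std::int64_t, std::shared_ptr<LoadContext>> contexts_;
};

// RAII guard: unregisters the context on every exit path, including early
// returns taken when planning fails.
class ContextReleaser {
public:
    ContextReleaser(ContextRegistry& reg, std::int64_t id) : reg_(reg), id_(id) {}
    ~ContextReleaser() { reg_.remove(id_); }
    ContextReleaser(const ContextReleaser&) = delete;
    ContextReleaser& operator=(const ContextReleaser&) = delete;

private:
    ContextRegistry& reg_;
    std::int64_t id_;
};

// Hypothetical load driver; `plan` returning false models a planning failure.
bool run_load(ContextRegistry& reg, std::int64_t id, const std::function<bool()>& plan) {
    reg.put(std::make_shared<LoadContext>(LoadContext{id}));
    ContextReleaser releaser(reg, id);
    if (!plan()) {
        return false;  // the releaser still removes the context here
    }
    // ... execute the planned load ...
    return true;
}
```

Because the cleanup lives in a destructor, adding new failure branches later cannot reintroduce the leak by forgetting a release call.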

With this fix, the load runs stably with a small amount of memory, as shown in the following figure:

![image](https://github.com/user-attachments/assets/4483e0a5-6c0c-4cdc-b8ed-3408da6a86b2)
dataroaring pushed a commit that referenced this pull request Jul 30, 2024
dataroaring pushed a commit that referenced this pull request Aug 4, 2024
…#38824)

pick (#38255)
Labels: approved, dev/2.1.5-merged, dev/3.0.0-merged, reviewed
5 participants