[NSE-857] Fill destination buffer by reducer #880

FelixYBW · 2022-04-30T08:54:35Z

What changes were proposed in this pull request?

The solution is to create an offset array that lists the source row offsets for each reducer, as shown below:

reducer#	Vector index	Vector value
0	0	0
0	1	4
0	2	6
1	3	1
1	4	5
1	5	8
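
One way to build such an offset array is a counting sort over the per-row reducer ids. The sketch below is only an illustration, not the code in this PR: the function name `BuildReducerOffsets`, the `reducer_of_row` input, and the `reducer_start` output are all assumed names.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not from the PR): groups source row indices by reducer.
// reducer_of_row[i] is the reducer id assigned to source row i.
// Returns the row indices ordered reducer by reducer. reducer_start receives,
// for each reducer r, the position in the result where its rows begin, plus one
// trailing entry equal to the total row count.
std::vector<uint32_t> BuildReducerOffsets(const std::vector<uint32_t>& reducer_of_row,
                                          uint32_t num_reducers,
                                          std::vector<uint32_t>* reducer_start) {
  // Pass 1: count the rows that go to each reducer.
  std::vector<uint32_t> start(num_reducers + 1, 0);
  for (uint32_t r : reducer_of_row) ++start[r + 1];

  // A prefix sum turns the counts into start positions.
  for (uint32_t r = 0; r < num_reducers; ++r) start[r + 1] += start[r];

  // Pass 2: place each row index into its reducer's range, keeping row order.
  std::vector<uint32_t> offsets(reducer_of_row.size());
  std::vector<uint32_t> cursor(start.begin(), start.end() - 1);
  for (uint32_t row = 0; row < reducer_of_row.size(); ++row) {
    offsets[cursor[reducer_of_row[row]]++] = row;
  }

  *reducer_start = std::move(start);
  return offsets;
}
```

With nine rows assigned to reducers `{0,1,2,2,0,1,0,2,1}`, the first six entries of the result are `0,4,6,1,5,8`, which matches the table above.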

Then we can read the source column randomly and fill the destination column sequentially, one reducer at a time. The source data should be small enough to fit in the L1/L2 cache, so that each source and destination cache line is read only once; otherwise the performance will be very bad. Currently the record batch size is 32K rows, so a double column is 256K (32K rows × 8 bytes).
On the write side, we can use NTStore (non-temporal stores) to bypass RFO (read for ownership) and avoid cache pollution. But when reducer# is very large, NTStore doesn't work well because each reducer is filled with only a little data: for a 32K batch and 4000 reducers, each reducer receives only 8 values.
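
The read-randomly, write-sequentially loop described above could look roughly like the following. This is a minimal sketch under assumptions, not the PR's implementation: `FillByReducer` and its parameters are hypothetical names, the offsets are assumed to be already grouped by reducer, and the non-temporal path uses the x86-64 `_mm_stream_si64` intrinsic with a plain-store fallback on other targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>
#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
#define FILL_HAS_NT_STORE 1
#else
#define FILL_HAS_NT_STORE 0
#endif

// Hypothetical helper (not from the PR): gathers src rows in the order given
// by offsets (row indices grouped by reducer) and writes them to dst in
// sequence. When use_nt_store is set on x86-64, the writes are non-temporal,
// so the destination lines are not first read into the cache.
void FillByReducer(const double* src, const uint32_t* offsets, size_t n,
                   double* dst, bool use_nt_store) {
#if FILL_HAS_NT_STORE
  if (use_nt_store) {
    for (size_t i = 0; i < n; ++i) {
      long long bits;
      // Copy the 8 raw bytes of the double; the value is not converted.
      std::memcpy(&bits, &src[offsets[i]], sizeof(bits));
      _mm_stream_si64(reinterpret_cast<long long*>(&dst[i]), bits);
    }
    _mm_sfence();  // Order the streamed stores before later reads of dst.
    return;
  }
#endif
  (void)use_nt_store;  // Unused when non-temporal stores are unavailable.
  for (size_t i = 0; i < n; ++i) {
    dst[i] = src[offsets[i]];  // Random read, sequential write.
  }
}
```

Calling it with offsets `0,4,6,1,5,8` copies source rows 0, 4 and 6 into the first three destination slots (reducer 0) and rows 1, 5 and 8 into the next three (reducer 1).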

How was this patch tested?

From the benchmark data, the solution partially solves the reducer# scaling issue. From the chart below, we can see that 4096 and 512 reducers have the same performance.

Remaining work

The AVX implementation doesn't show better performance than the INT solution. NTStore doesn't show better performance either.

github-actions · 2022-04-30T08:54:51Z

#857

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>

FelixYBW · 2022-05-01T09:17:32Z

Jenkins TPCH done. 8x partition doesn't show any regression now.
tpch_2022_05_01_application_1647347981137_0322.html

Note that it doesn't solve the reducer# scaling issue completely. With an even larger reducer#, each reducer gets only a few rows, which don't fill a 64-byte cache line. The ratio of memory throughput to split data size will then increase again, and performance will drop quickly. Another solution is to convert to a row-based format during the split and return to columnar format on the reducer side.
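
The cache-line argument can be made concrete with a little arithmetic. The sketch below assumes rows are spread evenly across reducers and that each reducer's output starts on a cache-line boundary; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

constexpr size_t kCacheLineBytes = 64;

// Bytes written per reducer for one column of one batch, assuming the rows
// are distributed evenly across reducers (an idealized assumption).
constexpr size_t BytesPerReducer(size_t batch_rows, size_t num_reducers,
                                 size_t value_bytes) {
  return batch_rows / num_reducers * value_bytes;
}

// Fraction of a 64-byte cache line that carries useful data, capped at 1.0.
constexpr double CacheLineUtilization(size_t batch_rows, size_t num_reducers,
                                      size_t value_bytes) {
  return BytesPerReducer(batch_rows, num_reducers, value_bytes) >= kCacheLineBytes
             ? 1.0
             : static_cast<double>(BytesPerReducer(batch_rows, num_reducers, value_bytes)) /
                   kCacheLineBytes;
}
```

For a 32K-row batch of doubles, 4096 reducers get 8 values (64 bytes) each, which is exactly one cache line. At 16384 reducers each gets 2 values (16 bytes), so only a quarter of every touched line is useful data.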

FelixYBW · 2022-05-01T09:18:48Z

For the record, original performance:
tpch_2021_12_01_application_1638156867030_0038.html

binwei added 2 commits April 30, 2022 16:32

merge master and branch shuffle_opt_fillbyreducer. To submit PR to up…

3a59440

…stream Implemented fill by reducer

format code

94b733d

fix format

1b31583

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>

FelixYBW merged commit f8e51f4 into oap-project:master May 1, 2022

FelixYBW deleted the shuffle_fillbyreducer branch May 1, 2022 09:17

zhouyuan added a commit to zhouyuan/native-sql-engine that referenced this pull request May 9, 2022

Revert "[NSE-857] Fill destination buffer by reducer (oap-project#880)"

ce68376

This reverts commit f8e51f4.
