[Fix](bug) fix the divide zero in local shuffle #37906
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
clang-tidy review says "All clean, LGTM! 👍"
TPC-H: Total hot run time: 40032 ms
TPC-DS: Total hot run time: 172655 ms
ClickBench: Total hot run time: 30.69 s
lgtm
## Proposed changes

cherry pick #37906

If `num_buckets == 0`, the fragment is colocated by the exchange node, not the scan node. In that case, use `_num_instance` in place of `num_buckets` to prevent a divide-by-zero while still keeping the colocate plan after local shuffle.

`coredump`:

```
SIGFPE integer divide by zero (@0x56431791a54a) received by PID 33673 (TID 37768 OR 0x7f8028018640) from PID 395421002; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/common/signal_handler.h:421
 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 3# 0x00007F8C47895520 in /lib/x86_64-linux-gnu/libc.so.6
 4# doris::vectorized::Partitioner::do_partitioning(doris::RuntimeState*, doris::vectorized::Block*, doris::MemTracker*) const at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/vec/runtime/partitioner.cpp:50
 5# doris::pipeline::ShuffleExchanger::sink(doris::RuntimeState*, doris::vectorized::Block*, bool, doris::pipeline::LocalExchangeSinkLocalState&) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/pipeline/local_exchange/local_exchanger.cpp:33
 6# doris::pipeline::LocalExchangeSinkOperatorX::sink(doris::RuntimeState*, doris::vectorized::Block*, bool) in /mnt/ssd01/doris-branch40preview/NEREIDS_ASAN/be/lib/doris_be
 7# doris::pipeline::PipelineTask::execute(bool*) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/pipeline/pipeline_task.cpp:359
 8# doris::pipeline::TaskScheduler::_do_work(unsigned long) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/pipeline/task_scheduler.cpp:138
 9# doris::ThreadPool::dispatch_thread() in /mnt/ssd01/doris-branch40preview/NEREIDS_ASAN/be/lib/doris_be
10# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/util/thread.cpp:499
11# start_thread at ./nptl/pthread_create.c:442
12# 0x00007F8C47979850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
```
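
The fallback described in this change can be illustrated with a minimal, self-contained sketch. It is not the actual Doris code: `choose_partition_count` and `assign_channels` are hypothetical names, and the assumption that the failing operation is an integer modulo by the partition count is inferred from the `SIGFPE integer divide by zero` raised in `Partitioner::do_partitioning`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the Doris implementation): choose the divisor used
// when hashing rows to channels. num_buckets == 0 is taken to mean the
// fragment is colocated by an exchange node rather than a scan node, so the
// instance count is used instead.
inline std::size_t choose_partition_count(std::size_t num_buckets,
                                          std::size_t num_instances) {
    return num_buckets == 0 ? num_instances : num_buckets;
}

// Hypothetical helper: map each row hash to a channel id with a modulo.
// An integer modulo by zero is the kind of operation that raises SIGFPE on
// x86-64, so the divisor is checked before the loop.
inline std::vector<std::uint32_t> assign_channels(
        const std::vector<std::uint64_t>& hashes, std::size_t partition_count) {
    assert(partition_count > 0 && "partition count must be non-zero");
    std::vector<std::uint32_t> channels;
    channels.reserve(hashes.size());
    for (std::uint64_t h : hashes) {
        channels.push_back(static_cast<std::uint32_t>(h % partition_count));
    }
    return channels;
}
```

With `num_buckets == 0` and four instances, `choose_partition_count(0, 4)` yields 4, so the modulo in `assign_channels` never sees a zero divisor. The sketch still assumes the instance count itself is non-zero, which the `assert` makes explicit.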
Proposed changes
If `num_buckets == 0`, the fragment is colocated by the exchange node, not the scan node. In that case, use `_num_instance` in place of `num_buckets` to prevent a divide-by-zero while still keeping the colocate plan after local shuffle.
coredump: